SHIBUKAWA Yoshiki committed a184d22

Add development memo about stemming JS code, acceleration tips about stemming, small bug fix

  • Parent commits 2f8c7de
  • Branches add_stemmer

Files changed (3)

File doc/config.rst

    * ``sv`` -- Swedish
    * ``tr`` -- Turkish
 
+   .. admonition:: Accelerate build speed
+
+      Each language (except Japanese) provides its own stemming algorithm.
+      Sphinx uses a Python implementation by default. You can use a
+      C implementation to accelerate building the index file.
+
+      * `PorterStemmer <https://pypi.python.org/pypi/PorterStemmer>`_ (``en``)
+      * `PyStemmer <https://pypi.python.org/pypi/PyStemmer>`_ (all languages)
+
    .. versionadded:: 1.1
 
    .. versionchanged:: 1.3

File doc/devguide.rst

 
 * Set the debugging options in the `Docutils configuration file
   <http://docutils.sourceforge.net/docs/user/config.html>`_.
+
+* JavaScript stemming algorithms in `sphinx/search/*.py` (except `en.py`) are
+  generated by a
+  `modified snowball code generator <https://github.com/shibukawa/snowball>`_.
+  The generated `JSX <http://jsx.github.io/>`_ files are
+  in `this repository <https://github.com/shibukawa/snowball-stemmer.jsx>`_.
+  You can get the resulting JavaScript files with the following commands:
+
+  .. code-block:: bash
+
+     $ npm install
+     $ node_modules/.bin/grunt build # -> dest/*.global.js

File sphinx/search/__init__.py

         Return true if the target word should be registered in the search index.
         This method is called after stemming.
         """
-        return not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
+        return len(word) == 0 or not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
             (ord(word[0]) < 256 and (len(word) < 3 or word in self.stopwords or
                                      word.isdigit())))
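
The change above guards against an empty word reaching ``word[0]``. A standalone sketch of the expression before and after the change makes the difference visible. The function names ``word_filter_before`` and ``word_filter_after`` are hypothetical, and ``stopwords`` is passed as a parameter here instead of being read from ``self.stopwords``. The sketch only mirrors the expression in the diff. Whether callers later discard an empty word is not shown in this commit.

```python
def word_filter_before(word, stopwords=frozenset()):
    """Expression as it stood before the change (raises on an empty word)."""
    return not (((len(word) < 3) and (12353 < ord(word[0]) < 12436)) or
                (ord(word[0]) < 256 and (len(word) < 3 or word in stopwords or
                                         word.isdigit())))


def word_filter_after(word, stopwords=frozenset()):
    """Expression after the change: the length check short-circuits first."""
    return len(word) == 0 or word_filter_before(word, stopwords)


try:
    word_filter_before("")
except IndexError:
    print("before: IndexError on an empty word")

print(word_filter_after(""))        # True: word[0] is never evaluated
print(word_filter_after("ab"))      # False: Latin-1 word shorter than 3 chars
print(word_filter_after("sphinx"))  # True
```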