
Commits

Matt Chaput committed 2e28e0b (merge)

Merge with thomaswaldmann's documentation fork. Thanks, Thomas! See issue #137.

  • Parent commits 6a0a739, bc35544
  • Branches default

Comments (0)

Files changed (11)

File README.txt

 If you have ``setuptools`` or ``pip`` installed, you can use ``easy_install``
 or ``pip`` to download and install Whoosh automatically::
 
-	$ easy_install Whoosh
-	
-	or
-	
-	$ pip install Whoosh
+    $ easy_install Whoosh
+
+    or
+
+    $ pip install Whoosh
 
 Learning more
 =============
 
 You can check out the latest version of the source code using Mercurial::
 
-	hg clone http://bitbucket.org/mchaput/whoosh
+    hg clone http://bitbucket.org/mchaput/whoosh
+

File docs/source/analysis.rst

 	my_analyzer = RegexTokenizer() | LowercaseFilter() | StopFilter()
 	
 The first item must be a tokenizer and the rest must be filters (you can't put a
-filter first or a tokenizer after the first item). Note that is only works if at
+filter first or a tokenizer after the first item). Note that this only works if at
 least the tokenizer is a subclass of ``whoosh.analysis.Composable``, as all the
 tokenizers and filters that ship with Whoosh are.
 
 
 The mixing of persistent "setting" and transient "information" attributes on the
 Token object is not especially elegant. If I ever have a better idea I might
-change it ;) Nothing requires that an Analyzer be implemented by calling a
+change it. ;) Nothing requires that an Analyzer be implemented by calling a
 tokenizer and filters. Tokenizers and filters are simply a convenient way to
 structure the code. You're free to write an analyzer any way you want, as long
 as it implements ``__call__``.
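
As a rough illustration of that point, here is a minimal sketch of an analyzer that implements ``__call__`` directly instead of chaining a tokenizer and filters (the class name and the whitespace-splitting rule are invented for the example)::

    from whoosh.analysis import Token

    class WhitespaceLowercaseAnalyzer(object):
        """Toy analyzer: splits on whitespace and yields lowercased tokens."""
        def __call__(self, value, positions=False, **kwargs):
            t = Token(positions=positions, **kwargs)
            for pos, word in enumerate(value.split()):
                t.text = word.lower()
                if positions:
                    t.pos = pos
                yield t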

File docs/source/facets.rst

 
 .. tip::
     Whoosh currently only supports **non-overlapping** categories. A document
-    cannot belong to facets at the same time. (It is not an error if the facets
-    overlap; each document will simply be sorted into one category arbitrarily.)
+    cannot belong to multiple facets at the same time. (It is not an error if
+    the facets overlap; each document will simply be sorted into one category
+    arbitrarily.)
 
 Faceting relies on field caches. See :doc:`fieldcaches` for information about
 field caches.

File docs/source/intro.rst

 Getting help with Whoosh
 ------------------------
 
-You can view outstanding issues and file bugs on the `Whoosh Trac <http://trac.whoosh.ca>`_.
-You can ask for help on the `Whoosh mailing list <http://groups.google.com/group/whoosh>`_.
-
-
-
+You can view outstanding issues on the
+`Whoosh Bitbucket page <http://bitbucket.org/mchaput/whoosh>`_
+and get help on the `Whoosh mailing list <http://groups.google.com/group/whoosh>`_.

File docs/source/parsing.rst

     object) that accepts a fieldname string and term text unicode string in its
     ``__init__`` method. The default is :class:`whoosh.query.Term`.
 
-    This is useful if you want to chnage the default term class to
+    This is useful if you want to change the default term class to
     :class:`whoosh.query.Variations`, or if you've written a custom term class
     you want the parser to use instead of the ones shipped with Whoosh.
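
For example (the ``index`` directory and the ``content`` field name below are hypothetical), passing ``whoosh.query.Variations`` as the term class makes each plain word in a parsed query match variations of that word::

    from whoosh import index, query
    from whoosh.qparser import QueryParser

    ix = index.open_dir("index")
    parser = QueryParser("content", schema=ix.schema,
                         termclass=query.Variations)
    q = parser.parse(u"render shade")   # each word becomes a Variations query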
 

File docs/source/releases/1_0.rst

 Whoosh 1.x release notes
 ========================
 
+Whoosh 1.8.2
+============
+
+Whoosh 1.8.2 fixes some bugs, including a mistyped signature in
+Searcher.more_like and a bad bug in Collector that could screw up the
+ordering of results given certain parameters. 
+
+
+Whoosh 1.8.1
+============
+
+Whoosh 1.8.1 includes a few recent bugfixes/improvements:
+
+- ListMatcher.skip_to_quality() wasn't returning an integer, resulting
+  in a "None + int" error.
+
+- Fixed locking and memcache sync bugs in the Google App Engine storage
+  object.
+
+- MultifieldPlugin wasn't working correctly with groups.
+
+  - The binary matcher trees of Or and And are now generated using a
+    Huffman-like algorithm instead of being perfectly balanced. This gives a
+    noticeable speed improvement because less information has to be passed
+    up/down the tree.
+
+
 Whoosh 1.8
 ==========
 

File docs/source/spelling.rst

     ix = index.open_dir("index")
 
     # Start/open a spelling dictionary in the same directory
-    speller = SpellChecer(ix.storage)
+    speller = SpellChecker(ix.storage)
 
 Whoosh lets you keep multiple indexes in the same directory by assigning the
 indexes different names. The default name for a regular index is ``_MAIN``. The
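
Continuing the snippet above with a rough usage sketch (the ``content`` field name and the misspelled query word are made up), the speller's dictionary is usually populated from an existing indexed field before asking for suggestions::

    from whoosh import index
    from whoosh.spelling import SpellChecker

    ix = index.open_dir("index")
    speller = SpellChecker(ix.storage)
    speller.add_field(ix, "content")                 # add the field's words
    suggestions = speller.suggest(u"whosh", number=3)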

File setup.py

 from whoosh import __version__, versionstring
 
 setup(
-	name = "Whoosh",
-	version = versionstring(),
-	package_dir = {'': 'src'},
-	packages = ["whoosh", "whoosh.filedb", "whoosh.lang", "whoosh.qparser", "whoosh.support"],
-	
-	author = "Matt Chaput",
-	author_email = "matt@whoosh.ca",
-	
-	description = "Fast, pure-Python full text indexing, search, and spell checking library.",
+    name = "Whoosh",
+    version = versionstring(),
+    package_dir = {'': 'src'},
+    packages = ["whoosh", "whoosh.filedb", "whoosh.lang", "whoosh.qparser", "whoosh.support"],
+
+    author = "Matt Chaput",
+    author_email = "matt@whoosh.ca",
+
+    description = "Fast, pure-Python full text indexing, search, and spell checking library.",
     long_description = open("README.txt").read(),
 
-	license = "Two-clause BSD license",
-	keywords = "index search text spell",
-	url = "http://bitbucket.org/mchaput/whoosh",
-	
-	zip_safe = True,
-	test_suite = "nose.collector",
-	
-	classifiers = [
-	"Development Status :: 5 - Production/Stable",
-	"Intended Audience :: Developers",
-	"License :: OSI Approved :: BSD License",
-	"Natural Language :: English",
-	"Operating System :: OS Independent",
-	"Programming Language :: Python :: 2.5",
-	"Topic :: Software Development :: Libraries :: Python Modules",
-	"Topic :: Text Processing :: Indexing",
-	],
+    license = "Two-clause BSD license",
+    keywords = "index search text spell",
+    url = "http://bitbucket.org/mchaput/whoosh",
+
+    zip_safe = True,
+    test_suite = "nose.collector",
+
+    classifiers = [
+    "Development Status :: 5 - Production/Stable",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: BSD License",
+    "Natural Language :: English",
+    "Operating System :: OS Independent",
+    "Programming Language :: Python :: 2.5",
+    "Topic :: Software Development :: Libraries :: Python Modules",
+    "Topic :: Text Processing :: Indexing",
+    ],
 )
+

File src/whoosh/analysis.py

     >>> rext = RegexTokenizer()
     >>> stream = rext(u"this is a test")
     >>> stopper = StopFilter()
-    >>> [token.text for token in sopper(stream)]
+    >>> [token.text for token in stopper(stream)]
     [u"this", u"test"]
     
     """
         """
         :param stoplist: A collection of words to remove from the stream.
             This is converted to a frozenset. The default is a list of
-            common stop words.
+            common English stop words.
         :param minsize: The minimum length of token texts. Tokens with
             text smaller than this will be stopped.
         :param maxsize: The maximum length of token texts. Tokens with text
 
 
 class PyStemmerFilter(StemFilter):
-    """This is a simple sublcass of StemFilter that works with the py-stemmer
+    """This is a simple subclass of StemFilter that works with the py-stemmer
     third-party library. You must have the py-stemmer library installed to use
     this filter.
     
                 
 
 class SubstitutionFilter(Filter):
-    """Performas a regular expression substitution on the token text.
+    """Performs a regular expression substitution on the token text.
     
     This is especially useful for removing text from tokens, for example
     hyphens::
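
A minimal sketch of such a chain (the pattern and the sample text are illustrative only)::

    from whoosh.analysis import RegexTokenizer, SubstitutionFilter

    # Replace every "-" in the token text with nothing, i.e. strip hyphens.
    ana = RegexTokenizer(r"\S+") | SubstitutionFilter("-", "")
    [t.text for t in ana(u"seven-league boots")]   # [u"sevenleague", u"boots"]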
 
 
 def KeywordAnalyzer(lowercase=False, commas=False):
-    """Parses space-separated tokens.
+    """Parses whitespace- or comma-separated tokens.
     
     >>> ana = KeywordAnalyzer()
     >>> [token.text for token in ana(u"Hello there, this is a TEST")]
     [u"Hello", u"there,", u"this", u"is", u"a", u"TEST"]
     
     :param lowercase: whether to lowercase the tokens.
-    :param commas: if True, items are separated by commas rather than spaces.
+    :param commas: if True, items are separated by commas rather than whitespace.
     """
     
     if commas:
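
With ``commas=True`` the analyzer splits on commas instead of whitespace, so multi-word items stay together (output shown is approximate)::

    from whoosh.analysis import KeywordAnalyzer

    ana = KeywordAnalyzer(lowercase=True, commas=True)
    [t.text for t in ana(u"Big Data, Search Engines, Python")]
    # -> roughly [u"big data", u"search engines", u"python"]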

File src/whoosh/fields.py

 
 class IDLIST(FieldType):
     """Configured field type for fields containing IDs separated by whitespace
-    and/or puntuation.
+    and/or punctuation (or anything else, using the expression param).
     """
     
     __inittypes__ = dict(stored=bool, unique=bool, expression=bool, field_boost=float)
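
A minimal schema sketch using this field type (the field names are hypothetical)::

    from whoosh.fields import Schema, IDLIST, TEXT

    # "tags" holds whitespace/punctuation-separated IDs, e.g. u"python search docs"
    schema = Schema(title=TEXT(stored=True), tags=IDLIST(stored=True))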

File src/whoosh/searching.py

File contents unchanged.