Commits

Anonymous committed 08cb65c

Fix whitespace-related issues in the docs.

Some documents contained tabs or mixed tabs and spaces (sometimes leading to
wrongly indented code examples). I replaced all tabs with 4 spaces.

I also removed trailing blanks at the ends of lines and multiple trailing
empty lines at the ends of files.

Added blank lines below .. note:: directives, as required by reST syntax
(Sphinx was emitting warnings about this).

NO other changes.
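
For reference, this kind of cleanup is mechanical enough to script. A minimal
sketch of one way to do it in Python (the actual tool used is not recorded in
the commit)::

    import os

    def clean_file(path):
        with open(path) as f:
            lines = f.read().splitlines()
        # Replace tabs with 4 spaces and strip trailing blanks
        lines = [line.replace("\t", "    ").rstrip() for line in lines]
        # Drop multiple trailing empty lines at the end of the file
        while lines and not lines[-1]:
            lines.pop()
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    for dirpath, dirnames, filenames in os.walk("docs/source"):
        for name in filenames:
            if name.endswith(".rst"):
                clean_file(os.path.join(dirpath, name))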


Files changed (37)

docs/source/analysis.rst

         """Uses lower() to lowercase token text. For example, tokens
         "This","is","a","TEST" become "this","is","a","test".
         """
-    
+
         for t in tokens:
             t.text = t.text.lower()
             yield t
 You can implement an analyzer as a custom class or function, or compose
 tokenizers and filters together using the ``|`` character::
 
-	my_analyzer = RegexTokenizer() | LowercaseFilter() | StopFilter()
-	
+    my_analyzer = RegexTokenizer() | LowercaseFilter() | StopFilter()
+
 The first item must be a tokenizer and the rest must be filters (you can't put a
 filter first or a tokenizer after the first item). Note that this only works if at
 least the tokenizer is a subclass of ``whoosh.analysis.Composable``, as all the
 When you create a field in a schema, you can specify your analyzer as a keyword
 argument to the field object::
 
-	schema = Schema(content=TEXT(analyzer=StemmingAnalyzer()))
+    schema = Schema(content=TEXT(analyzer=StemmingAnalyzer()))
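
 For example, you can run a composed analyzer directly to see its effect (the
 output shown assumes the default English stop list)::

     from whoosh.analysis import RegexTokenizer, LowercaseFilter, StopFilter

     my_analyzer = RegexTokenizer() | LowercaseFilter() | StopFilter()
     print([t.text for t in my_analyzer(u"This is a TEST")])
     # -> ["test"]  ("this", "is", and "a" are stop words)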
 
 
 Advanced Analysis
                         query parsing
 bool   positions        Whether term positions are recorded in the token    False
 bool   chars            Whether term start and end character indices are    False
-                        recorded in the token    
+                        recorded in the token
 bool   boosts           Whether per-term boosts are recorded in the token   False
 bool   removestops      Whether stop-words should be removed from the       True
                         token stream

docs/source/api/api.rst

 .. toctree::
     :glob:
     :maxdepth: 1
-    
+
     **

docs/source/api/formats.rst

 ==========
 
 .. autoclass:: Format
-	:members:
-	
+    :members:
+
 
 Formats
 =======

docs/source/api/lang/wordnet.rst

 =========
 
 .. autoclass:: Thesaurus
-	:members:
+    :members:
 
 
 Low-level functions

docs/source/api/matching.rst

 
 .. autoclass:: Matcher
     :members:
-    
+
 .. autoclass:: NullMatcher
 .. autoclass:: ListMatcher
 .. autoclass:: WrappingMatcher

docs/source/api/reading.rst

 
 .. autoclass:: IndexReader
     :members:
-    
+
 .. autoclass:: MultiReader
 
 

docs/source/api/searching.rst

 
 .. autoclass:: Collector
     :members:
-    
+
 
 Results classes
 ===============
     :members:
 
 .. autoclass:: ResultsPage
-	:members:
+    :members:
 
-
-

docs/source/api/store.rst

 =======
 
 .. autoclass:: Storage
-	:members:
+    :members:
 
 
 Exceptions

docs/source/api/support/bitvector.rst

 .. automodule:: whoosh.support.bitvector
 
 .. autoclass:: BitVector
-	:members:
+    :members:
 

docs/source/api/util.rst

 ===============
 
 .. automodule:: whoosh.util
-	:members:
+    :members:
 

docs/source/api/writing.rst

 .. autoclass:: AsyncWriter
     :members:
 
-    
+
 Posting writer
 ==============
 

docs/source/batch.rst

     stem_ana.cachesize = -1
     # Reset the analyzer to pick up the changed attribute
     stem_ana.clear()
-    
+
     # Use the writer to index documents...
 
 
 indexing considerably::
 
     from whoosh import index
-    
+
     ix = index.open_dir("indexdir")
     writer = ix.writer(limitmb=256)
 
 ``multiprocessing`` module)::
 
     from whoosh import index
-    
+
     ix = index.open_dir("indexdir")
     writer = ix.writer(procs=4)
-    
+
 Note that when you use multiprocessing, the ``limitmb`` parameter controls the
 amount of memory used by *each process*, so the actual memory used will be
 ``limitmb * procs``::
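
     # For example (illustrative values): four processes at 128 MB each,
     # for a total of up to 512 MB
     writer = ix.writer(procs=4, limitmb=128)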

docs/source/dates.rst

 field in ``add_document()``, you use a Python ``datetime.datetime`` object::
 
     from datetime import datetime, timedelta
     from whoosh import fields, index
-    
+
     schema = fields.Schema(title=fields.TEXT, content=fields.TEXT,
                            date=fields.DATETIME)
     ix = index.create_in("indexdir", schema)
-    
+
     w = ix.writer()
     w.add_document(title="Document 1", content="Rendering images from the command line",
                    date=datetime.utcnow())
     from whoosh import index
     from whoosh.qparser import QueryParser
     from whoosh.qparser.dateparse import DateParserPlugin
-    
+
     ix = index.open_dir("indexdir")
-    
+
     # Instantiate a query parser
     qp = QueryParser("content", ix.schema)
-    
+
     # Add the DateParserPlugin to the parser
     qp.add_plugin(DateParserPlugin())
-    
+
 With the ``DateParserPlugin``, users can use date queries such as::
 
     20050912
 
     from whoosh import index
     from whoosh.qparser import QueryParser
-    
+
     ix = index.open_dir("indexdir")
     qp = QueryParser("content", schema=ix.schema)
-    
+
     # Find all datetimes in 2005
     q = qp.parse(u"date:2005")
-    
+
     # Find all datetimes on June 24, 2005
     q = qp.parse(u"date:20050624")
-    
+
     # Find all datetimes from 1am-2am on June 24, 2005
     q = qp.parse(u"date:2005062401")
-    
+
     # Find all datetimes from Jan 1, 2005 to June 2, 2010
     q = qp.parse(u"date:[20050101 to 20100602]")
 
     def add_error(msg):
         errors.append(msg)
     qp.add_plugin(DateParserPlugin(callback=add_error))
-    
+
     q = qp.parse(u"date:blarg")
     # errors == [u"blarg"]
 

docs/source/facets.rst

     for title in titles:
         w.add_document(title=title, sort_title=title)
     w.commit()
-    
+
     # ...
-    
+
     results = my_searcher.search(my_query, sortedby="sort_title")
 
 Using a separate field for sorting allows you to "massage" the sort values,
 before lowercase letters) and remove spaces to prevent them from affecting the
 sort order::
 
-   for title in titles:
-      sort_title = title.lower().replace(" ", "")
-      w.add_document(title=title, sort_title=sort_title)
+    for title in titles:
+        sort_title = title.lower().replace(" ", "")
+        w.add_document(title=title, sort_title=sort_title)
 
 Alternatively, you can store the field contents and use a
 :class:`whoosh.sorting.StoredFieldFacet` to sort by the stored value. This
 sorting by an indexed field, and doesn't give you the chance to massage the
 sort values::
 
-   schema = fiels.Schema(title=fields.TEXT(stored=True))
-   
-   # ...
-   
-   for title in titles:
-      w.add_document(title=title)
-  
-   # ...
-   
-   sff = sorting.StoredFieldFacet("title")
-   results = my_searcher.search(my_query, sortedby=sff)
-   
+    schema = fields.Schema(title=fields.TEXT(stored=True))
+
+    # ...
+
+    for title in titles:
+        w.add_document(title=title)
+
+    # ...
+
+    sff = sorting.StoredFieldFacet("title")
+    results = my_searcher.search(my_query, sortedby=sff)
+
 
 The sortedby keyword argument
 -----------------------------
 TBD.
 
 
-
-
-
-
-
-
-
-
-
-
-
-
-

docs/source/fieldcaches.rst

 a storage object and pass it to the ``storage`` keyword argument::
 
     from whoosh.filedb.filestore import FileStorage
-    
+
     mystorage = FileStorage("path/to/cachedir")
     reader.set_caching_policy(storage=mystorage)
-    
+
 
 Creating a custom caching policy
 ================================

docs/source/glossary.rst

 
 .. glossary::
 
-	Analysis
-	    The process of breaking the text of a field into individual *terms*
-	    to be indexed. This consists of tokenizing the text into terms, and then optionally
-	    filtering the tokenized terms (for example, lowercasing and removing *stop words*).
-	    Whoosh includes several different analyzers.
+    Analysis
+        The process of breaking the text of a field into individual *terms*
+        to be indexed. This consists of tokenizing the text into terms, and then optionally
+        filtering the tokenized terms (for example, lowercasing and removing *stop words*).
+        Whoosh includes several different analyzers.
 
-	Corpus
-	    The set of documents you are indexing.
+    Corpus
+        The set of documents you are indexing.
 
-	Documents
-	    The individual pieces of content you want to make searchable.
-	    The word "documents" might imply files, but the data source could really be
-	    anything -- articles in a content management system, blog posts in a blogging
-	    system, chunks of a very large file, rows returned from an SQL query, individual
-	    email messages from a mailbox file, or whatever. When you get search results
-	    from Whoosh, the results are a list of documents, whatever "documents" means in
-	    your search engine.
-	    
-	Fields
-	    Each document contains a set of fields. Typical fields might be "title", "content",
-	    "url", "keywords", "status", "date", etc. Fields can be indexed (so they're
-	    searchable) and/or stored with the document. Storing the field makes it available
-	    in search results. For example, you typically want to store the "title" field so
-	    your search results can display it.
-	   
-	Forward index:
-		A table listing every document and the words that appear in the document.
-		Whoosh lets you store *term vectors* that are a kind of forward index.
-	
-	Indexing
-		The process of examining documents in the corpus and adding them to the
-		*reverse index*.
-	
-	Postings
-		The *reverse index* lists every word in the corpus, and for each word, a list
-		of documents in which that word appears, along with some optional information
-		(such as the number of times the word appears in that document). These items
-		in the list, containing a document number and any extra information, are
-		called *postings*. In Whoosh the information stored in postings is customizable
-		for each *field*.
-	
-	Reverse index
-	    Basically a table listing every word in the corpus, and for each word, the
-	    list of documents in which it appears. It can be more complicated (the index can
-	    also list how many times the word appears in each document, the positions at which
-	    it appears, etc.) but that's how it basically works.
-	    
-	Schema
-		Whoosh requires that you specify the *fields* of the index before you begin
-		indexing. The Schema associates field names with metadata about the field, such
-		as the format of the *postings* and whether the contents of the field are stored
-		in the index.
-		
-	Term vector
-		A *forward index* for a certain field in a certain document. You can specify
-		in the Schema that a given field should store term vectors.
+    Documents
+        The individual pieces of content you want to make searchable.
+        The word "documents" might imply files, but the data source could really be
+        anything -- articles in a content management system, blog posts in a blogging
+        system, chunks of a very large file, rows returned from an SQL query, individual
+        email messages from a mailbox file, or whatever. When you get search results
+        from Whoosh, the results are a list of documents, whatever "documents" means in
+        your search engine.
 
+    Fields
+        Each document contains a set of fields. Typical fields might be "title", "content",
+        "url", "keywords", "status", "date", etc. Fields can be indexed (so they're
+        searchable) and/or stored with the document. Storing the field makes it available
+        in search results. For example, you typically want to store the "title" field so
+        your search results can display it.
 
+    Forward index
+        A table listing every document and the words that appear in the document.
+        Whoosh lets you store *term vectors* that are a kind of forward index.
 
+    Indexing
+        The process of examining documents in the corpus and adding them to the
+        *reverse index*.
 
+    Postings
+        The *reverse index* lists every word in the corpus, and for each word, a list
+        of documents in which that word appears, along with some optional information
+        (such as the number of times the word appears in that document). These items
+        in the list, containing a document number and any extra information, are
+        called *postings*. In Whoosh the information stored in postings is customizable
+        for each *field*.
 
+    Reverse index
+        Basically a table listing every word in the corpus, and for each word, the
+        list of documents in which it appears. It can be more complicated (the index can
+        also list how many times the word appears in each document, the positions at which
+        it appears, etc.) but that's how it basically works.
 
+    Schema
+        Whoosh requires that you specify the *fields* of the index before you begin
+        indexing. The Schema associates field names with metadata about the field, such
+        as the format of the *postings* and whether the contents of the field are stored
+        in the index.
 
+    Term vector
+        A *forward index* for a certain field in a certain document. You can specify
+        in the Schema that a given field should store term vectors.
 

docs/source/highlight.rst

     results = mysearcher.search(myquery)
     for hit in results:
         print(hit["title"])
-        
+
         # Assume the "path" stored field contains a path to the original file
         with open(hit["path"]) as fileobj:
             filecontents = fileobj.read()
-        
+
         print(hit.highlights("content", text=filecontents))
 
 
 
     # Allow larger fragments
     results.formatter.maxchars = 300
-    
+
     # Show more context before and after
     results.formatter.surround = 50
 
         """Gives higher scores to fragments where the matched terms are close
         together.
         """
-        
+
         # Since lower values are better in this case, we need to negate the
         # value
         return 0 - stddev([t.pos for t in fragment.matched])
             # Use the get_text function to get the text corresponding to the
             # token
             tokentext = highlight.get_text(text, token)
-            
+
             # Return the text as you want it to appear in the highlighted
             # string
             return "[%s]" % tokentext
 you change the ``fragmenter``, ``scorer``, ``order``, and/or ``formatter``::
 
     hi = highlight.Highlighter(fragmenter=my_cf, scorer=sds)
-                               
+
 You can then use the :meth:`whoosh.highlight.Highlighter.highlight_hit` method
 to get highlights for a Hit object::
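
     # For example (a sketch): reuse ``hi`` from above and a ``results``
     # object; the field name is illustrative
     for hit in results:
         print(hi.highlight_hit(hit, "content"))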
 

docs/source/index.rst

 
 .. toctree::
     :maxdepth: 2
-    
+
     releases/index
     quickstart
     intro

docs/source/indexing.rst

 
 These are convenience methods for::
 
-	from whoosh.filedb.filestore import FileStorage
-	storage = FileStorage("indexdir")
-	
-	# Create an index
-	ix = storage.create_index(schema)
-	
-	# Open an existing index
-	storage.open_index()
+    from whoosh.filedb.filestore import FileStorage
+    storage = FileStorage("indexdir")
+
+    # Create an index
+    ix = storage.create_index(schema)
+
+    # Open an existing index
+    storage.open_index()
 
 The schema you created the index with is pickled and stored with the index.
 
 You can keep multiple indexes in the same directory using the indexname keyword
 argument::
 
-	# Using the convenience functions
+    # Using the convenience functions
     ix = index.create_in("indexdir", schema=schema, indexname="usages")
     ix = index.open_dir("indexdir", indexname="usages")
-    
+
     # Using the Storage object
     ix = storage.create_index(schema, indexname="usages")
     ix = storage.open_index(indexname="usages")
 Creating a writer locks the index for writing, so only one thread/process at
 a time can have a writer open.
 
-.. NOTE::
+.. note::
+
     Because opening a writer locks the index for writing, in a multi-threaded
     or multi-process environment your code needs to be aware that opening a
     writer may raise an exception (``whoosh.store.LockError``) if a writer is
     :class:`whoosh.writing.BufferedWriter`) of ways to work around the write
     lock.
 
-.. NOTE::
+.. note::
+
     While the writer is open and during the commit, the index is still
     available for reading. Existing readers are unaffected and new readers can
     open the current index normally. Once the commit is finished, existing
 
   * If any of the files no longer exist, delete the corresponding document from
     the index.
-  
+
   * If the file still exists, but has been modified, add it to the list of paths
     to be re-indexed.
-  
+
   * If the file exists, whether it's been modified or not, add it to the list of
     all indexed paths.
-  
+
 * Loops through all the paths of the files on disk.
 
   * If a path is not in the set of all indexed paths, the file is new and we
     need to index it.
-  
+
   * If a path is in the set of paths to re-index, we need to index it.
-  
+
   * Otherwise, we can skip indexing the file.
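
 Pulled together, the loop might look like this sketch (assuming the schema
 stores each file's ``path`` and its indexed modification time in a ``time``
 field, and an ``add_doc`` helper that indexes one file)::

     import os

     def incremental_index(ix, all_paths):
         indexed_paths = set()   # all paths currently in the index
         to_index = set()        # paths that need re-indexing

         with ix.searcher() as searcher:
             writer = ix.writer()

             # Loop over the stored fields of every indexed document
             for docfields in searcher.all_stored_fields():
                 path = docfields["path"]
                 if not os.path.exists(path):
                     # The file was deleted since it was indexed
                     writer.delete_by_term("path", path)
                 else:
                     indexed_paths.add(path)
                     if os.path.getmtime(path) > docfields["time"]:
                         # The file changed; delete it and re-index below
                         writer.delete_by_term("path", path)
                         to_index.add(path)

             # Loop over the files on disk
             for path in all_paths:
                 if path in to_index or path not in indexed_paths:
                     add_doc(writer, path)  # hypothetical helper

             writer.commit()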

docs/source/intro.rst

 
 
 .. [1] It would of course be possible to build a turnkey search engine on top of Whoosh,
-	like Nutch and Solr use Lucene.
+       as Nutch and Solr do with Lucene.
 
 
 What can Whoosh do for you?

docs/source/keywords.rst

   Use the :meth:`~whoosh.searching.Results.key_terms` method of the
   :class:`whoosh.searching.Results` object to extract keywords from the top N
   documents of the result set.
-    
+
   For example, to extract *five* key terms from the ``content`` field of the top
   *ten* documents of a results object::
-    
+
         keywords = [keyword for keyword, score
                    in results.key_terms("content", docs=10, numterms=5)]
 
   :meth:`~whoosh.searching.Searcher.document_number` methods of the
   :class:`whoosh.searching.Searcher` object to get the document numbers for the
   document(s) you want to extract keywords from.
-    
+
   Use the :meth:`~whoosh.searching.Searcher.key_terms` method of a
   :class:`whoosh.searching.Searcher` to extract the keywords, given the list of
   document numbers.
-    
+
   For example, let's say you have an index of emails. To extract key terms from
   the ``content`` field of emails whose ``emailto`` field contains
   ``matt@whoosh.ca``::
-    
+
         searcher = email_index.searcher()
         docnums = searcher.document_numbers(emailto=u"matt@whoosh.ca")
         keywords = [keyword for keyword, score
 
   Use the :meth:`~whoosh.searching.Searcher.key_terms_from_text` method of a
   :class:`whoosh.searching.Searcher` to extract the keywords, given the text::
-  
+
         searcher = email_index.searcher()
         keywords = [keyword for keyword, score
                     in searcher.key_terms_from_text("body", mytext)]
 The ``ExpansionModel`` subclasses in the :mod:`whoosh.classify` module implement
 different weighting functions for key words. These models are translated into
 Python from original Java implementations in Terrier.
-    
 

docs/source/nested.rst

     # First, we need a query that matches all the documents in the "parent"
     # level we want of the hierarchy
     all_parents = query.Term("kind", "class")
-    
+
     # Then, we need a query that matches the children we want to find
     wanted_kids = query.Term("name", "close")
-    
+
     # Now we can make a query that will match documents where "name" is
     # "close", but the query will return the "parent" documents of the matching
     # children
     # Query that matches all documents in the "parent" level we want to match
     # at
     all_parents = query.Term("kind", "album")
-    
+
     # Parent documents we want to match
     wanted_parents = query.Term("album_title", "heaven")
-    
+
     # Now we can make a query that will match parent documents where "album_title"
     # contains "heaven", but the query will return the "child" documents of the
     # matching parents
         w.add_document(kind="method", m_name="add document", parent="Index")
         w.add_document(kind="method", m_name="add reader", parent="Index")
         w.add_document(kind="method", m_name="close", parent="Index")
-        
+
         w.add_document(kind="class", c_name="Accumulator", docstring="...")
         w.add_document(kind="method", m_name="add", parent="Accumulator")
         w.add_document(kind="method", m_name="get result", parent="Accumulator")
-        
+
         w.add_document(kind="class", c_name="Calculator", docstring="...")
         w.add_document(kind="method", m_name="add", parent="Calculator")
         w.add_document(kind="method", m_name="add all", parent="Calculator")
         w.add_document(kind="method", m_name="add some", parent="Calculator")
         w.add_document(kind="method", m_name="multiply", parent="Calculator")
         w.add_document(kind="method", m_name="close", parent="Calculator")
-        
+
         w.add_document(kind="class", c_name="Deleter", docstring="...")
         w.add_document(kind="method", m_name="add", parent="Deleter")
         w.add_document(kind="method", m_name="delete", parent="Deleter")
     with ix.searcher() as s:
         # Tip: Searcher.document() and Searcher.documents() let you look up
         # documents by field values more easily than using Searcher.search()
-    
+
         # Children to parents:
         # Print the docstrings of classes on which "close" methods occur
         for child_doc in s.documents(m_name="close"):
             parent_doc = s.document(c_name=child_doc["parent"])
             # Print the parent document's stored docstring field
             print(parent_doc["docstring"])
-        
+
         # Parents to children:
         # Find classes with "big" in the docstring and print their methods
         q = query.Term("kind", "class") & query.Term("docstring", "big")

docs/source/parsing.rst

 .. code-block:: none
 
     rendering shading
-    
+
 might be parsed into query objects like this::
 
     And([Term("content", u"rendering"), Term("content", u"shading")])
 The new hand-written parser is less brittle and more flexible.)
 
 .. note::
-    
+
     Remember that you can directly create query objects programmatically using
     the objects in the :mod:`whoosh.query` module. If you are not processing
     actual user queries, this is preferable to building a query string just to
     from whoosh.qparser import QueryParser
 
     parser = QueryParser("content", schema=myindex.schema)
-    
+
 .. tip::
 
     You can instantiate a QueryParser object without specifying a schema,
 If the user doesn't explicitly specify ``AND`` or ``OR`` clauses::
 
     physically based rendering
-    
+
 ...by default, the parser treats the words as if they were connected by ``AND``,
 meaning all the terms must be present for a document to match::
 
     physically AND based AND rendering
-    
+
 To change the parser to use ``OR`` instead, so that any of the terms may be
 present for a document to match, i.e.::
 
     physically OR based OR rendering
-    
+
 ...configure the QueryParser using the ``group`` keyword argument like this::
 
     from whoosh import qparser
-    
+
     parser = qparser.QueryParser(fieldname, schema=myindex.schema,
                                  group=qparser.OrGroup)
 
 for example if you created the object with::
 
     parser = QueryParser("content", schema=myschema)
-    
+
 And the user entered the query:
 
 .. code-block:: none
 
     three blind mice
-    
+
 The parser would treat it as:
 
 .. code-block:: none
     from whoosh.qparser import MultifieldParser
 
     mparser = MultifieldParser(["title", "content"], schema=myschema)
-    
+
 When this MultifieldParser instance parses ``three blind mice``, it treats it
 as:
 
 Once you have a parser::
 
     parser = qparser.QueryParser("content", schema=myschema)
-    
+
 you can remove features from it using the
 :meth:`~whoosh.qparser.QueryParser.remove_plugin_class` method.
 
 For example, to remove the ability of the user to specify fields to search::
 
     parser.remove_plugin_class(qparser.FieldsPlugin)
-    
+
 To remove the ability to search for wildcards, which can be harmful to query
 performance::
 
     parser.remove_plugin_class(qparser.WildcardPlugin)
-    
+
 See :doc:`/api/qparser` for information about the plugins included with Whoosh.
 
 
     # Use Spanish equivalents instead of AND and OR
     cp = qparser.CompoundsPlugin(And=" Y ", Or=" O ")
     parser.replace_plugin(cp)
-    
+
 The :class:`whoosh.qparser.NotPlugin` implements the ability to logically NOT
 subqueries. You can instantiate a new ``NotPlugin`` object with a different
 token::
 this::
 
     field:>apple
-    
+
 The plugin lets you use ``>``, ``<``, ``>=``, ``<=``, ``=>``, or ``=<`` after
 a field specifier, and translates the expression into the equivalent range::
 
     date:>='31 march 2001'
-    
+
     date:[31 march 2001 to]
 
 
     The query class to use to join sub-queries when the user doesn't explicitly
     specify a boolean operator, such as ``AND`` or ``OR``. This lets you change
     the default operator from ``AND`` to ``OR``.
-    
+
     This will be the :class:`whoosh.qparser.AndGroup` or
     :class:`whoosh.qparser.OrGroup` class (*not* an instantiated object) unless
     you've written your own custom grouping syntax you want to use.
-    
+
 termclass
     The query class to use to wrap single terms.
-    
+
     This must be a :class:`whoosh.query.Query` subclass (*not* an instantiated
     object) that accepts a fieldname string and term text unicode string in its
     ``__init__`` method. The default is :class:`whoosh.query.Term`.
 * Create a new :class:`whoosh.qparser.syntax.GroupNode` subclass to hold
   nodes affected by your operator. This object is responsible for generating
   a :class:`whoosh.query.Query` object corresponding to the syntax.
-  
+
 * Create a regular expression pattern for the operator's query syntax.
 
 * Create an ``OperatorsPlugin.OpTagger`` object from the above information.
 
 * Create a new ``OperatorsPlugin`` instance configured with your custom
   operator(s).
-  
+
 * Replace the default ``OperatorsPlugin`` in your parser with your new instance.
 
 For example, if you were creating a ``BEFORE`` operator::
 
     optype = qparser.InfixOperator
     pattern = " BEFORE "
-    
+
     class BeforeGroup(qparser.GroupNode):
         merging = True
         qclass = query.Ordered
 
 Create an OpTagger for your operator::
-    
+
     btagger = qparser.OperatorsPlugin.OpTagger(pattern, BeforeGroup,
                                                qparser.InfixOperator)
 
 Note that the list of operators you specify with the first argument is IN
 ADDITION TO the default operators (AND, OR, etc.). To turn off one of the
 default operators, you can pass None to the corresponding keyword argument::
-        
+
     cp = qparser.OperatorsPlugin([(btagger, 0)], And=None)
 
 If you want ONLY your list of operators and none of the default operators,
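 use the ``clean`` keyword argument (``clean`` here is an assumption based on
 the ``OperatorsPlugin`` signature)::

     cp = qparser.OperatorsPlugin([(btagger, 0)], clean=True)
     parser.replace_plugin(cp)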

docs/source/querylang.rst

 Find documents containing ``render`` but *not* modeling::
 
     render NOT modeling
-    
+
 Find documents containing ``alpha`` but not either ``beta`` or ``gamma``::
 
     alpha NOT (beta OR gamma)
 insert one, by default AND. So this query::
 
     render shading modeling
-    
+
 is equivalent (by default) to::
 
     render AND shading AND modeling
 Find the term ``ivan`` in the ``name`` field::
 
     name:ivan
-    
+
 The ``field:`` prefix only sets the field for the term it directly precedes, so
 the query::
-    
+
     title:open sesame
-        
+
 will search for ``open`` in the ``title`` field and ``sesame`` in the *default*
 field.
 
 To apply a field prefix to multiple terms, group them with parentheses::
 
     title:(open sesame)
-    
+
 This is the same as::
 
     title:open title:sesame
-    
+
 Of course you can specify a field for phrases too::
 
     title:"open sesame"
 and ``*`` to represent any number of characters) to match terms::
 
     te?t test* *b?g*
-    
+
 Note that a wildcard starting with ``?`` or ``*`` is very slow. Note also that
 these wildcards only match *individual terms*. For example, the query::
 
     my*life
-    
+
 will **not** match an indexed phrase like::
 
     my so called life
-    
+
 because those are four separate terms.
 
 
 themselves). You can specify that one or both ends of the range are *exclusive*
 by using the ``{`` and/or ``}`` characters::
 
-	[0000 TO 0025}
-	{prefix TO suffix}
+    [0000 TO 0025}
+    {prefix TO suffix}
 
 You can also specify *open-ended* ranges by leaving out the start or end term::
 
-	[0025 TO]
-	{TO suffix}
+    [0025 TO]
+    {TO suffix}
 
 
 Boosting query elements
 important::
 
     ninja^2 cowboy bear^0.5
-    
+
 You can apply a boost to several terms using grouping parentheses::
 
     (open sesame)^2.5 roc

docs/source/quickstart.rst

 ...     query = QueryParser("content", ix.schema).parse("first")
 ...     results = searcher.search(query)
 ...     results[0]
-... 
+...
 {"title": u"First document", "path": u"/a"}
 
 
 
 This schema has two fields, "title" and "content"::
 
-	from whoosh.fields import Schema, TEXT
-	
-	schema = Schema(title=TEXT, content=TEXT)
+    from whoosh.fields import Schema, TEXT
+
+    schema = Schema(title=TEXT, content=TEXT)
 
 You only need to create the schema once, when you create the index. The
 schema is pickled and stored with the index.
     field as a single unit (that is, it doesn't break it up into individual
     words). This is useful for fields such as a file path, URL, date, category,
     etc.
-    
+
 :class:`whoosh.fields.STORED`
     This field is stored with the document, but not indexed. This field type is
     not indexed and not searchable. This is useful for document information you
     want to display to the user in the search results.
-    
+
 :class:`whoosh.fields.KEYWORD`
     This type is designed for space- or comma-separated keywords. This type is
     indexed and searchable (and optionally stored). To save space, it does not
     support phrase searching.
-    
+
 :class:`whoosh.fields.TEXT`
     This type is for body text. It indexes (and optionally stores) the text and
     stores term positions to allow phrase searching.
 
 :class:`whoosh.fields.NUMERIC`
     This type is for numbers. You can store integers or floating point numbers.
-    
+
 :class:`whoosh.fields.BOOLEAN`
     This type is for boolean (true/false) values.
 
 Once you have the schema, you can create an index using the ``create_in``
 function::
 
-	import os.path
-	from whoosh.index import create_in
-	
-	if not os.path.exists("index"):
+    import os.path
+    from whoosh.index import create_in
+
+    if not os.path.exists("index"):
         os.mkdir("index")
-	ix = create_in("index", schema)
+    ix = create_in("index", schema)
 
 (At a low level, this creates a *Storage* object to contain the index. A
 ``Storage`` object represents the medium in which the index will be stored.
 After you've created an index, you can open it using the ``open_dir``
 convenience function::
 
-	from whoosh.index import open_dir
-	
-	ix = open_dir("index")
-	
+    from whoosh.index import open_dir
+
+    ix = open_dir("index")
+
 
 The ``IndexWriter`` object
 ==========================
 
 Calling commit() on the ``IndexWriter`` saves the added documents to the index::
 
-	writer.commit()
+    writer.commit()
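
 A fuller sketch of the add/commit cycle (the field values echo the quickstart
 example above)::

     writer = ix.writer()
     writer.add_document(title=u"First document", path=u"/a",
                         content=u"This is the first document we've added!")
     writer.commit()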
 
 See :doc:`indexing` for more information.
 
 
     with ix.searcher() as searcher:
         ...
-        
+
 This is of course equivalent to::
 
     try:
 "bear" in the "content" field::
 
     # Construct query objects directly
-    
+
     from whoosh.query import *
     myquery = And([Term("content", u"apple"), Term("content", "bear")])
 
 argument is a schema to use to understand how to parse the fields::
 
     # Parse a query string
-    
+
     from whoosh.qparser import QueryParser
     parser = QueryParser("content", ix.schema)
     myquery = parser.parse(querystring)
-    
+
 Once you have a ``Searcher`` and a query object, you can use the ``Searcher``'s
 ``search()`` method to run the query and get a ``Results`` object::
 
 
 See :doc:`searching` for more information.
 
-
-
-

docs/source/recipes.rst

                    if low != t.text:
                        t.text = low
                        yield t
-    
+
     ana = analysis.RegexTokenizer() | CaseSensitivizer()
     [t.text for t in ana("The new SuperTurbo 5000", mode="index")]
     # ["The", "the", "new", "SuperTurbo", "superturbo", "5000"]
-    
+
 
 Searching
 =========
 
     # Single document (unique field value)
     stored_fields = searcher.document(id="bacon")
-    
+
     # Multiple documents
     for stored_fields in searcher.documents(tag="cake"):
         ...
     def pos_score_fn(searcher, fieldname, text, matcher):
         poses = matcher.value_as("positions")
         return 1.0 / (poses[0] + 1)
-        
+
     pos_weighting = scoring.FunctionWeighting(pos_score_fn)
-    searcher = myindex.searcher(weighting=pos_weighting)    
+    searcher = myindex.searcher(weighting=pos_weighting)
 
 
 Results
     else:
         low = results.estimated_min_length()
         high = results.estimated_length()
-    
+
         print("Scored", found, "of between", low, "and", "high", "documents")
 
 
     for hit in results:
         # Which terms matched in this hit?
         print("Matched:", hit.matched_terms())
-        
+
         # Which terms from the query didn't match in this hit?
         print("Didn't match:", myquery.all_terms() - hit.matched_terms())
 
 
     # Including documents that are deleted but not yet optimized away
     numdocs = searcher.doc_count_all()
-    
+
     # Not including deleted documents
     numdocs = searcher.doc_count()
 
 
     # Number of times content:wobble appears in all documents
     freq = searcher.frequency("content", "wobble")
-    
+
     # Number of documents containing content:wobble
     docfreq = searcher.doc_frequency("content", "wobble")
 
     postings = searcher.postings("content", "wobble")
     postings.skip_to(500)
     return postings.id() == 500
-    
+
     # ...or the slower but easier way
     docset = set(searcher.postings("content", "wobble").all_ids())
     return 500 in docset
     vector = searcher.vector(500, "content")
     vector.skip_to("wobble")
     return vector.id() == "wobble"
-    
+
     # ...or the slower but easier way
     wordset = set(searcher.vector(500, "content").all_ids())
     return "wobble" in wordset
-    
-    
+

docs/source/releases/0_3.rst

 
 * Added experimental DATETIME field type lets you pass a
   ``datetime.datetime`` object as a field value to ``add_document``::
-  
+
     from whoosh.fields import Schema, ID, DATETIME
     from whoosh.filedb.filestore import RamStorage
     from datetime import datetime
-  
+
     schema = Schema(id=ID, date=DATETIME)
     storage = RamStorage()
     ix = storage.create_index(schema)
     w = ix.writer()
     w.add_document(id=u"A", date=datetime.now())
     w.close()
-  
+
   Internally, the DATETIME field indexes the datetime object as text using
   the format (4 digit year + 2 digit month + 2 digit day + 'T' + 2 digit hour +
   2 digit minute + 2 digit second + 6 digit microsecond), for example

docs/source/releases/1_0.rst

 
 Whoosh 1.8.2 fixes some bugs, including a mistyped signature in
 Searcher.more_like and a bad bug in Collector that could screw up the
-ordering of results given certain parameters. 
+ordering of results given certain parameters.
 
 
 Whoosh 1.8.1
     sorter.add_field("price", reverse=True)
     # Get the Results
     results = sorter.sort_query(myquery)
-    
+
 See the documentation for the :class:`~whoosh.sorting.Sorter` class for more
 information. Bear in mind that complex sorts will be much slower on large
 indexes because they can't use the per-segment field caches.
 
     # Search within previous results
     newresults = searcher.search(newquery, filter=oldresults)
-    
+
     # Search within the "basics" chapter
     results = searcher.search(userquery, filter=query.Term("chapter", "basics"))
 
 documents that have been flushed to disk::
 
     writer = writing.BufferedWriter(myindex)
-    
+
     # You can update (replace) documents in RAM without having to commit them
     # to disk
     writer.add_document(path="/a", text="Hi there")
     writer.update_document(path="/a", text="Hello there")
-    
+
     # Search committed and uncommitted documents by getting a searcher from the
     # writer instead of the index
     searcher = writer.searcher()
         path = STORED
         tags = KEYWORD(stored=True)
         content = TEXT
-        
+
     index.create_in("indexdir", MySchema)
 
 Whoosh 1.6.2: Added :class:`whoosh.searching.TermTrackingCollector` which tracks
 ==========
 
 Whoosh 1.3 adds a more efficient DATETIME field based on the new tiered NUMERIC
-field, and the DateParserPlugin. See :doc:`../dates`. 
+field, and the DateParserPlugin. See :doc:`../dates`.
 
 
 Whoosh 1.2
     for id in my_list_of_ids_to_delete:
         myindex.delete_by_term("id", id)
     myindex.commit()
-        
+
     # Instead do this
     writer = myindex.writer()
     for id in my_list_of_ids_to_delete:
 
     # Do not merge segments
     writer.commit(merge=False)
-    
+
     # or
-    
+
     # Merge all segments
     writer.commit(optimize=True)
 
 
 Custom Weighting implementations that use the ``final()`` method must now
 set the ``use_final`` attribute to ``True``::
-  
-  	from whoosh.scoring import BM25F
-  
-  	class MyWeighting(BM25F):
-  		use_final = True
-  		
-  		def final(searcher, docnum, score):
-  			return score + docnum * 10
-  			
+
+    from whoosh.scoring import BM25F
+
+    class MyWeighting(BM25F):
+        use_final = True
+
+        def final(self, searcher, docnum, score):
+            return score + docnum * 10
+
 This disables the new optimizations, forcing Whoosh to score every matching
 document.
 

docs/source/releases/2_0.rst

   are much more flexible than the previous field-based system.
 
   For example, to sort by first name and then score::
-        
+
       from whoosh import sorting
-       
+
       mf = sorting.MultiFacet([sorting.FieldFacet("firstname"),
                                sorting.ScoreFacet()])
       results = searcher.search(myquery, sortedby=mf)
 * Completely revamped spell-checking to make it much faster, easier, and more
  flexible. You can enable generation of the graph files used by spell checking
   using the ``spelling=True`` argument to a field type::
-  
+
       schema = fields.Schema(text=fields.TEXT(spelling=True))
-  
+
   (Spelling suggestion methods will work on fields without ``spelling=True``
  but will be slower.) The spelling graph will be updated automatically as new
   documents are added -- it is no longer necessary to maintain a separate
 
   You can get suggestions for individual words using
   :meth:`whoosh.searching.Searcher.suggest`::
-  
+
       suglist = searcher.suggest("content", "werd", limit=3)
 
   Whoosh now includes convenience methods to spell-check and correct user
   queries, with optional highlighting of corrections using the
   ``whoosh.highlight`` module::
-  
+
       from whoosh import highlight, qparser
-  
+
       # User query string
       qstring = request.get("q")
-      
+
       # Parse into query object
       parser = qparser.QueryParser("content", myindex.schema)
       qobject = parser.parse(qstring)
-      
+
       results = searcher.search(qobject)
-      
+
       if not results:
        correction = searcher.correct_query(qobject, qstring)
         # correction.query = corrected query object
         # correction.string = corrected query string
-        
+
         # Format the corrected query string with HTML highlighting
         cstring = correction.format_string(highlight.HtmlFormatter())
-  
+
   Spelling suggestions can come from field contents and/or lists of words.
   For stemmed fields the spelling suggestions automatically use the unstemmed
   forms of the words.

docs/source/releases/index.rst

 
 .. toctree::
     :maxdepth: 2
-    
+
     2_0
     1_0
     0_3
-    
+

docs/source/schema.rst

 :class:`whoosh.fields.NUMERIC`
     This field stores int, long, or floating point numbers in a compact,
     sortable format.
-    
+
 :class:`whoosh.fields.DATETIME`
     This field stores datetime objects in a compact, sortable format.
-    
+
 :class:`whoosh.fields.BOOLEAN`
     This simple field indexes boolean values and allows users to search for
     ``yes``, ``no``, ``true``, ``false``, ``1``, ``0``, ``t`` or ``f``.
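
 For instance, these types might be combined in a schema like this (an
 illustrative sketch)::

     from whoosh import fields

     schema = fields.Schema(title=fields.TEXT(stored=True),
                            date=fields.DATETIME(stored=True),
                            size=fields.NUMERIC,
                            is_public=fields.BOOLEAN)
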
 more than one field, it's much more efficient to create the writer yourself::
 
     ix.add_field("fieldname", fields.KEYWORD)
-    
+
 In the ``filedb`` backend, removing a field simply removes that field from the
 *schema* -- the index will not get smaller, data about that field will remain
 in the index until you optimize. Optimizing will compact the index, removing
     writer.delete_field("path")
     # Don't do this!!!
     writer.add_field("path", fields.KEYWORD)
-    
+
 (A future version of Whoosh may automatically prevent this error.)
 
 
 format       fields.Format   Defines what kind of information a field records
                              about each term, and how the information is stored
                              on disk.
-vector       fields.Format   Optional: if defined, the format in which to store         
+vector       fields.Format   Optional: if defined, the format in which to store
                              per-document forward-index information for this field.
 scorable     bool            If True, the length of (number of terms in) the field in
                              each document is stored in the index. Slightly misnamed,
                              in the index.
 unique       bool            If True, the value of this field may be used to
                              replace documents with the same value when the user
-                             calls 
+                             calls
                              :meth:`~whoosh.writing.IndexWriter.update_document`
                              on an ``IndexWriter``.
 ============ =============== ======================================================

docs/source/searching.rst

 
     with ix.searcher() as searcher:
         ...
-        
+
 This is of course equivalent to::
 
     try:
 :class:`~whoosh.searching.Results` object::
 
     from whoosh.qparser import QueryParser
-    
+
     with myindex.searcher() as s:
         qp = QueryParser("content", schema=myindex.schema)
         q = qp.parse(u"hello world")
-        
+
         results = s.search(q)
 
 By default the results object contains at most the first 10 matching documents. To get
 ``search_page`` method lets you conveniently retrieve only the results on a
 given page::
 
-	results = s.search_page(q, 1)
+    results = s.search_page(q, 1)
 
 The default page length is 10 hits. You can use the ``pagelen`` keyword argument
 to set a different page length::
 
-	results = s.search_page(q, 5, pagelen=20)
+    results = s.search_page(q, 5, pagelen=20)
 
 
 Results object
     else:
         low = results.estimated_min_length()
         high = results.estimated_length()
-    
+
         print("Scored", found, "of between", low, "and", "high", "documents")
 
 
     # Get the terms searched for
     termset = set()
     userquery.existing_terms(termset)
-    
+
     # Formulate a "best bet" query for the terms the user
     # searched for in the "content" field
     bbq = Or([Term("bestbet", text) for fieldname, text
 
     # Find documents matching the searched for terms
     results = s.search(bbq, limit=5)
-    
+
     # Find documents that match the original query
     allresults = s.search(userquery, limit=10)
-    
+
     # Add the user query results on to the end of the "best bet"
     # results. If documents appear in both result sets, push them
     # to the top of the combined results.

docs/source/spelling.rst

 To create a :class:`whoosh.spelling.Corrector` object from a word list::
 
     from whoosh.spelling import GraphCorrector
-    
+
     corrector = GraphCorrector.from_word_list(word_list)
-    
+
 Creating a corrector directly from a word list can be slow for large
 word lists, so you can save a corrector's graph to a more efficient
 on-disk form like this::
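
     corrector.to_file("mywords.dawg")

     # ...and to load the saved graph again later (a sketch based on the
     # GraphCorrector API):
     corrector = GraphCorrector.from_graph_file("mywords.dawg")
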
     # Parse the user query string
     qp = qparser.QueryParser("content", myindex.schema)
     q = qp.parse(qstring)
-    
+
     # Try correcting the query
     with myindex.searcher() as s:
         corrected = s.correct_query(q, qstring)
 as HTML::
 
     from whoosh import highlight
-    
+
     hf = highlight.HtmlFormatter()
     corrected = s.correct_query(q, qstring, formatter=hf)
-     
+
 See the documentation for
 :meth:`whoosh.searching.Searcher.correct_query` for information on the
 defaults and arguments.

docs/source/stemming.rst

 
     from whoosh import fields
     from whoosh.analysis import StemmingAnalyzer
-    
+
     stem_ana = StemmingAnalyzer()
     schema = fields.Schema(title=fields.TEXT(analyzer=stem_ana, stored=True),
                            content=fields.TEXT(analyzer=stem_ana))
 variations of the given word in the index. For example, the query::
 
     query.Variations("content", "rendered")
-    
+
 ...might act like this (depending on what words are in the index)::
 
     query.Or([query.Term("content", "render"), query.Term("content", "rendered"),
 keyword argument to the parser initialization method::
 
     from whoosh import qparser, query
-    
+
     qp = qparser.QueryParser("content", termclass=query.Variations)
 
 Variations has pros and cons.
     from whoosh.analysis import CharsetFilter, StemmingAnalyzer
     from whoosh import fields
     from whoosh.support.charset import accent_map
-    
+
     # For example, to add an accent-folding filter to a stemming analyzer:
     my_analyzer = StemmingAnalyzer() | CharsetFilter(accent_map)
-    
+
     # To use this analyzer in your schema:
     my_schema = fields.Schema(content=fields.TEXT(analyzer=my_analyzer))
 

docs/source/tech/backend.rst

 * Indexes must implement the following methods.
 
   * :meth:`whoosh.index.Index.is_empty`
-  
+
   * :meth:`whoosh.index.Index.doc_count`
-    
+
   * :meth:`whoosh.index.Index.reader`
-  
+
   * :meth:`whoosh.index.Index.writer`
 
 * Indexes that require/support locking must implement the following methods.
 
   * :meth:`whoosh.index.Index.lock`
-  
+
   * :meth:`whoosh.index.Index.unlock`
 
 * Indexes that support deletion must implement the following methods.
 
   * :meth:`whoosh.index.Index.delete_document`
-  
+
   * :meth:`whoosh.index.Index.doc_count_all` -- if the backend has delayed
     deletion.
-  
+
 * Indexes that require/support versioning/transactions *may* implement the following methods.
 
   * :meth:`whoosh.index.Index.latest_generation`
 
   * :meth:`whoosh.index.Index.up_to_date`
-  
+
   * :meth:`whoosh.index.Index.last_modified`
-    
+
 * Indexes *may* implement the following methods (the base class's versions are no-ops).
 
   * :meth:`whoosh.index.Index.optimize`
-  
+
   * :meth:`whoosh.index.Index.close`
-  
+
 
 IndexWriter
 ===========
 * IndexWriters must implement the following methods.
 
   * :meth:`whoosh.writing.IndexWriter.add_document`
-  
+
   * :meth:`whoosh.writing.IndexWriter.add_reader`
-  
+
 * Backends that support deletion must implement the following methods.
 
   * :meth:`whoosh.writing.IndexWriter.delete_document`
-  
+
 * IndexWriters that work as transactions must implement the following methods.
 
   * :meth:`whoosh.writing.IndexWriter.commit` -- Save the additions/deletions done with
     this IndexWriter to the main index, and release any resources used by the IndexWriter.
-  
+
   * :meth:`whoosh.writing.IndexWriter.cancel` -- Throw away any additions/deletions done
     with this IndexWriter, and release any resources used by the IndexWriter.
 
 * IndexReaders must implement the following methods.
 
   * :meth:`whoosh.reading.IndexReader.__contains__`
-  
+
   * :meth:`whoosh.reading.IndexReader.__iter__`
-  
+
   * :meth:`whoosh.reading.IndexReader.iter_from`
-  
+
   * :meth:`whoosh.reading.IndexReader.stored_fields`
-  
+
   * :meth:`whoosh.reading.IndexReader.doc_count_all`
-  
+
   * :meth:`whoosh.reading.IndexReader.doc_count`
-  
+
   * :meth:`whoosh.reading.IndexReader.doc_field_length`
-  
+
   * :meth:`whoosh.reading.IndexReader.field_length`
-  
+
   * :meth:`whoosh.reading.IndexReader.max_field_length`
-  
+
   * :meth:`whoosh.reading.IndexReader.postings`
-  
+
   * :meth:`whoosh.reading.IndexReader.has_vector`
-  
+
   * :meth:`whoosh.reading.IndexReader.vector`
-  
+
   * :meth:`whoosh.reading.IndexReader.doc_frequency`
-  
+
   * :meth:`whoosh.reading.IndexReader.frequency`
-  
+
 * Backends that support deleting documents should implement the following
   methods.
-  
+
   * :meth:`whoosh.reading.IndexReader.has_deletions`
   * :meth:`whoosh.reading.IndexReader.is_deleted`
 
 
 * If the IndexReader object does not keep the schema in the ``self.schema``
   attribute, it needs to override the following methods.
-  
+
   * :meth:`whoosh.reading.IndexReader.field`
-  
+
   * :meth:`whoosh.reading.IndexReader.field_names`
-  
+
   * :meth:`whoosh.reading.IndexReader.scorable_names`
-  
+
   * :meth:`whoosh.reading.IndexReader.vector_names`
-  
+
 * IndexReaders *may* implement the following methods.
-  
+
   * :meth:`whoosh.reading.IndexReader.close` -- closes any open resources associated with the
     reader.
 
 * Implement the following methods at minimum.
 
   * :meth:`whoosh.matching.Matcher.is_active`
-  
+
   * :meth:`whoosh.matching.Matcher.copy`
-  
+
   * :meth:`whoosh.matching.Matcher.id`
-  
+
   * :meth:`whoosh.matching.Matcher.next`
-  
+
   * :meth:`whoosh.matching.Matcher.value`
-  
+
   * :meth:`whoosh.matching.Matcher.value_as`
-  
+
   * :meth:`whoosh.matching.Matcher.score`
-  
+
 * Depending on the implementation, you *may* implement the following methods
   more efficiently.
-  
+
   * :meth:`whoosh.matching.Matcher.skip_to`
-  
+
   * :meth:`whoosh.matching.Matcher.weight`
-  
+
 * If the implementation supports quality, you should implement the following
   methods.
-  
+
   * :meth:`whoosh.matching.Matcher.supports_quality`
-  
+
   * :meth:`whoosh.matching.Matcher.quality`
-  
+
   * :meth:`whoosh.matching.Matcher.block_quality`
-  
+
   * :meth:`whoosh.matching.Matcher.skip_to_quality`
-  
-  
-  
-
-
-
-
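
 As a point of reference, a skeletal matcher exposing a single posting might
 look like this (a hypothetical sketch, not a real backend)::

     from whoosh.matching import Matcher

     class SingleDocMatcher(Matcher):
         """Hypothetical matcher that matches exactly one document."""

         def __init__(self, docid, score=1.0):
             self._docid = docid
             self._score = score
             self._active = True

         def is_active(self):
             return self._active

         def copy(self):
             m = SingleDocMatcher(self._docid, self._score)
             m._active = self._active
             return m

         def id(self):
             return self._docid

         def next(self):
             # Only one posting, so advancing exhausts the matcher
             self._active = False

         def value(self):
             return b""  # this sketch stores no per-posting value

         def value_as(self, astype):
             raise NotImplementedError

         def score(self):
             return self._score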

docs/source/tech/filedb.rst

 <revision_number>.toc
     The "master" file containing information about the index and its segments.
 
-The index directory will contain a set of files for each segment. A segment is like a mini-index -- when you add documents to the index, whoosh creates a new segment and then searches the old segment(s) and the new segment to avoid having to do a big merge every time you add a document. When you get enough small segments whoosh will merge them into larger segments or a single segment. 
+The index directory will contain a set of files for each segment. A segment is like a mini-index -- when you add documents to the index, Whoosh creates a new segment and then searches the old segment(s) and the new segment to avoid having to do a big merge every time you add a document. When you get enough small segments, Whoosh will merge them into larger segments or a single segment.
 
 <segment_number>.dci
-    Contains per-document information (e.g. field lengths). This will grow linearly with the number of documents. 
+    Contains per-document information (e.g. field lengths). This will grow linearly with the number of documents.
 
 <segment_number>.dcz
     Contains the stored fields for each document.

docs/source/tech/index.rst

 .. toctree::
     :glob:
     :maxdepth: 2
-    
+
     *