Commits

Anonymous committed 9226220 Merge

merged

  • Participants
  • Parent commits 3ec7d26, 6456ee7

Comments (0)

Files changed (27)

docs/source/api/query.rst

 Special queries
 ===============
 
+.. autoclass:: NestedParent
+.. autoclass:: NestedChildren
 .. autoclass:: ConstantScoreQuery
 
 

docs/source/index.rst

     fieldcaches
     batch
     threads
+    nested
     recipes
     api/api
     tech/index

docs/source/nested.rst

+===========================================
+Indexing and searching document hierarchies
+===========================================
+
+Overview
+========
+
+Whoosh's full-text index is essentially a flat database of documents. However,
+Whoosh supports two techniques for simulating the indexing and querying of
+hiearchical documents, tha is, sets of documents that form a parent-child
+hierarchy, such as "Chapter - Section - Paragraph" or
+"Module - Class - Method".
+
+You can specify parent-child relationships *at indexing time*, by grouping
+documents in the same hierarchy, and then use the
+:class:`whoosh.query.NestedParent` and/or :class:`whoosh.query.NestedChildren`
+to find parents based on their children or vice-versa.
+
+Alternatively, you can use *query time joins*, essentially like external key
+joins in a database, where you perform one search to find a relevant document,
+then use a stored value on that document (for example, a ``parent`` field) to
+look up another document.
+
+Both methods have pros and cons.
+
+
+Using nested document indexing
+==============================
+
+Indexing
+--------
+
+This method works by indexing a "parent" document and all its "child" documents
+*as a "group"* so they are guaranteed to end up in the same segment. You can
+use the context manager returned by ``IndexWriter.group()`` to group
+documents::
+
+    with ix.writer() as w:
+        with w.group():
+            w.add_document(kind="class", name="Index")
+            w.add_document(kind="method", name="add document")
+            w.add_document(kind="method", name="add reader")
+            w.add_document(kind="method", name="close")
+        with w.group():
+            w.add_document(kind="class", name="Accumulator")
+            w.add_document(kind="method", name="add")
+            w.add_document(kind="method", name="get result")
+        with w.group():
+            w.add_document(kind="class", name="Calculator")
+            w.add_document(kind="method", name="add")
+            w.add_document(kind="method", name="add all")
+            w.add_document(kind="method", name="add some")
+            w.add_document(kind="method", name="multiply")
+            w.add_document(kind="method", name="close")
+        with w.group():
+            w.add_document(kind="class", name="Deleter")
+            w.add_document(kind="method", name="add")
+            w.add_document(kind="method", name="delete")
+
+Alternatively you can use the ``start_group()`` and ``end_group()`` methods::
+
+    with ix.writer() as w:
+        w.start_group()
+        w.add_document(kind="class", name="Index")
+        w.add_document(kind="method", name="add document")
+        w.add_document(kind="method", name="add reader")
+        w.add_document(kind="method", name="close")
+        w.end_group()
+
+Each level of the hierarchy should have a query that distinguishes it from
+other levels (for example, in the above index, you can use ``kind:class`` or
+``kind:method`` to match different levels of the hierarchy).
+
+Once you've indexed the hierarchy of documents, you can use two query types to
+find parents based on children or vice-versa.
+
+(There is currently no support in the default query parser for nested queries.)
+
+
+NestedParent query
+------------------
+
+The :class:`whoosh.query.NestedParent` query type lets you specify a query for
+child documents, but have the query return an "ancestor" document from higher
+in the hierarchy::
+
+    # First, we need a query that matches all the documents in the "parent"
+    # level we want of the hierarchy
+    all_parents = query.Term("kind", "class")
+    
+    # Then, we need a query that matches the children we want to find
+    kids = query.Term("name", "close")
+    
+    # Now we can make a query that will match documents where "name" is
+    # "close", but the query will return the "parent" documents of the matching
+    # children
+    q = query.NestedParent(all_parents, kids)
+    # results = Index, Calculator
+
+Note that in a hierarchy with more than two levels, you can specify a "parents"
+query that matches any level of the hierarchy, so you can return the top-level
+ancestors of the matching children, or the second level, third level, etc.
+
+The query works by first building a bit vector representing which documents are
+"parents"::
+
+     Index
+     |      Calculator
+     |      |
+     1000100100000100
+         |        |
+         |        Deleter
+         Accumulator
+
+Then for each match of the "child" query, it calculates the previous parent
+from the bit vector returns it as a match (and it only returns each parent once
+no matter how many children match). This parent lookup is very efficient::
+
+     1000100100000100
+        |
+     |<-+ close
+
+
+NestedChildren query
+--------------------
+
+The opposite of ``NestedParent`` is :class:`whoosh.query.NestedChildren`. This
+query lets you match parents but return their children. This is useful, for
+example, to search for an album title and return the songs in the album::
+
+    # Query that matches all documents in the "parent" level we want to match
+    # at
+    all_parents = query.Term("kind", "album")
+    
+    # Parent documents we want to match
+    wanted_parents = query.Term("album_title", "heaven")
+    
+    # Now we can make a query that will match parent documents where "name"
+    # contains "heaven", but the query will return the "child" documents of the
+    # matching parents
+    q1 = query.NestedChildren(all_parents, wanted_parents)
+
+You can then combine that query with an ``AND`` clause, for example to find
+songs with "hell" in the song title that occur on albums with "heaven" in the
+album title::
+
+    q2 = query.And([q1, query.Term("song_title", "hell")])
+
+
+Deleting and updating hierarchical documents
+--------------------------------------------
+
+The drawback of the index-time method is *updating and deleting*. Because the
+implementation of the queries depends on the parent and child documents being
+contiguous in the segment, you can't update/delete just one child document.
+You can only update/delete an entire top-level document at once (for example,
+if your hierarchy is "Chapter - Section - Paragraph", you can only update or
+delete entire chapters, not a section or paragraph). If the top-level of the
+hierarchy represents very large blocks of text, this can involve a lot of
+deleting and reindexing.
+
+Currently ``Writer.update_document()`` does not automatically work with nested
+documents. You must manually delete and re-add document groups to update them.
+
+To delete nested document groups, use the ``Writer.delete_by_query()``
+method with a ``NestedParent`` query::
+
+    # Delete the "Accumulator" class
+    all_parents = query.Term("kind", "class")
+    to_delete = query.Term("name", "Accumulator")
+    q = query.NestedParent(all_parents, to_delete)
+    with myindex.writer() as w:
+        w.delete_by_query(q)
+
+
+Using query-time joins
+======================
+
+A second technique for simulating hierarchical documents in Whoosh involves
+using a stored field on each document to point to its parent, and then using
+the value of that field at query time to find parents and children.
+
+For example, if we index a hierarchy of classes and methods using pointers
+to parents instead of nesting::
+
+    # Store a pointer to the parent on each "method" document
+    with ix.writer() as w:
+        w.add_document(kind="class", c_name="Index", docstring="...")
+        w.add_document(kind="method", m_name="add document", parent="Index")
+        w.add_document(kind="method", m_name="add reader", parent="Index")
+        w.add_document(kind="method", m_name="close", parent="Index")
+        
+        w.add_document(kind="class", c_name="Accumulator", docstring="...")
+        w.add_document(kind="method", m_name="add", parent="Accumulator")
+        w.add_document(kind="method", m_name="get result", parent="Accumulator")
+        
+        w.add_document(kind="class", c_name="Calculator", docstring="...")
+        w.add_document(kind="method", m_name="add", parent="Calculator")
+        w.add_document(kind="method", m_name="add all", parent="Calculator")
+        w.add_document(kind="method", m_name="add some", parent="Calculator")
+        w.add_document(kind="method", m_name="multiply", parent="Calculator")
+        w.add_document(kind="method", m_name="close", parent="Calculator")
+        
+        w.add_document(kind="class", c_name="Deleter", docstring="...")
+        w.add_document(kind="method", m_name="add", parent="Deleter")
+        w.add_document(kind="method", m_name="delete", parent="Deleter")
+
+    # Now do manual joins at query time
+    with ix.searcher() as s:
+        # Tip: Searcher.document() and Searcher.documents() let you look up
+        # documents by field values more easily than using Searcher.search()
+    
+        # Children to parents:
+        # Print the docstrings of classes on which "close" methods occur
+        for child_doc in s.documents(m_name="close"):
+            # Use the stored value of the "parent" field to look up the parent
+            # document
+            parent_doc = s.document(c_name=child_doc["parent"])
+            # Print the parent document's stored docstring field
+            print(parent_doc["docstring"])
+        
+        # Parents to children:
+        # Find classes with "big" in the docstring and print their methods
+        q = query.Term("kind", "class") & query.Term("docstring", "big")
+        for hit in s.search(q, limit=None):
+            print("Class name=", hit["c_name"], "methods:")
+            for child_doc in s.documents(parent=hit["c_name"]):
+                print("  Method name=", child_doc["m_name"])
+
+This technique is more flexible than index-time nesting in that you can
+delete/update individual documents in the hierarchy piecemeal, although it
+doesn't support finding different parent levels as easily. It is also slower
+than index-time nesting (potentially much slower), since you must perform
+additional searches for each found document.
+
+Future versions of Whoosh may include "join" queries to make this process more
+efficient (or at least more automatic).
+
+
+
+
+
+
+
+
+
+
+

docs/source/releases/2_0.rst

   objects. This architecture should make it easier to add optional or
   experimental features, and maintain backwards compatibility.
 
-* Fixes issues #137, #206, #213, #215, #219, #223, #226, #230, #233, #238
-  and other bugs.
+* Fixes issues #137, #206, #213, #215, #219, #223, #226, #230, #233, #238,
+  #239, #240, #241, and other bugs.
 
 
 Whoosh 2.3.2

src/whoosh/fields.py

 
     def to_text(self, bit):
         if isinstance(bit, string_type):
-            bit = bit in self.trues
+            bit = bit.lower() in self.trues
         elif not isinstance(bit, bool):
             raise ValueError("%r is not a boolean")
         return self.strings[int(bit)]

src/whoosh/matching/__init__.py

 # those of the authors and should not be interpreted as representing official
 # policies, either expressed or implied, of Matt Chaput.
 
-from whoosh.matching.core import *
+from whoosh.matching.mcore import *
 from whoosh.matching.binary import *
 from whoosh.matching.wrappers import *
 

src/whoosh/matching/binary.py

 # those of the authors and should not be interpreted as representing official
 # policies, either expressed or implied, of Matt Chaput.
 
-from whoosh.matching import core
+from whoosh.matching import mcore
 
 
-class BiMatcher(core.Matcher):
+class BiMatcher(mcore.Matcher):
     """Base class for matchers that combine the results of two sub-matchers in
     some way.
     """
 
     def skip_to(self, id):
         if not self.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         ra = self.a.skip_to(id)
         rb = self.b.skip_to(id)
         return ra or rb
 
         # If one or both of the sub-matchers are inactive, convert
         if not (a_active or b_active):
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif not a_active:
             return b.replace(minquality)
         elif not b_active:
 
         # Shortcut when one matcher is inactive
         if not (a_active or b_active):
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         elif not a_active:
             return b.next()
         elif not b_active:
         a = self.a
         b = self.b
         if not (a.is_active() or b.is_active()):
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
 
         # Short circuit if one matcher is inactive
         if not a.is_active():
             if a_max < minquality and b_max < minquality:
                 # If neither sub-matcher has a high enough max quality to
                 # contribute, return an inactive matcher
-                return core.NullMatcher()
+                return mcore.NullMatcher()
             elif b_max < minquality:
                 # If the b matcher can't contribute, return a
                 return a.replace(minquality)
                 return b.replace(minquality)
 
         if not (a_active or b_active):
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif not a_active:
             return b.replace(minquality)
         elif not b_active:
         # again here after we replace them, but it's probably better than
         # returning a replacement with an inactive sub-matcher
         if not (a_active and b_active):
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif not a_active:
             return b
         elif not b_active:
 
         if not (a_active and b_active):
             # Intersection matcher requires that both sub-matchers be active
-            return core.NullMatcher()
+            return mcore.NullMatcher()
 
         if minquality:
             a_max = a.max_quality()
             if a_max + b_max < minquality:
                 # If the combined quality of the sub-matchers can't contribute,
                 # return an inactive matcher
-                return core.NullMatcher()
+                return mcore.NullMatcher()
             # Require that the replacements be able to contribute results
             # higher than the minquality
             a_min = minquality - b_max
         a_active = a.is_active()
         b_active = b.is_active()
         if not (a_active or b_active):
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif not a_active:
             return b
         elif not b_active:
 
     def skip_to(self, id):
         if not self.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         ra = self.a.skip_to(id)
         rb = self.b.skip_to(id)
         if self.is_active():
 
     def next(self):
         if not self.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
 
         # We must assume that the ids are equal whenever next() is called (they
         # should have been made equal by _find_next), so advance them both
         if not self.a.is_active():
             # The a matcher is required, so if it's inactive, return an
             # inactive matcher
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif (minquality
               and self.a.max_quality() < minquality):
             # If the quality of the required matcher isn't high enough to
             # contribute, return an inactive matcher
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif not self.b.is_active():
             # If the prohibited matcher is inactive, convert to just the
             # required matcher
 
     def next(self):
         if not self.a.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         ar = self.a.next()
         nr = False
         if self.a.is_active() and self.b.is_active():
 
     def skip_to(self, id):
         if not self.a.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         if id < self.a.id():
             return
 
 
     def next(self):
         if not self.a.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
 
         ar = self.a.next()
         br = False
 
     def skip_to(self, id):
         if not self.a.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
 
         ra = self.a.skip_to(id)
         rb = False
         b_active = b.is_active()
 
         if not a_active:
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif minquality and b_active:
             if a.max_quality() + b.max_quality() < minquality:
                 # If the combined max quality of the sub-matchers isn't high
                 # enough to possibly contribute, return an inactive matcher
-                return core.NullMatcher()
+                return mcore.NullMatcher()
             elif a.max_quality() < minquality:
                 # If the max quality of the main sub-matcher isn't high enough
                 # to ever contribute without the optional sub- matcher, change
         minquality = minquality
 
         if not a.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         if not b.is_active():
             return a.skip_to_quality(minquality)
 

src/whoosh/matching/core.py

-# Copyright 2010 Matt Chaput. All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-#    1. Redistributions of source code must retain the above copyright notice,
-#       this list of conditions and the following disclaimer.
-#
-#    2. Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in the
-#       documentation and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY MATT CHAPUT ``AS IS'' AND ANY EXPRESS OR
-# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
-# EVENT SHALL MATT CHAPUT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
-# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
-# OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
-# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-#
-# The views and conclusions contained in the software and documentation are
-# those of the authors and should not be interpreted as representing official
-# policies, either expressed or implied, of Matt Chaput.
-
-"""
-This module contains "matcher" classes. Matchers deal with posting lists. The
-most basic matcher, which reads the list of postings for a term, will be
-provided by the backend implementation (for example,
-:class:`whoosh.filedb.filepostings.FilePostingReader`). The classes in this
-module provide additional functionality, such as combining the results of two
-matchers, or modifying the results of a matcher.
-
-You do not need to deal with the classes in this module unless you need to
-write your own Matcher implementation to provide some new functionality. These
-classes are not instantiated by the user. They are usually created by a
-:class:`~whoosh.query.Query` object's :meth:`~whoosh.query.Query.matcher()`
-method, which returns the appropriate matcher to implement the query (for
-example, the :class:`~whoosh.query.Or` query's
-:meth:`~whoosh.query.Or.matcher()` method returns a
-:py:class:`~whoosh.matching.UnionMatcher` object).
-
-Certain backends support "quality" optimizations. These backends have the
-ability to skip ahead if it knows the current block of postings can't
-contribute to the top N documents. If the matcher tree and backend support
-these optimizations, the matcher's :meth:`Matcher.supports_block_quality()`
-method will return ``True``.
-"""
-
-import sys
-from itertools import repeat
-
-from whoosh.compat import izip, xrange
-from whoosh.util import abstractmethod
-
-
-# Exceptions
-
-class ReadTooFar(Exception):
-    """Raised when :meth:`~whoosh.matching.Matcher.next()` or
-    :meth:`~whoosh.matching.Matcher.skip_to()` are called on an inactive
-    matcher.
-    """
-
-
-class NoQualityAvailable(Exception):
-    """Raised when quality methods are called on a matcher that does not
-    support block quality optimizations.
-    """
-
-
-# Classes
-
-class Matcher(object):
-    """Base class for all matchers.
-    """
-
-    @abstractmethod
-    def is_active(self):
-        """Returns True if this matcher is still "active", that is, it has not
-        yet reached the end of the posting list.
-        """
-
-        raise NotImplementedError
-
-    @abstractmethod
-    def reset(self):
-        """Returns to the start of the posting list.
-
-        Note that reset() may not do what you expect after you call
-        :meth:`Matcher.replace()`, since this can mean calling reset() not on
-        the original matcher, but on an optimized replacement.
-        """
-
-        raise NotImplementedError
-
-    def term(self):
-        """Returns a ``("fieldname", "termtext")`` tuple for the term this
-        matcher matches, or None if this matcher is not a term matcher.
-        """
-
-        return None
-
-    def term_matchers(self):
-        """Returns an iterator of term matchers in this tree.
-        """
-
-        if self.term() is not None:
-            yield self
-        else:
-            for cm in self.children():
-                for m in cm.term_matchers():
-                    yield m
-
-    def matching_terms(self, id=None):
-        """Returns an iterator of ``("fieldname", "termtext")`` tuples for the
-        **currently matching** term matchers in this tree.
-        """
-
-        if not self.is_active():
-            return
-
-        if id is None:
-            id = self.id()
-        elif id != self.id():
-            return
-
-        t = self.term()
-        if t is None:
-            for c in self.children():
-                for t in c.matching_terms(id):
-                    yield t
-        else:
-            yield t
-
-    def children(self):
-        """Returns an (possibly empty) list of the submatchers of this
-        matcher.
-        """
-
-        return []
-
-    def replace(self, minquality=0):
-        """Returns a possibly-simplified version of this matcher. For example,
-        if one of the children of a UnionMatcher is no longer active, calling
-        this method on the UnionMatcher will return the other child.
-        """
-
-        return self
-
-    @abstractmethod
-    def copy(self):
-        """Returns a copy of this matcher.
-        """
-
-        raise NotImplementedError
-
-    def depth(self):
-        """Returns the depth of the tree under this matcher, or 0 if this
-        matcher does not have any children.
-        """
-
-        return 0
-
-    def supports_block_quality(self):
-        """Returns True if this matcher supports the use of ``quality`` and
-        ``block_quality``.
-        """
-
-        return False
-
-    def block_quality(self):
-        """Returns a quality measurement of the current block of postings,
-        according to the current weighting algorithm. Raises
-        ``NoQualityAvailable`` if the matcher or weighting do not support
-        quality measurements.
-        """
-
-        raise NoQualityAvailable(self.__class__)
-
-    @abstractmethod
-    def id(self):
-        """Returns the ID of the current posting.
-        """
-
-        raise NotImplementedError
-
-    def all_ids(self):
-        """Returns a generator of all IDs in the matcher.
-
-        What this method returns for a matcher that has already read some
-        postings (whether it only yields the remaining postings or all postings
-        from the beginning) is undefined, so it's best to only use this method
-        on fresh matchers.
-        """
-
-        i = 0
-        while self.is_active():
-            yield self.id()
-            self.next()
-            i += 1
-            if i == 10:
-                self = self.replace()
-                i = 0
-
-    def all_items(self):
-        """Returns a generator of all (ID, encoded value) pairs in the matcher.
-
-        What this method returns for a matcher that has already read some
-        postings (whether it only yields the remaining postings or all postings
-        from the beginning) is undefined, so it's best to only use this method
-        on fresh matchers.
-        """
-
-        i = 0
-        while self.is_active():
-            yield (self.id(), self.value())
-            self.next()
-            i += 1
-            if i == 10:
-                self = self.replace()
-                i = 0
-
-    def items_as(self, astype):
-        """Returns a generator of all (ID, decoded value) pairs in the matcher.
-
-        What this method returns for a matcher that has already read some
-        postings (whether it only yields the remaining postings or all postings
-        from the beginning) is undefined, so it's best to only use this method
-        on fresh matchers.
-        """
-
-        while self.is_active():
-            yield (self.id(), self.value_as(astype))
-            self.next()
-
-    @abstractmethod
-    def value(self):
-        """Returns the encoded value of the current posting.
-        """
-
-        raise NotImplementedError
-
-    @abstractmethod
-    def supports(self, astype):
-        """Returns True if the field's format supports the named data type,
-        for example 'frequency' or 'characters'.
-        """
-
-        raise NotImplementedError("supports not implemented in %s"
-                                  % self.__class__)
-
-    @abstractmethod
-    def value_as(self, astype):
-        """Returns the value(s) of the current posting as the given type.
-        """
-
-        raise NotImplementedError("value_as not implemented in %s"
-                                  % self.__class__)
-
-    def spans(self):
-        """Returns a list of :class:`whoosh.spans.Span` objects for the matches
-        in this document. Raises an exception if the field being searched does
-        not store positions.
-        """
-
-        from whoosh.spans import Span
-        if self.supports("characters"):
-            return [Span(pos, startchar=startchar, endchar=endchar)
-                    for pos, startchar, endchar in self.value_as("characters")]
-        elif self.supports("positions"):
-            return [Span(pos) for pos in self.value_as("positions")]
-        else:
-            raise Exception("Field does not support spans")
-
-    def skip_to(self, id):
-        """Moves this matcher to the first posting with an ID equal to or
-        greater than the given ID.
-        """
-
-        while self.is_active() and self.id() < id:
-            self.next()
-
-    def skip_to_quality(self, minquality):
-        """Moves this matcher to the next block with greater than the given
-        minimum quality value.
-        """
-
-        raise NotImplementedError(self.__class__.__name__)
-
-    @abstractmethod
-    def next(self):
-        """Moves this matcher to the next posting.
-        """
-
-        raise NotImplementedError(self.__class__.__name__)
-
-    def weight(self):
-        """Returns the weight of the current posting.
-        """
-
-        return self.value_as("weight")
-
-    @abstractmethod
-    def score(self):
-        """Returns the score of the current posting.
-        """
-
-        raise NotImplementedError(self.__class__.__name__)
-
-    def __eq__(self, other):
-        return self.__class__ is type(other)
-
-    def __lt__(self, other):
-        return type(other) is self.__class__
-
-    def __ne__(self, other):
-        return not self.__eq__(other)
-
-    def __gt__(self, other):
-        return not (self.__lt__(other) or self.__eq__(other))
-
-    def __le__(self, other):
-        return self.__eq__(other) or self.__lt__(other)
-
-    def __ge__(self, other):
-        return self.__eq__(other) or self.__gt__(other)
-
-
-class NullMatcherClass(Matcher):
-    """Matcher with no postings which is never active.
-    """
-
-    def __call__(self):
-        return self
-
-    def supports_block_quality(self):
-        return True
-
-    def block_quality(self):
-        return 0
-
-    def skip_to_quality(self, minquality):
-        return 0
-
-    def is_active(self):
-        return False
-
-    def reset(self):
-        pass
-
-    def all_ids(self):
-        return []
-
-    def copy(self):
-        return self
-
-    def max_quality(self):
-        return 0
-
-
-# Singleton instance
-NullMatcher = NullMatcherClass()
-
-
-class ListMatcher(Matcher):
-    """Synthetic matcher backed by a list of IDs.
-    """
-
-    def __init__(self, ids, weights=None, values=None, format=None,
-                 scorer=None, position=0, all_weights=None, term=None,
-                 terminfo=None):
-        """
-        :param ids: a list of doc IDs.
-        :param weights: a list of weights corresponding to the list of IDs.
-            If this argument is not supplied, a list of 1.0 values is used.
-        :param values: a list of encoded values corresponding to the list of
-            IDs.
-        :param format: a :class:`whoosh.formats.Format` object representing the
-            format of the field.
-        :param scorer: a :class:`whoosh.scoring.BaseScorer` object for scoring
-            the postings.
-        :param term: a ``("fieldname", "text")`` tuple, or None if this is not
-            a term matcher.
-        """
-
-        self._ids = ids
-        self._weights = weights
-        self._all_weights = all_weights
-        self._values = values
-        self._i = position
-        self._format = format
-        self._scorer = scorer
-        self._term = term
-        self._terminfo = terminfo
-
-    def __repr__(self):
-        return "<%s>" % self.__class__.__name__
-
-    def is_active(self):
-        return self._i < len(self._ids)
-
-    def reset(self):
-        self._i = 0
-
-    def term(self):
-        return self._term
-
-    def copy(self):
-        return self.__class__(self._ids, self._weights, self._values,
-                              self._format, self._scorer, self._i,
-                              self._all_weights)
-
-    def replace(self, minquality=0):
-        if not self.is_active() or (minquality
-                                    and self.max_quality() < minquality):
-            return NullMatcher()
-        else:
-            return self
-
-    def max_quality(self):
-        return self.block_max_weight()
-
-    def supports_block_quality(self):
-        return (self._scorer is not None
-                and self._scorer.supports_block_quality())
-
-    def block_quality(self):
-        return self._scorer.block_quality(self)
-
-    def skip_to_quality(self, minquality):
-        self._i += 1
-        while self._i < len(self._ids) and self.quality() <= minquality:
-            self._i += 1
-        return 0
-
-    def id(self):
-        return self._ids[self._i]
-
-    def all_ids(self):
-        return iter(self._ids)
-
-    def all_items(self):
-        values = self._values
-        if values is None:
-            values = repeat('')
-
-        return izip(self._ids, values)
-
-    def value(self):
-        if self._values:
-            v = self._values[self._i]
-
-            if isinstance(v, list):
-                # This object supports "values" that are actually lists of
-                # value strings. This is to support combining the results of
-                # several different matchers into a single ListMatcher (see the
-                # TOO_MANY_CLAUSES functionality of MultiTerm). We combine the
-                # values here instead of combining them first and then making
-                # the ListMatcher to avoid wasting time combining values if the
-                # consumer never asks for them.
-                assert len(v) > 0
-                if len(v) == 1:
-                    v = v[0]
-                else:
-                    v = self._format.combine(v)
-                # Replace the list with the computed value string
-                self._values[self._i] = v
-
-            return v
-        else:
-            return ''
-
-    def value_as(self, astype):
-        decoder = self._format.decoder(astype)
-        return decoder(self.value())
-
-    def supports(self, astype):
-        return self._format.supports(astype)
-
-    def next(self):
-        self._i += 1
-
-    def weight(self):
-        if self._all_weights:
-            return self._all_weights
-        elif self._weights:
-            return self._weights[self._i]
-        else:
-            return 1.0
-
-    def block_min_length(self):
-        return self._terminfo.min_length()
-
-    def block_max_length(self):
-        return self._terminfo.max_length()
-
-    def block_max_weight(self):
-        if self._all_weights:
-            return self._all_weights
-        elif self._weights:
-            return max(self._weights)
-        elif self._terminfo is not None:
-            return self._terminfo.max_weight()
-        else:
-            return 1.0
-
-    def score(self):
-        if self._scorer:
-            return self._scorer.score(self)
-        else:
-            return self.weight()
-
-
-
-

src/whoosh/matching/mcore.py

+# Copyright 2010 Matt Chaput. All rights reserved.
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+#    1. Redistributions of source code must retain the above copyright notice,
+#       this list of conditions and the following disclaimer.
+#
+#    2. Redistributions in binary form must reproduce the above copyright
+#       notice, this list of conditions and the following disclaimer in the
+#       documentation and/or other materials provided with the distribution.
+#
+# THIS SOFTWARE IS PROVIDED BY MATT CHAPUT ``AS IS'' AND ANY EXPRESS OR
+# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
+# EVENT SHALL MATT CHAPUT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
+# OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
+# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+# The views and conclusions contained in the software and documentation are
+# those of the authors and should not be interpreted as representing official
+# policies, either expressed or implied, of Matt Chaput.
+
+"""
+This module contains "matcher" classes. Matchers deal with posting lists. The
+most basic matcher, which reads the list of postings for a term, will be
+provided by the backend implementation (for example,
+:class:`whoosh.filedb.filepostings.FilePostingReader`). The classes in this
+module provide additional functionality, such as combining the results of two
+matchers, or modifying the results of a matcher.
+
+You do not need to deal with the classes in this module unless you need to
+write your own Matcher implementation to provide some new functionality. These
+classes are not instantiated by the user. They are usually created by a
+:class:`~whoosh.query.Query` object's :meth:`~whoosh.query.Query.matcher()`
+method, which returns the appropriate matcher to implement the query (for
+example, the :class:`~whoosh.query.Or` query's
+:meth:`~whoosh.query.Or.matcher()` method returns a
+:py:class:`~whoosh.matching.UnionMatcher` object).
+
+Certain backends support "quality" optimizations. These backends have the
+ability to skip ahead if it knows the current block of postings can't
+contribute to the top N documents. If the matcher tree and backend support
+these optimizations, the matcher's :meth:`Matcher.supports_block_quality()`
+method will return ``True``.
+"""
+
+import sys
+from itertools import repeat
+
+from whoosh.compat import izip, xrange
+from whoosh.util import abstractmethod
+
+
+# Exceptions
+
+class ReadTooFar(Exception):
+    """Raised when :meth:`~whoosh.matching.Matcher.next()` or
+    :meth:`~whoosh.matching.Matcher.skip_to()` are called on an inactive
+    matcher.
+    """
+
+
+class NoQualityAvailable(Exception):
+    """Raised when quality methods are called on a matcher that does not
+    support block quality optimizations.
+    """
+
+
+# Classes
+
+class Matcher(object):
+    """Base class for all matchers.
+    """
+
+    @abstractmethod
+    def is_active(self):
+        """Returns True if this matcher is still "active", that is, it has not
+        yet reached the end of the posting list.
+        """
+
+        raise NotImplementedError
+
+    @abstractmethod
+    def reset(self):
+        """Returns to the start of the posting list.
+
+        Note that reset() may not do what you expect after you call
+        :meth:`Matcher.replace()`, since this can mean calling reset() not on
+        the original matcher, but on an optimized replacement.
+        """
+
+        raise NotImplementedError
+
+    def term(self):
+        """Returns a ``("fieldname", "termtext")`` tuple for the term this
+        matcher matches, or None if this matcher is not a term matcher.
+        """
+
+        return None
+
+    def term_matchers(self):
+        """Returns an iterator of term matchers in this tree.
+        """
+
+        if self.term() is not None:
+            yield self
+        else:
+            for cm in self.children():
+                for m in cm.term_matchers():
+                    yield m
+
+    def matching_terms(self, id=None):
+        """Returns an iterator of ``("fieldname", "termtext")`` tuples for the
+        **currently matching** term matchers in this tree.
+        """
+
+        if not self.is_active():
+            return
+
+        if id is None:
+            id = self.id()
+        elif id != self.id():
+            return
+
+        t = self.term()
+        if t is None:
+            for c in self.children():
+                for t in c.matching_terms(id):
+                    yield t
+        else:
+            yield t
+
+    def children(self):
+        """Returns an (possibly empty) list of the submatchers of this
+        matcher.
+        """
+
+        return []
+
+    def replace(self, minquality=0):
+        """Returns a possibly-simplified version of this matcher. For example,
+        if one of the children of a UnionMatcher is no longer active, calling
+        this method on the UnionMatcher will return the other child.
+        """
+
+        return self
+
+    @abstractmethod
+    def copy(self):
+        """Returns a copy of this matcher.
+        """
+
+        raise NotImplementedError
+
+    def depth(self):
+        """Returns the depth of the tree under this matcher, or 0 if this
+        matcher does not have any children.
+        """
+
+        return 0
+
+    def supports_block_quality(self):
+        """Returns True if this matcher supports the use of ``quality`` and
+        ``block_quality``.
+        """
+
+        return False
+
+    def block_quality(self):
+        """Returns a quality measurement of the current block of postings,
+        according to the current weighting algorithm. Raises
+        ``NoQualityAvailable`` if the matcher or weighting do not support
+        quality measurements.
+        """
+
+        raise NoQualityAvailable(self.__class__)
+
+    @abstractmethod
+    def id(self):
+        """Returns the ID of the current posting.
+        """
+
+        raise NotImplementedError
+
+    def all_ids(self):
+        """Returns a generator of all IDs in the matcher.
+
+        What this method returns for a matcher that has already read some
+        postings (whether it only yields the remaining postings or all postings
+        from the beginning) is undefined, so it's best to only use this method
+        on fresh matchers.
+        """
+
+        i = 0
+        while self.is_active():
+            yield self.id()
+            self.next()
+            i += 1
+            if i == 10:
+                self = self.replace()
+                i = 0
+
+    def all_items(self):
+        """Returns a generator of all (ID, encoded value) pairs in the matcher.
+
+        What this method returns for a matcher that has already read some
+        postings (whether it only yields the remaining postings or all postings
+        from the beginning) is undefined, so it's best to only use this method
+        on fresh matchers.
+        """
+
+        i = 0
+        while self.is_active():
+            yield (self.id(), self.value())
+            self.next()
+            i += 1
+            if i == 10:
+                self = self.replace()
+                i = 0
+
+    def items_as(self, astype):
+        """Returns a generator of all (ID, decoded value) pairs in the matcher.
+
+        What this method returns for a matcher that has already read some
+        postings (whether it only yields the remaining postings or all postings
+        from the beginning) is undefined, so it's best to only use this method
+        on fresh matchers.
+        """
+
+        while self.is_active():
+            yield (self.id(), self.value_as(astype))
+            self.next()
+
+    @abstractmethod
+    def value(self):
+        """Returns the encoded value of the current posting.
+        """
+
+        raise NotImplementedError
+
+    @abstractmethod
+    def supports(self, astype):
+        """Returns True if the field's format supports the named data type,
+        for example 'frequency' or 'characters'.
+        """
+
+        raise NotImplementedError("supports not implemented in %s"
+                                  % self.__class__)
+
+    @abstractmethod
+    def value_as(self, astype):
+        """Returns the value(s) of the current posting as the given type.
+        """
+
+        raise NotImplementedError("value_as not implemented in %s"
+                                  % self.__class__)
+
+    def spans(self):
+        """Returns a list of :class:`whoosh.spans.Span` objects for the matches
+        in this document. Raises an exception if the field being searched does
+        not store positions.
+        """
+
+        from whoosh.spans import Span
+        if self.supports("characters"):
+            return [Span(pos, startchar=startchar, endchar=endchar)
+                    for pos, startchar, endchar in self.value_as("characters")]
+        elif self.supports("positions"):
+            return [Span(pos) for pos in self.value_as("positions")]
+        else:
+            raise Exception("Field does not support spans")
+
+    def skip_to(self, id):
+        """Moves this matcher to the first posting with an ID equal to or
+        greater than the given ID.
+        """
+
+        while self.is_active() and self.id() < id:
+            self.next()
+
+    def skip_to_quality(self, minquality):
+        """Moves this matcher to the next block with greater than the given
+        minimum quality value.
+        """
+
+        raise NotImplementedError(self.__class__.__name__)
+
+    @abstractmethod
+    def next(self):
+        """Moves this matcher to the next posting.
+        """
+
+        raise NotImplementedError(self.__class__.__name__)
+
+    def weight(self):
+        """Returns the weight of the current posting.
+        """
+
+        return self.value_as("weight")
+
+    @abstractmethod
+    def score(self):
+        """Returns the score of the current posting.
+        """
+
+        raise NotImplementedError(self.__class__.__name__)
+
+    def __eq__(self, other):
+        return self.__class__ is type(other)
+
+    def __lt__(self, other):
+        return type(other) is self.__class__
+
+    def __ne__(self, other):
+        return not self.__eq__(other)
+
+    def __gt__(self, other):
+        return not (self.__lt__(other) or self.__eq__(other))
+
+    def __le__(self, other):
+        return self.__eq__(other) or self.__lt__(other)
+
+    def __ge__(self, other):
+        return self.__eq__(other) or self.__gt__(other)
+
+
+class NullMatcherClass(Matcher):
+    """Matcher with no postings which is never active.
+    """
+
+    def __call__(self):
+        return self
+
+    def supports_block_quality(self):
+        return True
+
+    def block_quality(self):
+        return 0
+
+    def skip_to_quality(self, minquality):
+        return 0
+
+    def is_active(self):
+        return False
+
+    def reset(self):
+        pass
+
+    def all_ids(self):
+        return []
+
+    def copy(self):
+        return self
+
+    def max_quality(self):
+        return 0
+
+
+# Singleton instance
+NullMatcher = NullMatcherClass()
+
+
+class ListMatcher(Matcher):
+    """Synthetic matcher backed by a list of IDs.
+    """
+
+    def __init__(self, ids, weights=None, values=None, format=None,
+                 scorer=None, position=0, all_weights=None, term=None,
+                 terminfo=None):
+        """
+        :param ids: a list of doc IDs.
+        :param weights: a list of weights corresponding to the list of IDs.
+            If this argument is not supplied, a list of 1.0 values is used.
+        :param values: a list of encoded values corresponding to the list of
+            IDs.
+        :param format: a :class:`whoosh.formats.Format` object representing the
+            format of the field.
+        :param scorer: a :class:`whoosh.scoring.BaseScorer` object for scoring
+            the postings.
+        :param term: a ``("fieldname", "text")`` tuple, or None if this is not
+            a term matcher.
+        """
+
+        self._ids = ids
+        self._weights = weights
+        self._all_weights = all_weights
+        self._values = values
+        self._i = position
+        self._format = format
+        self._scorer = scorer
+        self._term = term
+        self._terminfo = terminfo
+
+    def __repr__(self):
+        return "<%s>" % self.__class__.__name__
+
+    def is_active(self):
+        return self._i < len(self._ids)
+
+    def reset(self):
+        self._i = 0
+
+    def term(self):
+        return self._term
+
+    def copy(self):
+        return self.__class__(self._ids, self._weights, self._values,
+                              self._format, self._scorer, self._i,
+                              self._all_weights)
+
+    def replace(self, minquality=0):
+        if not self.is_active() or (minquality
+                                    and self.max_quality() < minquality):
+            return NullMatcher()
+        else:
+            return self
+
+    def max_quality(self):
+        return self.block_max_weight()
+
+    def supports_block_quality(self):
+        return (self._scorer is not None
+                and self._scorer.supports_block_quality())
+
+    def block_quality(self):
+        return self._scorer.block_quality(self)
+
+    def skip_to_quality(self, minquality):
+        self._i += 1
+        while self._i < len(self._ids) and self.quality() <= minquality:
+            self._i += 1
+        return 0
+
+    def id(self):
+        return self._ids[self._i]
+
+    def all_ids(self):
+        return iter(self._ids)
+
+    def all_items(self):
+        values = self._values
+        if values is None:
+            values = repeat('')
+
+        return izip(self._ids, values)
+
+    def value(self):
+        if self._values:
+            v = self._values[self._i]
+
+            if isinstance(v, list):
+                # This object supports "values" that are actually lists of
+                # value strings. This is to support combining the results of
+                # several different matchers into a single ListMatcher (see the
+                # TOO_MANY_CLAUSES functionality of MultiTerm). We combine the
+                # values here instead of combining them first and then making
+                # the ListMatcher to avoid wasting time combining values if the
+                # consumer never asks for them.
+                assert len(v) > 0
+                if len(v) == 1:
+                    v = v[0]
+                else:
+                    v = self._format.combine(v)
+                # Replace the list with the computed value string
+                self._values[self._i] = v
+
+            return v
+        else:
+            return ''
+
+    def value_as(self, astype):
+        decoder = self._format.decoder(astype)
+        return decoder(self.value())
+
+    def supports(self, astype):
+        return self._format.supports(astype)
+
+    def next(self):
+        self._i += 1
+
+    def weight(self):
+        if self._all_weights:
+            return self._all_weights
+        elif self._weights:
+            return self._weights[self._i]
+        else:
+            return 1.0
+
+    def block_min_length(self):
+        return self._terminfo.min_length()
+
+    def block_max_length(self):
+        return self._terminfo.max_length()
+
+    def block_max_weight(self):
+        if self._all_weights:
+            return self._all_weights
+        elif self._weights:
+            return max(self._weights)
+        elif self._terminfo is not None:
+            return self._terminfo.max_weight()
+        else:
+            return 1.0
+
+    def score(self):
+        if self._scorer:
+            return self._scorer.score(self)
+        else:
+            return self.weight()
+
+
+
+

src/whoosh/matching/wrappers.py

 
 import sys
 from whoosh.compat import xrange
-from whoosh.matching import core
+from whoosh.matching import mcore
 
 
-class WrappingMatcher(core.Matcher):
+class WrappingMatcher(mcore.Matcher):
     """Base class for matchers that wrap sub-matchers.
     """
 
         r = self.child.replace(minquality)
         if not r.is_active():
             # If the replaced child is inactive, return an inactive matcher
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif r is not self.child:
             # If the child changed, return a new wrapper on the new child
             return self._replacement(r)
         return self.child.score() * self.boost
 
 
-class MultiMatcher(core.Matcher):
+class MultiMatcher(mcore.Matcher):
     """Serializes the results of a list of sub-matchers.
     """
 
                 m._next_matcher()
 
         if not m.is_active():
-            return core.NullMatcher()
+            return mcore.NullMatcher()
 
         # TODO: Possible optimization: if the last matcher is current, replace
         # this with the last matcher, but wrap it with a matcher that adds the
 
     def next(self):
         if not self.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
 
         self.matchers[self.current].next()
         if not self.matchers[self.current].is_active():
 
     def skip_to(self, id):
         if not self.is_active():
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         if id <= self.id():
             return
 
 
     def next(self):
         if self._id >= self.limit:
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         self._id += 1
         self._find_next()
 
     def skip_to(self, id):
         if self._id >= self.limit:
-            raise core.ReadTooFar
+            raise mcore.ReadTooFar
         if id < self._id:
             return
         self._id = id
     def replace(self, minquality=0):
         if not self.child.is_active():
             # If one of the sub-matchers is inactive, go inactive
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif minquality and self.a.max_quality() < minquality:
             # If the required matcher doesn't have a high enough max quality
             # to possibly contribute, return an inactive matcher
-            return core.NullMatcher()
+            return mcore.NullMatcher()
 
         new_a = self.a.replace(minquality)
         new_b = self.b.replace()
         if not new_a.is_active():
-            return core.NullMatcher()
+            return mcore.NullMatcher()
         elif new_a is not self.a or new_b is not self.b:
             # If one of the sub-matchers changed, return a new Require
             return self.__class__(new_a, self.b)

src/whoosh/query/__init__.py

 # those of the authors and should not be interpreted as representing official
 # policies, either expressed or implied, of Matt Chaput.
 
-from whoosh.query.core import *
+from whoosh.query.qcore import *
 from whoosh.query.terms import *
 from whoosh.query.nary import *
 from whoosh.query.positional import *
 from whoosh.query.ranges import *
 from whoosh.query.wrappers import *
+from whoosh.query.nested import *
 

src/whoosh/query/core.py

-# Copyright 2007 Matt Chaput. All rights reserved.
-#
-# Redistribution and use in source and binary forms, with or without
-# modification, are permitted provided that the following conditions are met:
-#
-#    1. Redistributions of source code must retain the above copyright notice,
-#       this list of conditions and the following disclaimer.
-#
-#    2. Redistributions in binary form must reproduce the above copyright
-#       notice, this list of conditions and the following disclaimer in the
-#       documentation and/or other materials provided with the distribution.
-#
-# THIS SOFTWARE IS PROVIDED BY MATT CHAPUT ``AS IS'' AND ANY EXPRESS OR
-# IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
-# MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
-# EVENT SHALL MATT CHAPUT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
-# INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
-# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA,
-# OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
-# LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
-# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
-# EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
-#
-# The views and conclusions contained in the software and documentation are
-# those of the authors and should not be interpreted as representing official
-# policies, either expressed or implied, of Matt Chaput.
-
-from __future__ import division
-import copy
-from array import array
-
-from whoosh import matching
-from whoosh.compat import u
-from whoosh.reading import TermNotFound
-from whoosh.util import methodcaller
-
-
-# Exceptions
-
-class QueryError(Exception):
-    """Error encountered while running a query.
-    """
-    pass
-
-
-# Functions
-
-def error_query(msg, q=None):
-    """Returns the query in the second argument (or a :class:`NullQuery` if the
-    second argument is not given) with its ``error`` attribute set to
-    ``msg``.
-    """
-
-    if q is None:
-        q = _NullQuery()
-    q.error = msg
-    return q
-
-
-def token_lists(q, phrases=True):
-    """Returns the terms in the query tree, with the query hierarchy
-    represented as nested lists.
-    """
-
-    if q.is_leaf():
-        from whoosh.query import Phrase
-        if phrases or not isinstance(q, Phrase):
-            return list(q.tokens())
-    else:
-        ls = []
-        for qq in q.children():
-            t = token_lists(qq, phrases=phrases)
-            if len(t) == 1:
-                t = t[0]
-            if t:
-                ls.append(t)
-        return ls
-
-
-# Utility classes
-
-class Lowest(object):
-    """A value that is always compares lower than any other object except
-    itself.
-    """
-
-    def __cmp__(self, other):
-        if other.__class__ is Lowest:
-            return 0
-        return -1
-
-    def __eq__(self, other):
-        return self.__class__ is type(other)
-
-    def __lt__(self, other):
-        return type(other) is not self.__class__
-
-    def __ne__(self, other):
-        return not self.__eq__(other)
-
-    def __gt__(self, other):
-        return not (self.__lt__(other) or self.__eq__(other))
-
-    def __le__(self, other):
-        return self.__eq__(other) or self.__lt__(other)
-
-    def __ge__(self, other):
-        return self.__eq__(other) or self.__gt__(other)
-
-
-class Highest(object):
-    """A value that is always compares higher than any other object except
-    itself.
-    """
-
-    def __cmp__(self, other):
-        if other.__class__ is Highest:
-            return 0
-        return 1
-
-    def __eq__(self, other):
-        return self.__class__ is type(other)
-
-    def __lt__(self, other):
-        return type(other) is self.__class__
-
-    def __ne__(self, other):
-        return not self.__eq__(other)
-
-    def __gt__(self, other):
-        return not (self.__lt__(other) or self.__eq__(other))
-
-    def __le__(self, other):
-        return self.__eq__(other) or self.__lt__(other)
-
-    def __ge__(self, other):
-        return self.__eq__(other) or self.__gt__(other)
-
-
-Lowest = Lowest()
-Highest = Highest()
-
-
-# Base classes
-
-class Query(object):
-    """Abstract base class for all queries.
-
-    Note that this base class implements __or__, __and__, and __sub__ to allow
-    slightly more convenient composition of query objects::
-
-        >>> Term("content", u"a") | Term("content", u"b")
-        Or([Term("content", u"a"), Term("content", u"b")])
-
-        >>> Term("content", u"a") & Term("content", u"b")
-        And([Term("content", u"a"), Term("content", u"b")])
-
-        >>> Term("content", u"a") - Term("content", u"b")
-        And([Term("content", u"a"), Not(Term("content", u"b"))])
-    """
-
-    # For queries produced by the query parser, record where in the user
-    # query this object originated
-    startchar = endchar = None
-    # For queries produced by the query parser, records an error that resulted
-    # in this query
-    error = None
-
-    def __or__(self, query):
-        """Allows you to use | between query objects to wrap them in an Or
-        query.
-        """
-
-        from whoosh.query import Or
-        return Or([self, query]).normalize()
-
-    def __and__(self, query):
-        """Allows you to use & between query objects to wrap them in an And
-        query.
-        """
-
-        from whoosh.query import And
-        return And([self, query]).normalize()
-
-    def __sub__(self, query):
-        """Allows you to use - between query objects to add the right-hand
-        query as a "NOT" query.
-        """
-
-        from whoosh.query import And, Not
-        return And([self, Not(query)]).normalize()
-
-    def __hash__(self):
-        raise NotImplementedError
-
-    def __ne__(self, other):
-        return not self.__eq__(other)
-
-    def is_leaf(self):
-        """Returns True if this is a leaf node in the query tree, or False if
-        this query has sub-queries.
-        """
-
-        return True
-
-    def children(self):
-        """Returns an iterator of the subqueries of this object.
-        """
-
-        return iter([])
-
-    def is_range(self):
-        """Returns True if this object searches for values within a range.
-        """
-
-        return False
-
-    def has_terms(self):
-        """Returns True if this specific object represents a search for a
-        specific term (as opposed to a pattern, as in Wildcard and Prefix) or
-        terms (i.e., whether the ``replace()`` method does something
-        meaningful on this instance).
-        """
-
-        return False
-
-    def apply(self, fn):
-        """If this query has children, calls the given function on each child
-        and returns a new copy of this node with the new children returned by
-        the function. If this is a leaf node, simply returns this object.
-
-        This is useful for writing functions that transform a query tree. For
-        example, this function changes all Term objects in a query tree into
-        Variations objects::
-
-            def term2var(q):
-                if isinstance(q, Term):
-                    return Variations(q.fieldname, q.text)
-                else:
-                    return q.apply(term2var)
-
-            q = And([Term("f", "alfa"),
-                     Or([Term("f", "bravo"),
-                         Not(Term("f", "charlie"))])])
-            q = term2var(q)
-
-        Note that this method does not automatically create copies of nodes.
-        To avoid modifying the original tree, your function should call the
-        :meth:`Query.copy` method on nodes before changing their attributes.
-        """
-
-        return self
-
-    def accept(self, fn):
-        """Applies the given function to this query's subqueries (if any) and
-        then to this query itself::
-
-            def boost_phrases(q):
-                if isintance(q, Phrase):
-                    q.boost *= 2.0
-                return q
-
-            myquery = myquery.accept(boost_phrases)
-
-        This method automatically creates copies of the nodes in the original
-        tree before passing them to your function, so your function can change
-        attributes on nodes without altering the original tree.
-
-        This method is less flexible than using :meth:`Query.apply` (in fact
-        it's implemented using that method) but is often more straightforward.
-        """
-
-        def fn_wrapper(q):
-            q = q.apply(fn_wrapper)
-            return fn(q)
-
-        return fn_wrapper(self)
-
-    def replace(self, fieldname, oldtext, newtext):
-        """Returns a copy of this query with oldtext replaced by newtext (if
-        oldtext was anywhere in this query).
-
-        Note that this returns a *new* query with the given text replaced. It
-        *does not* modify the original query "in place".
-        """
-
-        # The default implementation uses the apply method to "pass down" the
-        # replace() method call
-        if self.is_leaf():
-            return copy.copy(self)
-        else:
-            return self.apply(methodcaller("replace", fieldname, oldtext,
-                                           newtext))
-
-    def copy(self):
-        """Deprecated, just use ``copy.deepcopy``.
-        """
-
-        return copy.deepcopy(self)
-
-    def all_terms(self, termset=None, phrases=True):
-        """Returns a set of all terms in this query tree.
-
-        This method exists for backwards compatibility. For more flexibility
-        use the :meth:`Query.iter_all_terms` method instead, which simply
-        yields the terms in the query.
-
-        :param phrases: Whether to add words found in Phrase queries.
-        :rtype: set
-        """
-
-        from whoosh.query import Phrase
-
-        if not termset:
-            termset = set()
-        for q in self.leaves():
-            if q.has_terms():
-                if phrases or not isinstance(q, Phrase):
-                    termset.update(q.terms())
-        return termset
-
-    def _existing_terms_helper(self, ixreader, termset, reverse):
-        if termset is None:
-            termset = set()
-        if reverse:
-            test = lambda t: t not in ixreader
-        else:
-            test = lambda t: t in ixreader
-
-        return termset, test
-
-    def existing_terms(self, ixreader, termset=None, reverse=False,
-                       phrases=True, expand=False):
-        """Returns a set of all terms in this query tree that exist in the
-        given ixreaderder.
-
-        This method exists for backwards compatibility. For more flexibility
-        use the :meth:`Query.iter_all_terms` method instead, which simply
-        yields the terms in the query.
-
-        :param ixreader: A :class:`whoosh.reading.IndexReader` object.
-        :param reverse: If True, this method adds *missing* terms rather than
-            *existing* terms to the set.
-        :param phrases: Whether to add words found in Phrase queries.
-        :param expand: If True, queries that match multiple terms
-            (such as :class:`Wildcard` and :class:`Prefix`) will return all
-            matching expansions.
-        :rtype: set
-        """
-
-        # By default, this method calls all_terms() and then filters based on
-        # the contents of the reader. Subclasses that need to use the reader to
-        # generate the terms (i.e. MultiTerm) need to override this
-        # implementation
-
-        termset, test = self._existing_terms_helper(ixreader, termset, reverse)
-        if self.is_leaf():
-            gen = self.all_terms(phrases=phrases)
-            termset.update(t for t in gen if test(t))
-        else:
-            for q in self.children():
-                q.existing_terms(ixreader, termset, reverse, phrases, expand)
-        return termset
-
-    def leaves(self):
-        """Returns an iterator of all the leaf queries in this query tree as a
-        flat series.
-        """
-
-        if self.is_leaf():
-            yield self
-        else:
-            for q in self.children():
-                for qq in q.leaves():
-                    yield qq
-
-    def iter_all_terms(self):
-        """Returns an iterator of ("fieldname", "text") pairs for all terms in
-        this query tree.
-
-        >>> qp = qparser.QueryParser("text", myindex.schema)
-        >>> q = myparser.parse("alfa bravo title:charlie")
-        >>> # List the terms in a query
-        >>> list(q.iter_all_terms())
-        [("text", "alfa"), ("text", "bravo"), ("title", "charlie")]
-        >>> # Get a set of all terms in the query that don't exist in the index
-        >>> r = myindex.reader()
-        >>> missing = set(t for t in q.iter_all_terms() if t not in r)
-        set([("text", "alfa"), ("title", "charlie")])
-        >>> # All terms in the query that occur in fewer than 5 documents in
-        >>> # the index
-        >>> [t for t in q.iter_all_terms() if r.doc_frequency(t[0], t[1]) < 5]
-        [("title", "charlie")]
-        """
-
-        for q in self.leaves():
-            if q.has_terms():
-                for t in q.terms():
-                    yield t
-
-    def all_tokens(self, boost=1.0):
-        """Returns an iterator of :class:`analysis.Token` objects corresponding
-        to all terms in this query tree. The Token objects will have the
-        ``fieldname``, ``text``, and ``boost`` attributes set. If the query
-        was built by the query parser, they Token objects will also have