Matt Chaput  committed 96433ea

Added documentation and updated release notes for Whoosh 2.5 release.

  • Parent commits f3feb2a
  • Branches default

Files changed (5)

File docs/source/api/analysis.rst

 .. autoclass:: FancyAnalyzer
 .. autoclass:: NgramAnalyzer
 .. autoclass:: NgramWordAnalyzer
+.. autoclass:: LanguageAnalyzer

File docs/source/releases/2_0.rst

 Whoosh 2.x release notes
+Whoosh 2.5
+* Whoosh 2.5 will read existing indexes, but segments created by 2.5 will not
+  be readable by older versions of Whoosh.
+* You should now specify ``sortable=True`` on fields that you plan on using to
+  sort search results.
+  Note that you can still sort on fields without specifying ``sortable=True``,
+  but the first sort will be slow because Whoosh must cache a column in memory.
+  Instead of using field caches to speed up sorting, Whoosh now supports adding
+  a ``sortable=True`` keyword argument to fields. This makes Whoosh store a
+  sortable representation of the field's values in a "column" format
+  (associating a "key" value with each document). This is more robust,
+  efficient, and customizable than the old behavior.
+  Fields that use ``sortable=True`` can avoid specifying ``stored=True`` and the
+  field's value will still be available on ``Hit`` objects (the value will be
+  retrieved from the column instead of from the stored fields). This may
+  actually be faster for certain types of values.
+* Whoosh will now detect common types of OR queries and use optimized read-ahead
+  matchers to make them several times faster.
+* Whoosh now includes pure-Python implementations of the Snowball stemmers and
+  stop word lists for various languages adapted from NLTK. These are available
+  through the :class:`whoosh.analysis.LanguageAnalyzer` analyzer or through the
+  ``lang=`` keyword argument to the
+  :class:`~whoosh.fields.TEXT` field.
+* You can now use the
+  :meth:`whoosh.filedb.filestore.Storage.create()` and
+  :meth:`whoosh.filedb.filestore.Storage.destroy()`
+  methods as a consistent API to set up and tear down different types of
+  storage.
+* Many bug fixes and speed improvements.
+* Switched unit tests to use ``py.test`` instead of ``nose``.
+* Removed obsolete ``SpellChecker`` class.
 Whoosh 2.4
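
The ``sortable=True`` note above describes storing one sort key per document in a "column". The following is a toy sketch of that idea in plain Python, not Whoosh's implementation: the names ``build_column`` and ``sort_docnums`` are hypothetical, and the "column" is just an in-memory list indexed by document number.

```python
# Toy model only: a "column" here is a plain list holding one sort key per
# document number. This is NOT how Whoosh stores columns on disk.

documents = [
    {"title": "gamma", "body": "third"},
    {"title": "alpha", "body": "first"},
    {"title": "beta", "body": "second"},
]


def build_column(docs, fieldname):
    """Collect one key per document, indexed by document number."""
    return [doc[fieldname] for doc in docs]


def sort_docnums(docnums, column):
    """Order matching document numbers by their key in the column."""
    return sorted(docnums, key=lambda docnum: column[docnum])


# Built once at indexing time, so the first sorted search has no cache to warm.
title_column = build_column(documents, "title")

matching = [0, 1, 2]  # pretend every document matched the query
ordered = sort_docnums(matching, title_column)
print(ordered)  # [1, 2, 0]

# The key can also be read back from the column without a stored field.
print([title_column[n] for n in ordered])  # ['alpha', 'beta', 'gamma']
```

Because the key lives in the column, the sketch can return a document's value without consulting stored fields, which mirrors the note about ``Hit`` objects above.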

File src/whoosh/columns.py

 # those of the authors and should not be interpreted as representing official
 # policies, either expressed or implied, of Matt Chaput.
+The API and implementation of columns may change in the next version of Whoosh!
+This module contains "Column" objects which you can use as the argument to a
+Field object's ``sortable=`` keyword argument. Each field defines a default
+column type for when the user specifies ``sortable=True`` (the object returned
+by the field's ``default_column()`` method).
+The default column type for most fields is ``VarBytesColumn``,
+although numeric and date fields use ``NumericColumn``. Expert users may use
+other column types that may be faster or more storage efficient based on the
+field contents. For example, if a field always contains one of a limited number
+of possible values, a ``RefBytesColumn`` will save space by only storing the
+values once. If a field's values are always a fixed length, the
+``FixedBytesColumn`` saves space by not storing the length of each value.
+A ``Column`` object mainly stores configuration information and
+provides two important methods: ``writer()`` to return a ``ColumnWriter`` object
+and ``reader()`` to return a ``ColumnReader`` object.
 from __future__ import division, with_statement
 import struct, warnings
 from array import array
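
The docstring above contrasts three storage strategies: length-prefixed values, fixed-length values, and values stored once and referenced. The sketch below compares the byte sizes of simplified versions of those layouts. It is an illustration under stated assumptions, not Whoosh's on-disk format: the ``encode_*`` functions are hypothetical, and the 2-byte length prefix and 1-byte reference are arbitrary choices.

```python
import struct

# Simplified layouts for size comparison only. These are hypothetical
# encodings, not the formats used by Whoosh's column classes.


def encode_varbytes(values):
    """Length-prefixed layout: a 2-byte length before every value."""
    return b"".join(struct.pack("<H", len(v)) + v for v in values)


def encode_fixedbytes(values, length):
    """Fixed-width layout: no per-value length is stored."""
    assert all(len(v) == length for v in values)
    return b"".join(values)


def encode_refbytes(values):
    """Reference layout: each distinct value once, then a 1-byte index per document."""
    uniques = sorted(set(values))
    assert len(uniques) <= 256  # a 1-byte reference can address 256 values
    position = {v: i for i, v in enumerate(uniques)}
    table = encode_varbytes(uniques)
    refs = bytes(position[v] for v in values)
    return table + refs


values = [b"red", b"red", b"tan", b"red", b"tan", b"red"]
var_size = len(encode_varbytes(values))
fixed_size = len(encode_fixedbytes(values, 3))
ref_size = len(encode_refbytes(values))
print(var_size, fixed_size, ref_size)  # 30 18 16
```

With only two distinct values, the reference layout is smallest here. The gap grows with longer values and more documents, which is the trade-off the docstring describes.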

File tests/test_highlighting.py

         r.formatter = highlight.UppercaseFormatter()
         snippet = r[0].highlights("text")
         assert snippet == "MULTIPLICATIon and subtracTION are good"
+def test_issue324():
+    sa = analysis.StemmingAnalyzer()
+    result = highlight.highlight(u("Indexed!\n1"), [u("index")], sa,
+                                 fragmenter=highlight.ContextFragmenter(),
+                                 formatter=highlight.UppercaseFormatter())
+    assert result == "INDEXED!"

File tests/test_queries.py

             r = " ".join([hit["title"] for hit in s.search(q)])
             assert r == "a1 a2 b1"
+def test_ornot_andnot():
+    schema = fields.Schema(id=fields.NUMERIC(stored=True), a=fields.KEYWORD())
+    st = RamStorage()
+    ix = st.create_index(schema)
+    with ix.writer() as w:
+        w.add_document(id=0, a=u("word1 word1"))
+        w.add_document(id=1, a=u("word1 word2"))
+        w.add_document(id=2, a=u("word1 foo"))
+        w.add_document(id=3, a=u("foo word2"))
+        w.add_document(id=4, a=u("foo bar"))
+    with ix.searcher() as s:
+        qp = qparser.QueryParser("a", ix.schema)
+        q1 = qp.parse(u("NOT word1 NOT word2"))
+        q2 = qp.parse(u("NOT (word1 OR word2)"))
+        r1 = [hit["id"] for hit in s.search(q1, sortedby="id")]
+        r2 = [hit["id"] for hit in s.search(q2, sortedby="id")]
+        assert r1 == r2 == [4]
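
The equivalence that ``test_ornot_andnot`` asserts can be checked independently with plain sets. This sketch assumes that adjacent ``NOT`` clauses combine as a conjunction, which is consistent with the expected result ``[4]`` in the test. The ``matching`` helper is hypothetical and splits on whitespace instead of using a Whoosh analyzer.

```python
# Same five documents as the test, keyed by id.
docs = {
    0: "word1 word1",
    1: "word1 word2",
    2: "word1 foo",
    3: "foo word2",
    4: "foo bar",
}

all_ids = set(docs)


def matching(term):
    """Return the ids of documents containing the term as a whole word."""
    return {docid for docid, text in docs.items() if term in text.split()}


# "NOT word1 NOT word2": documents excluded from both term sets.
not_each = (all_ids - matching("word1")) & (all_ids - matching("word2"))

# "NOT (word1 OR word2)": complement of the union.
not_either = all_ids - (matching("word1") | matching("word2"))

print(sorted(not_each), sorted(not_either))  # [4] [4]
```

Both expressions reduce to the same set by De Morgan's law, so only document 4 remains.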