Commits

Matt Chaput  committed 8d797df

Documented new datetime support. Other minor docs improvements. Bumped version 1.3.

  • Parent commits 6ce9c17

Files changed (6)

File docs/source/dates.rst

+================================
+Indexing and parsing dates/times
+================================
+
+Indexing dates
+==============
+
+Whoosh lets you index and search dates/times using the
+:class:`whoosh.fields.DATETIME` field type. Instead of passing text for the
+field in ``add_document()``, you use a Python ``datetime.datetime`` object::
+
+    from datetime import datetime, timedelta
+    from whoosh import fields, index
+    
+    schema = fields.Schema(title=fields.TEXT, content=fields.TEXT,
+                           date=fields.DATETIME)
+    ix = index.create_in("indexdir", schema)
+    
+    w = ix.writer()
+    w.add_document(title="Document 1", content="Rendering images from the command line",
+                   date=datetime.utcnow())
+    w.add_document(title="Document 2", content="Creating shaders using a node network",
+                   date=datetime.utcnow() + timedelta(days=1))
+    w.commit()
+
+
+Parsing date queries
+====================
+
+Once you have an indexed ``DATETIME`` field, you can search it using a rich
+date parser contained in the :class:`whoosh.qparser.dateparse.DateParserPlugin`::
+
+    from whoosh import index
+    from whoosh.qparser import QueryParser
+    from whoosh.qparser.dateparse import DateParserPlugin
+    
+    ix = index.open_dir("indexdir")
+    
+    # Instantiate a query parser
+    qp = QueryParser("content", ix.schema)
+    
+    # Add the DateParserPlugin to the parser
+    qp.add_plugin(DateParserPlugin())
+    
+With the ``DateParserPlugin``, users can use date queries such as::
+
+    20050912
+    2005 sept 12th
+    june 23 1978
+    23 mar 2005
+    july 1985
+    sep 12
+    today
+    yesterday
+    tomorrow
+    now
+    next friday
+    last tuesday
+    5am
+    10:25:54
+    23:12
+    8 PM
+    4:46 am oct 31 2010
+    last tuesday to today
+    today to next friday
+    jan 2005 to feb 2008
+    -1 week to now
+    now to +2h
+    -1y6mo to +2 yrs 23d
+
+Normally, as with other types of queries containing spaces, users need
+to quote date queries containing spaces using single quotes::
+
+    render date:'last tuesday' command
+
+If you use the ``free`` argument to the ``DateParserPlugin``, the plugin will
+try to parse dates from unquoted text following a date field prefix::
+
+    qp.add_plugin(DateParserPlugin(free=True))
+
+This allows the user to type a date query with spaces and special characters
+following the name of the date field and a colon. The date query can be mixed
+with other types of queries without quotes::
+
+    date:last tuesday
+    render date:oct 15th 2001 5:20am command
+
+If you don't use the ``DateParserPlugin``, users can still search ``DATETIME``
+fields using a simple numeric form ``YYYY[MM[DD[hh[mm[ss]]]]]`` that is built
+into the ``DATETIME`` field::
+
+    from whoosh import index
+    from whoosh.qparser import QueryParser
+    
+    ix = index.open_dir("indexdir")
+    qp = QueryParser("content", schema=ix.schema)
+    
+    # Find all datetimes in 2005
+    q = qp.parse(u"date:2005")
+    
+    # Find all datetimes on June 24, 2005
+    q = qp.parse(u"date:20050624")
+    
+    # Find all datetimes from 1am-2am on June 24, 2005
+    q = qp.parse(u"date:2005062401")
+    
+    # Find all datetimes from Jan 1, 2005 to June 2, 2010
+    q = qp.parse(u"date:[20050101 to 20100602]")
+
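As a rough illustration of what the numeric form ``YYYY[MM[DD[hh[mm[ss]]]]]`` expresses, the sketch below maps such a string to the span of time it covers. The function ``numeric_date_range`` is hypothetical and is not part of Whoosh; it uses only the standard library and returns a half-open ``(start, end)`` pair, which is a choice made here for simplicity and may not match how Whoosh treats range endpoints internally.

```python
from datetime import datetime, timedelta


def numeric_date_range(text):
    """Return the (start, end) span covered by a YYYY[MM[DD[hh[mm[ss]]]]] string.

    Hypothetical helper for illustration only; not Whoosh's implementation.
    The end value is exclusive.
    """
    if not text.isdigit() or len(text) not in (4, 6, 8, 10, 12, 14):
        raise ValueError("expected YYYY[MM[DD[hh[mm[ss]]]]], got %r" % text)

    year = int(text[0:4])
    # Every field after the year is two digits wide
    parts = [int(text[i:i + 2]) for i in range(4, len(text), 2)]
    # Fill unspecified fields with their smallest value:
    # month=1, day=1, hour=0, minute=0, second=0
    defaults = [1, 1, 0, 0, 0]
    month, day, hour, minute, second = parts + defaults[len(parts):]
    start = datetime(year, month, day, hour, minute, second)

    # The span ends one unit later at the finest precision that was given
    if len(parts) == 0:
        end = datetime(year + 1, 1, 1)
    elif len(parts) == 1:
        end = datetime(year + (month == 12), month % 12 + 1, 1)
    else:
        steps = [timedelta(days=1), timedelta(hours=1),
                 timedelta(minutes=1), timedelta(seconds=1)]
        end = start + steps[len(parts) - 2]
    return start, end


start, end = numeric_date_range("2005062401")
print(start, "to", end)
```

Under this reading, ``2005`` covers the whole year and ``2005062401`` covers the hour starting at 1am on June 24, 2005, matching the comments in the diff above.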
+
+About time zones and basetime
+=============================
+
+The best way to deal with time zones is to always index ``datetime`` objects in
+naive UTC form. Any ``tzinfo`` attribute on the ``datetime`` object is *ignored*
+by the indexer. If you are working with local datetimes, you should convert them
+to naive UTC datetimes before indexing.
+
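One possible way to do the conversion described above, sketched with the standard library only. It assumes Python 3 (for ``datetime.timezone``) and a timezone-aware input; ``to_naive_utc`` is a hypothetical helper name, not a Whoosh function.

```python
from datetime import datetime, timedelta, timezone


def to_naive_utc(dt):
    """Convert a timezone-aware datetime to a naive datetime in UTC.

    Hypothetical helper for illustration; not part of Whoosh.
    """
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Shift to UTC first, then drop the tzinfo so the value is naive
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


# A local time at a fixed UTC-5 offset (the offset is an arbitrary example)
utc_minus_5 = timezone(timedelta(hours=-5))
local = datetime(2010, 10, 31, 9, 30, tzinfo=utc_minus_5)
naive = to_naive_utc(local)
print(naive)
```

The resulting naive value is what would be passed as the ``date`` field in ``add_document()``. Simply calling ``replace(tzinfo=None)`` without the ``astimezone`` step would keep the local wall-clock time rather than converting it.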
+
+Date parser notes
+=================
+
+Please note that the date parser is still somewhat experimental.
+
+
+Setting the base datetime
+-------------------------
+
+When you create the ``DateParserPlugin`` you can pass a ``datetime`` object to
+the ``basedate`` argument to set the datetime against which relative queries
+(such as ``last tuesday`` and ``-2 hours``) are measured. By default, the
+basedate is ``datetime.utcnow()`` at the moment the plugin is instantiated::
+
+    qp.add_plugin(DateParserPlugin(basedate=my_datetime))
+
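To make the role of the base datetime concrete, here is a standard-library sketch of how relative expressions could be resolved against a fixed base. The helper ``last_weekday`` and its "strictly before the base date" rule are assumptions made for illustration; the diff does not show how Whoosh itself resolves ``last tuesday``, so the plugin's actual results may differ.

```python
from datetime import datetime, timedelta


def last_weekday(basedate, weekday):
    """Return midnight on the most recent `weekday` strictly before basedate.

    `weekday` follows datetime.weekday(): Monday is 0, Sunday is 6.
    Hypothetical helper; one plausible reading of "last <weekday>".
    """
    # A result of 0 means basedate falls on that weekday, so go back a full week
    days_back = (basedate.weekday() - weekday) % 7 or 7
    midnight = datetime(basedate.year, basedate.month, basedate.day)
    return midnight - timedelta(days=days_back)


# A fixed base makes relative queries reproducible, e.g. in tests
base = datetime(2010, 10, 31, 4, 46)

last_tuesday = last_weekday(base, 1)          # "last tuesday" relative to base
two_hours_ago = base - timedelta(hours=2)     # "-2 hours" relative to base
print(last_tuesday, two_hours_ago)
```

Passing the same fixed value as ``basedate`` to the plugin should likewise make relative queries independent of when the plugin happens to be instantiated.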
+
+Registering an error callback
+-----------------------------
+
+To avoid user queries causing exceptions in your application, the date parser
+attempts to fail silently when it can't parse a date query. However, you can
+register a callback function to be notified of parsing failures so you can
+display feedback to the user. The argument to the callback function is the
+date text that could not be parsed (this is an experimental feature and may
+change in future versions)::
+
+    errors = []
+    def add_error(msg):
+        errors.append(msg)
+    qp.add_plugin(DateParserPlugin(callback=add_error))
+    
+    q = qp.parse(u"date:blarg")
+    # errors == [u"blarg"]
+
+
+Using free parsing
+------------------
+
+While the ``free`` option is easier for users, it may result in ambiguities.
+As one example, if you want to find documents containing a reference to a march
+and the number 2 in documents from the year 2005, you might type::
+
+    date:2005 march 2
+
+This query would be interpreted correctly as a date query and two term queries
+when ``free=False``, but as a single date query when ``free=True``. In this
+case the user could limit the scope of the date parser with single quotes::
+
+    date:'2005' march 2
+
+
+Parsable formats
+----------------
+
+The date parser supports a wide array of date and time formats; however, it is
+not my intention to try to support *all* types of human-readable dates (for
+example ``ten to five the friday after next``). The best idea might be to pick
+a date format that works and try to train users on it. If they use one of
+the other formats and it also works, consider it a happy accident.
+
+
+Limitations
+===========
+
+* Since it's based on Python's ``datetime.datetime`` object, the ``DATETIME``
+  field shares all the limitations of that class, such as no support for
+  dates before year 1 on the proleptic Gregorian calendar. The ``DATETIME``
+  field supports practically unlimited dates, so if the ``datetime`` object
+  is ever improved it could support it. An alternative possibility might
+  be to add support for mxDateTime objects someday.
+
+* The ``DateParserPlugin`` currently only has support for English dates.
+  The architecture supports creation of parsers for other languages, and I
+  hope to add examples for other languages soon.
+
+* ``DATETIME`` fields do not currently support open-ended ranges. You can
+  simulate an open-ended range by using an endpoint far in the past or future.
+
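The open-ended range workaround from the last limitation can be sketched as a small query-string builder. The names ``date_range_query``, ``FAR_PAST`` and ``FAR_FUTURE`` are hypothetical, and the years 1900 and 2100 are arbitrary placeholders chosen for this example; pick endpoints that safely bracket the dates in your own index. The output uses the ``[start to end]`` numeric range syntax shown earlier in this diff.

```python
from datetime import datetime

# Arbitrary stand-ins for "far in the past" and "far in the future" (assumptions)
FAR_PAST = datetime(1900, 1, 1)
FAR_FUTURE = datetime(2100, 1, 1)


def _ymd(dt):
    # Format by hand so the year is always zero-padded to four digits
    return "%04d%02d%02d" % (dt.year, dt.month, dt.day)


def date_range_query(fieldname, start=None, end=None):
    """Build a numeric range query string, filling a missing endpoint.

    Hypothetical helper for illustration; not part of Whoosh.
    """
    lo = start if start is not None else FAR_PAST
    hi = end if end is not None else FAR_FUTURE
    return "%s:[%s to %s]" % (fieldname, _ymd(lo), _ymd(hi))


# "Everything from Jan 1, 2005 onward"
print(date_range_query("date", start=datetime(2005, 1, 1)))
```

The resulting string can then be handed to ``qp.parse()`` like any other query text.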
+
+
+

File docs/source/index.rst

     searching
     parsing
     querylang
+    dates
     query
     analysis
     stemming

File docs/source/ngrams.rst

 u'erin', u'rin', u'ring', u'ing', u'sha', u'shad', u'had', u'hade', u'ade',
 u'ader', u'der', u'ders', u'ers']
 
+TBD.
 
 
 

File docs/source/stemming.rst

 The :class:`whoosh.analysis.CharsetFilter` applies a character map to token
 text. For example, it will filter the tokens ``u'café', u'resumé', ...`` to
 ``u'cafe', u'resume', ...``. This is usually the method you'll want to use
-unless you need to use a charset to tokenize terms.
+unless you need to use a charset to tokenize terms::
 
     from whoosh.analysis import CharsetFilter, StemmingAnalyzer
     from whoosh import fields
 The :mod:`whoosh.support.charset` module contains an accent folding map useful
 for most Western languages, as well as a much more extensive Sphinx charset
 table and a function to convert Sphinx charset tables into the character maps
-required by ``CharsetTokenizer`` and ``CharsetFilter``::
-    
+required by ``CharsetTokenizer`` and ``CharsetFilter`` ::
+
     # To create a filter using an enormous character map for most languages
     # generated from a Sphinx charset table
     from whoosh.analysis import CharsetFilter

File src/whoosh/__init__.py

 # limitations under the License.
 #===============================================================================
 
-__version__ = (1, 2, 7)
+__version__ = (1, 3, 0)
 
 
 def versionstring(build=True, extra=True):

File src/whoosh/searching.py

     def __len__(self):
         """Returns the total number of documents that matched the query. Note
         this may be more than the number of scored documents, given the value
-        of the ``limit`` keyword argument to :method:`Searcher.search`.
+        of the ``limit`` keyword argument to :meth:`Searcher.search`.
         """
         
         if self._docs is None:
             d.score = None
 
     def fields(self, n):
-        """Returns the stored fields for the document at the ``n``th position
-        in the results. Use :method:`Results.docnum` if you want the raw
+        """Returns the stored fields for the document at the ``n`` th position
+        in the results. Use :meth:`Results.docnum` if you want the raw
         document number instead of the stored fields.
         """