Commits

Matt Chaput  committed 46482e9

Put back the fix for backwards compatibility with pickled Segment objects.
Improved docs.

  • Parent commits ed21316

Files changed (3)

File docs/source/facets.rst

     w.add_document(title=u"Best Bet")
     w.add_document(title=u"First Action")
     w.commit()
-    
+
 If you sort this index by "title", you might expect the results to be
 "Best Bet" followed by "First Action", but in fact it will be the reverse! This
 is because Whoosh is sorting by **terms**, not the original text you indexed.
     for title in titles:
         w.add_document(title=title, sort_title=title)
     w.commit()
+    
+    # ...
+    
+    results = my_searcher.search(my_query, sortedby="sort_title")
 
+Using a separate field for sorting allows you to "massage" the sort values,
+since they don't need to be displayed to the user. For example, you can
+convert the sort value to lowercase (to prevent uppercase letters from sorting
+before lowercase letters) and remove spaces to prevent them from affecting the
+sort order::
+
+   for title in titles:
+      sort_title = title.lower().replace(" ", "")
+      w.add_document(title=title, sort_title=sort_title)
+
+Alternatively, you can store the field contents and use a
+:class:`whoosh.sorting.StoredFieldFacet` to sort by the stored value. This
+means you don't need to use a separate field, but it is usually slower than
+sorting by an indexed field, and doesn't give you the chance to massage the
+sort values::
+
+   schema = fields.Schema(title=fields.TEXT(stored=True))
+   
+   # ...
+   
+   for title in titles:
+      w.add_document(title=title)
+  
+   # ...
+   
+   sff = sorting.StoredFieldFacet("title")
+   results = my_searcher.search(my_query, sortedby=sff)
+   
 
 The sortedby keyword argument
 -----------------------------
 Sort by the value of the size field::
 
     results = searcher.search(myquery, sortedby="size")
-    
+
 Sort by the reverse (highest-to-lowest) order of the "price" field::
 
     facet = sorting.FieldFacet("price", reverse=True)
     mf.add_field("size")
     mf.add_field("price", reverse=True)
     results = searcher.search(myquery, sortedby=mf)
-    
+
     # or...
     sizes = sorting.FieldFacet("size")
     prices = sorting.FieldFacet("price", reverse=True)
     results = searcher.search(myquery, sortedby=[sizes, prices])
-    
+
 Sort by the "category" field, then by the document's score::
 
     cats = sorting.FieldFacet("category")
 Manufacturer         Price
 -------------------- -----------------
 Apple (5)            $0 - $100 (2)
-Sanyo (1)            $101 - $500 (10)          
+Sanyo (1)            $101 - $500 (10)
 Sony (2)             $501 - $1000 (1)
 Toshiba (5)
 ==================== =================
 Group by the value of the "category" field::
 
     results = searcher.search(myquery, groupedby="category")
-    
+
 Group by the value of the "category" field and also by the value of the "tags"
 field and a date range::
 
     cats = sorting.FieldFacet("category")
     tags = sorting.FieldFacet("tags", allow_overlap=True)
     results = searcher.search(myquery, groupedby={"category": cats, "tags": tags})
-    
+
     # ...or, using a Facets object has a little less duplication
     facets = sorting.Facets()
     facets.add_field("category")
     results = mysearcher.search(myquery, groupedby=myfacet)
     results.groups()
     # {"small": [8, 5, 1, 2, 4], "medium": [3, 0, 6], "large": [7, 9]}
-    
+
     # Don't sort the groups to match the order of documents in the results
     # (faster)
     myfacet = FieldFacet("size", maptype=sorting.UnorderedList)
     results = mysearcher.search(myquery, groupedby=myfacet)
     results.groups()
     # {"small": 5, "medium": 3, "large": 2}
-    
+
     # Only remember the "best" document in each group
     myfacet = FieldFacet("size", maptype=sorting.Best)
     results = mysearcher.search(myquery, groupedby=myfacet)
     # Sort search results by the value of the "path" field
     facet = sorting.FieldFacet("path")
     results = searcher.search(myquery, sortedby=facet)
-    
+
     # Group search results by the value of the "parent" field
     facet = sorting.FieldFacet("parent")
     results = searcher.search(myquery, groupedby=facet)
     qdict["E-H"] = query.TermRange("name", "e", "h")
     qdict["I-L"] = query.TermRange("name", "i", "l")
     # ...
-    
+
     qfacet = sorting.QueryFacet(qdict)
     r = searcher.search(myquery, groupedby={"firstltr": qfacet})
-    
+
 By default, ``QueryFacet`` only supports **non-overlapping** grouping, where a
 document cannot belong to multiple facets at the same time (each document will
 be sorted into one category arbitrarily.) To get overlapping groups with
 buckets $100 "wide"::
 
     pricefacet = sorting.RangeFacet("price", 0, 1000, 100)
-    
+
 The first argument is the name of the field. The next two arguments are the
 full range to be divided. Values outside this range (in this example, values
 below 0 and above 1000) will be sorted into the "missing" (None) group. The
 value in the list being the size for all subsequent divisions. For example::
 
     pricefacet = sorting.RangeFacet("price", 0, 1000, [5, 10, 35, 50])
-    
+
 ...will set up divisions of 0-5, 5-15, 15-50, 50-100, and then use 50 as the
 size for all subsequent divisions (i.e. 100-150, 150-200, and so on).
 
 example, this::
 
     facet = sorting.RangeFacet("num", 0, 10, 4, hardend=False)
-    
+
 ...gives divisions 0-4, 4-8, and 8-12, while this::
 
     facet = sorting.RangeFacet("num", 0, 10, 4, hardend=True)
-    
+
 ...gives divisions 0-4, 4-8, and 8-10. (The default is ``hardend=False``.)
 
 .. note::
     schema = fields.Schema(id=fields.STORED,
                            text=fields.TEXT(stored=True, vector=True))
     ix = RamStorage().create_index(schema)
-    
+
 ...you could use a function to sort documents higher the closer they are to
-having equal occurrences of two terms:: 
-    
+having equal occurrences of two terms::
+
     def fn(searcher, docnum):
         v = dict(searcher.vector_as("frequency", docnum, "text"))
         # Sort documents that have equal number of "alfa" and "bravo" first
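
The ``sort_title`` advice added to docs/source/facets.rst in this commit (lowercase the value and strip spaces before indexing it) can be tried without an index. The following is a minimal sketch using only plain Python sorting. The helper names ``make_sort_key`` and ``sort_titles`` and the sample titles are hypothetical, and the sketch only imitates the effect on ordering; it does not call Whoosh.

```python
def make_sort_key(title):
    # Same massage as the docs example: lowercase, then drop spaces.
    return title.lower().replace(" ", "")


def sort_titles(titles):
    # Sort the display titles by their massaged sort keys.
    return sorted(titles, key=make_sort_key)


titles = ["apple pie", "Zebra", "Apple Tart"]

# Plain code-point ordering puts uppercase letters before lowercase ones.
naive = sorted(titles)

# Ordering by the massaged key ignores case and spaces.
massaged = sort_titles(titles)

print(naive)
print(massaged)
```

In an index, the massaged value would go into the separate ``sort_title`` field while the unmodified title stays in the field shown to the user.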

File src/whoosh/filedb/fileindex.py

                 sleep(0.05)
 
 
-# from whoosh.codec.whoosh2 import W2Segment as Segment  # @UnusedImport
+# Fix for old indexes pickled with whoosh.fileindex.Segment objects
+from whoosh.codec.whoosh2 import W2Segment as Segment  # @UnusedImport
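
Why the re-export matters: pickle records a class by its module path and name, so an object pickled when the class lived in one module can only be loaded if that name still resolves there. The sketch below reproduces the situation with stand-in modules built at runtime. The module names ``demo_old_fileindex`` and ``demo_new_codec`` and the helper functions are hypothetical and are not part of Whoosh; only the aliasing idea mirrors the import restored above.

```python
import pickle
import sys
import types


def make_class(module_name, class_name):
    # Build an empty class that claims to live in the given module.
    return type(class_name, (object,), {"__module__": module_name})


def demo_alias_keeps_old_pickles_loadable():
    # "Old" layout: the class is defined as demo_old_fileindex.Segment.
    old_cls = make_class("demo_old_fileindex", "Segment")
    old_mod = types.ModuleType("demo_old_fileindex")
    old_mod.Segment = old_cls
    sys.modules["demo_old_fileindex"] = old_mod

    seg = old_cls()
    seg.name = "seg_0"
    old_bytes = pickle.dumps(seg)  # stores "demo_old_fileindex" + "Segment"

    # "New" layout: the class moved and was renamed; the old module
    # still exists but no longer defines Segment.
    new_cls = make_class("demo_new_codec", "W2Segment")
    new_mod = types.ModuleType("demo_new_codec")
    new_mod.W2Segment = new_cls
    sys.modules["demo_new_codec"] = new_mod
    refactored = types.ModuleType("demo_old_fileindex")
    sys.modules["demo_old_fileindex"] = refactored

    # Without an alias, the old pickle cannot find its class.
    try:
        pickle.loads(old_bytes)
        failed_without_alias = False
    except AttributeError:
        failed_without_alias = True

    # The fix: re-export the new class under the old name.
    refactored.Segment = new_cls
    restored = pickle.loads(old_bytes)
    return failed_without_alias, type(restored).__name__, restored.name


print(demo_alias_keeps_old_pickles_loadable())
```

The alias looks unused to a linter, which is presumably why the real import carries an ``@UnusedImport`` marker.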
 

File src/whoosh/searching.py

         self.top_n = items
 
     def upgrade_and_extend(self, results):
-        """Combines the effects of extend() and increase(): hits that are also
+        """Combines the effects of extend() and upgrade(): hits that are also
         in 'results' are raised. Then any hits from the other results object
         that are not in this results object are appended to the end.
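
The corrected docstring describes two steps: shared hits are raised, then unseen hits are appended. A rough sketch of that ordering on plain lists of document numbers follows. It assumes "raised" means "moved ahead of the hits that are not shared, keeping relative order", and it ignores scores entirely. The function ``upgrade_and_extend_sketch`` is hypothetical and is not the method in src/whoosh/searching.py.

```python
def upgrade_and_extend_sketch(mine, other):
    """Return a new ordering of document numbers.

    mine  -- document numbers in this results object, best first
    other -- document numbers in the other results object, best first
    """
    other_set = set(other)
    mine_set = set(mine)

    # "upgrade" step: hits also present in the other results move to the
    # front, keeping their relative order.
    raised = [doc for doc in mine if doc in other_set]
    rest = [doc for doc in mine if doc not in other_set]

    # "extend" step: hits found only in the other results go at the end.
    appended = [doc for doc in other if doc not in mine_set]

    return raised + rest + appended


print(upgrade_and_extend_sketch([1, 2, 3, 4], [3, 9, 1]))
```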