Commits

Anonymous committed 8576324

fix optimize() and *MpWriter to keep dynamic fields, fixes #244 (now really :)

I decided to change Schema.names a little (in a backwards-compatible way):
it now takes an optional check_names parameter for checking dynamic
field names.
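
To illustrate the intended behaviour, here is a minimal sketch. It does not use Whoosh itself: MiniSchema, its constructor arguments and the glob-style matching of dynamic names are assumptions made for this example, and only the names(check_names=None) signature comes from this changeset. The sketch collects names in a set instead of appending to keys(), so it also runs on Python 3.

```python
from fnmatch import fnmatchcase


class MiniSchema:
    """Hypothetical stand-in for a schema with static and dynamic fields."""

    def __init__(self, static_fields, dynamic_patterns):
        # static_fields: dict of field name -> field type (illustrative only)
        # dynamic_patterns: dict of glob pattern -> field type (assumption)
        self._fields = dict(static_fields)
        self._dynamic = dict(dynamic_patterns)

    def __contains__(self, fieldname):
        # A name is accepted if it is static or matches a dynamic pattern.
        if fieldname in self._fields:
            return True
        return any(fnmatchcase(fieldname, pat) for pat in self._dynamic)

    def names(self, check_names=None):
        # Start with the static names, as the old names() did.
        fieldnames = set(self._fields)
        if check_names is not None:
            # Add only those candidates that the schema accepts; static
            # names passed in check_names cannot create duplicates.
            candidates = set(check_names) - fieldnames
            fieldnames.update(name for name in candidates if name in self)
        return sorted(fieldnames)


schema = MiniSchema({"id": "ID", "title": "TEXT"}, {"*_tag": "KEYWORD"})
plain = schema.names()
checked = schema.names(["title", "color_tag", "unknown"])
```

Calling names() without arguments still returns only the static names, which is why the change stays backwards compatible. Names in check_names that the schema does not accept are left out.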

Whoosh had issues with this in two places (both fixed by this changeset),
because the old .names() was called without considering that there might
also be dynamic field names.
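
The effect can be sketched roughly as follows. This is not the Whoosh writer code: STATIC_FIELDS, DYNAMIC_GLOBS, accepts(), copy_old() and copy_new() are hypothetical helpers that only mimic the two loops touched by this changeset, copying one document's stored fields.

```python
from fnmatch import fnmatchcase

STATIC_FIELDS = ["id", "title"]  # hypothetical static field names
DYNAMIC_GLOBS = ["*_tag"]        # hypothetical dynamic field pattern


def accepts(fieldname):
    # True if the name is static or matches a dynamic pattern.
    if fieldname in STATIC_FIELDS:
        return True
    return any(fnmatchcase(fieldname, glob) for glob in DYNAMIC_GLOBS)


def copy_old(stored):
    # Old loop: iterate over the static names only, so any stored
    # dynamic field is silently dropped from the copy.
    return {name: stored.get(name) for name in STATIC_FIELDS}


def copy_new(stored):
    # New loop: also check the names actually stored in the document,
    # keeping those the schema accepts as dynamic fields.
    names = set(STATIC_FIELDS) | {name for name in stored if accepts(name)}
    return {name: stored.get(name) for name in sorted(names)}


doc = {"id": "1", "title": "Hello", "color_tag": "red"}
old_copy = copy_old(doc)
new_copy = copy_new(doc)
```

With the old loop the color_tag value is lost when the document is rewritten. With the new loop it is carried over.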

We should perhaps consider making check_names a mandatory parameter, which
might prevent such mistakes, but that would be backwards incompatible.

I also refactored the code a little to make both affected places a little
more similar.

I tested with nosetests and with the test supplied by the author of issue
#244; all tests now succeed.

Files changed (3)

src/whoosh/fields.py

 
         return sorted(self._fields.items())
 
-    def names(self):
+    def names(self, check_names=None):
         """Returns a list of the names of the fields in this schema.
+
+        :param check_names: (optional) sequence of field names to check
+            whether the schema accepts them as (dynamic) field names -
+            acceptable names will also be in the result list.
+            Note: You may also have static field names in check_names, that
+            won't create duplicates in the result list. Unsupported names
+            will not be in the result list.
         """
-        return sorted(self._fields.keys())
+        fieldnames = self._fields.keys()
+        if check_names is not None:
+            fieldnames += [fieldname
+                           for fieldname in set(check_names) - set(fieldnames)
+                           if fieldname in self]
+        return sorted(fieldnames)
 
     def clean(self):
         for field in self:

src/whoosh/filedb/filewriting.py

         schema = self.schema
         newdoc = self.docnum
         perdocwriter = self.perdocwriter
-        sharedfields = set(schema.names()) & set(reader.schema.names())
 
         for docnum in reader.all_doc_ids():
             # Skip deleted documents
             perdocwriter.start_doc(newdoc)
             # For each field in the document, copy its stored value,
             # length, and vectors (if any) to the writer
-            for fieldname in sharedfields:
+            for fieldname in schema.names(d.keys()):
                 field = schema[fieldname]
+                value = d.get(fieldname)
                 length = (reader.doc_field_length(docnum, fieldname, 0)
                           if field.scorable else 0)
-                perdocwriter.add_field(fieldname, field, d.get(fieldname),
-                                       length)
+                perdocwriter.add_field(fieldname, field, value, length)
                 if field.vector and reader.has_vector(docnum, fieldname):
                     v = reader.vector(docnum, fieldname)
                     perdocwriter.add_vector_matcher(fieldname, field, v)

src/whoosh/filedb/multiproc.py

         schema = self.schema
         storage = self.storage
         codec = self.codec
-        fieldnames = list(schema.names())
 
         # Merge per-document information
         pdw = self.perdocwriter
                 # Add the base doc count to the sub-segment doc num
                 pdw.start_doc(basedoc + i)
                 # Call add_field to store the field values and lengths
-                for fieldname in fieldnames:
+                for fieldname in schema.names(fs.keys()):
                     value = fs.get(fieldname)
                     length = lenreader.doc_field_length(i, fieldname)
                     pdw.add_field(fieldname, schema[fieldname], value, length)