Thomas Waldmann  committed 8576324

fix optimize() and *MpWriter to keep dynamic fields, fixes #244 (now really :)

I decided to change Schema.names a little (in a backwards-compatible way):
It now supports an optional check_names param to support checking for dynamic
field names.
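To illustrate the new semantics, here is a minimal toy sketch of the `names(check_names=...)` behavior. The `Schema` class below is a hypothetical stand-in for `whoosh.fields.Schema`, using `fnmatch` to emulate glob-style dynamic field names; the body of `names()` mirrors the code in this changeset.

```python
import fnmatch

class Schema:
    """Toy stand-in for whoosh.fields.Schema (assumed interface)."""

    def __init__(self, static_names, dynamic_globs):
        self._fields = dict.fromkeys(static_names)
        self._dyn_globs = list(dynamic_globs)

    def __contains__(self, fieldname):
        # A name is accepted if it is a static field or matches a glob
        return (fieldname in self._fields or
                any(fnmatch.fnmatchcase(fieldname, g)
                    for g in self._dyn_globs))

    def names(self, check_names=None):
        # Same logic as the changeset: start from the static names,
        # then add any accepted (dynamic) names from check_names
        fieldnames = list(self._fields.keys())
        if check_names is not None:
            fieldnames += [fieldname
                           for fieldname in set(check_names) - set(fieldnames)
                           if fieldname in self]
        return sorted(fieldnames)

schema = Schema(["title", "content"], ["tag_*"])
print(schema.names())                              # ['content', 'title']
print(schema.names(["title", "tag_py", "bogus"]))  # ['content', 'tag_py', 'title']
```

Note that a static name in check_names ("title" here) creates no duplicate, and an unsupported name ("bogus") is silently dropped.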

Whoosh had issues with this in two places (both fixed by this changeset),
because the old .names() was called without considering that there might
also be dynamic field names.

We should maybe consider making check_names a mandatory parameter; that
would help avoid such mistakes, but it would be backwards incompatible.

I also refactored the code a little, to make both affected places a little
more consistent.

I tested with nosetests and with the test supplied by the author of issue
#244; all succeed now.

  • Parent commits f5f3d9c
  • Branches default


Files changed (3)

File src/whoosh/

         return sorted(self._fields.items())
-    def names(self):
+    def names(self, check_names=None):
         """Returns a list of the names of the fields in this schema.
+        :param check_names: (optional) sequence of field names to check
+            whether the schema accepts them as (dynamic) field names -
+            acceptable names will also be in the result list.
+            Note: You may also have static field names in check_names; they
+            won't create duplicates in the result list. Unsupported names
+            will not be in the result list.
-        return sorted(self._fields.keys())
+        fieldnames = self._fields.keys()
+        if check_names is not None:
+            fieldnames += [fieldname
+                           for fieldname in set(check_names) - set(fieldnames)
+                           if fieldname in self]
+        return sorted(fieldnames)
     def clean(self):
         for field in self:

File src/whoosh/filedb/

         schema = self.schema
         newdoc = self.docnum
         perdocwriter = self.perdocwriter
-        sharedfields = set(schema.names()) & set(reader.schema.names())
         for docnum in reader.all_doc_ids():
             # Skip deleted documents
             # For each field in the document, copy its stored value,
             # length, and vectors (if any) to the writer
-            for fieldname in sharedfields:
+            for fieldname in schema.names(d.keys()):
                 field = schema[fieldname]
+                value = d.get(fieldname)
                 length = (reader.doc_field_length(docnum, fieldname, 0)
                           if field.scorable else 0)
-                perdocwriter.add_field(fieldname, field, d.get(fieldname),
-                                       length)
+                perdocwriter.add_field(fieldname, field, value, length)
                 if field.vector and reader.has_vector(docnum, fieldname):
                     v = reader.vector(docnum, fieldname)
                     perdocwriter.add_vector_matcher(fieldname, field, v)
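The hunk above fixes the first affected place: the old code intersected the two schemas' static name lists, so a dynamic field stored in a document was silently dropped during the copy. A small self-contained sketch of the failure mode (field names and the toy glob check are assumptions for illustration, not the real Whoosh API):

```python
# What the old names() effectively returned: static fields only
static_names = {"title", "content"}
# A stored document carrying a dynamic field accepted via a "tag_*" glob
stored_doc = {"title": "x", "tag_py": "y"}

# Old behavior: intersect static names with the document's keys.
# The dynamic field "tag_py" is not in static_names, so it is lost.
old_fieldnames = static_names & set(stored_doc.keys())
print(sorted(old_fieldnames))  # ['title'] -- "tag_py" dropped

def accepts(name):
    # Toy stand-in for the schema's glob matching ("tag_*")
    return name in static_names or name.startswith("tag_")

# New behavior: names(check_names) keeps the static names and adds any
# document keys the schema accepts, including glob-matched dynamic ones
new_fieldnames = sorted(n for n in set(stored_doc) | static_names
                        if accepts(n))
print(new_fieldnames)  # ['content', 'tag_py', 'title'] -- "tag_py" kept
```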

File src/whoosh/filedb/

         schema = self.schema
         storage =
         codec = self.codec
-        fieldnames = list(schema.names())
         # Merge per-document information
         pdw = self.perdocwriter
                 # Add the base doc count to the sub-segment doc num
                 pdw.start_doc(basedoc + i)
                 # Call add_field to store the field values and lengths
-                for fieldname in fieldnames:
+                for fieldname in schema.names(fs.keys()):
                     value = fs.get(fieldname)
                     length = lenreader.doc_field_length(i, fieldname)
                     pdw.add_field(fieldname, schema[fieldname], value, length)