Smarter merge policies
It's possible to end up with just one segment that has fewer than fib(i+5) documents in it, where i is the segment's position in order of increasing size. That segment can still be quite large: for example, my 20th segment, in order of increasing size, had 21224 documents in it, while fib(25) is 121393.
MERGE_SMALL would always rewrite this segment, and only this segment, which is a complete waste of time.
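To make the problem concrete, here is a small sketch of the selection rule as described above. It is a simplified model that only looks at document counts, not Whoosh's actual code, and the real policy may differ in detail. The `fib` here is my own definition, chosen (as an assumption) so that fib(25) comes out as 121393, and `small_by_fib` is a hypothetical helper name.

```python
def fib(n):
    """Fibonacci variant with fib(0)=0, fib(1)=1, fib(2)=2.

    Assumption: this indexing is chosen so that fib(25) == 121393,
    the value quoted in the text.
    """
    if n <= 2:
        return n
    a, b = 1, 2
    for _ in range(n - 2):
        a, b = b, a + b
    return b


def small_by_fib(doc_counts):
    """Hypothetical helper: return the document counts that fall below
    fib(i + 5), where i is the position in order of increasing size."""
    return [count
            for i, count in enumerate(sorted(doc_counts))
            if count < fib(i + 5)]


# Thresholds for i = 0..3 are 8, 13, 21 and 34, so only the last
# segment counts as "small" and would be rewritten on its own.
print(small_by_fib([10, 20, 30, 33]))  # prints [33]
```

With these toy numbers exactly one segment is selected, so a policy that always merges the selected segments ends up copying that one segment and nothing else.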
I decided to implement a merge policy that combines all segments below a fixed size, unless there's only one of them. This seems to perform quite well:
    def CUSTOM_MERGE_SMALL(writer, segments):
        """This policy merges small segments, where "small" is defined using a
        fixed number of documents. Unlike whoosh.filedb.filewriting.MERGE_SMALL,
        this one does nothing unless there's more than one segment to merge.
        """

        from whoosh.filedb.filereading import SegmentReader
        unchanged_segments = []
        segments_to_merge = []
        for segment in segments:
            if segment.doc_count_all() < 10000:
                segments_to_merge.append(segment)
            else:
                unchanged_segments.append(segment)
        if len(segments_to_merge) > 1:
            for segment in segments_to_merge:
                with SegmentReader(writer.storage, writer.schema, segment) as reader:
                    writer.add_reader(reader)
        else:
            # don't bother merging a single segment
            unchanged_segments.extend(segments_to_merge)
        return unchanged_segments
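The selection logic in the policy above can be checked without building an index. The following is a minimal sketch that mirrors the same decision on plain document counts. `partition_by_size` and its `threshold` parameter are hypothetical names introduced for illustration, and the default of 10000 is taken from the policy above.

```python
def partition_by_size(doc_counts, threshold=10000):
    """Hypothetical helper mirroring the policy's decision on plain ints.

    Returns (to_merge, unchanged). Counts below the threshold are merged
    only if there is more than one of them.
    """
    to_merge = [c for c in doc_counts if c < threshold]
    unchanged = [c for c in doc_counts if c >= threshold]
    if len(to_merge) <= 1:
        # A lone small segment is left alone rather than rewritten.
        unchanged.extend(to_merge)
        to_merge = []
    return to_merge, unchanged


# Two small segments: both get merged, the large one is kept.
print(partition_by_size([5, 20000, 30]))

# Only one small segment: nothing is merged.
print(partition_by_size([5, 20000]))
```

To use the real policy, it should be possible to pass it when committing, for example `writer.commit(mergetype=CUSTOM_MERGE_SMALL)`, assuming your Whoosh version's `commit()` accepts a `mergetype` argument.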