Commits

Author Commit Message Labels Comments Date
Matt Chaput
Set the default block size to 8K.
Matt Chaput
Used aliases to remove references to cPickle from code body. Wasn't checking pickle size using -1 protocol (in TableWriter._write_block).
Matt Chaput
Changed RLock to Lock, removed recursive @protected decorator. Renamed doclength_records to doclength_table.
Matt Chaput
Removed guard around file reads, and so removed the EndOfFile exception. Hopefully this won't bite me in the ass. Changed posting pool implementation to explicitly keep track of posting counts so it doesn't rely on EndOfFile.
Matt Chaput
Changed default blocksize to 16K.
Matt Chaput
Added "us" to the stop list (to go with "you").
Matt Chaput
Fixed major bug in TableReader._seek_postings where it was confusing parts of the posting info structure. TableReader.__getitem__ is now TableReader.get() so it can be patched based on TableReader.haspostings. Changed TermReader.weights() to use new Format.read_weight which may be more efficient when all you want is the weight.
Matt Chaput
TableWriter.add_row() now requires a value.
Matt Chaput
Docstring and argument name cleanup. Added results.extend, results.increase, and results.increase_and_extend methods.
Matt Chaput
Added guards against using searcher.doc_field_length() for a field that does not store field lengths. Updated MultiFieldSorter to work with "missingfirst" and added a docstring.
Matt Chaput
Docstring cleanup.
Matt Chaput
Removed unused import.
Matt Chaput
Added experimental BoostTextFilter. Added StopFilter to StemmingAnalyzer.
Matt Chaput
Reimplemented TableWriter/TableReader as single classes that can have optional postings. Fixed bugs in DocReader/MultiDocReader related to new implementation of field length storage.
Matt Chaput
Added ability to specify where documents should be sorted (beginning or end) in a sorter when the document doesn't contain the sort field. Minor cleanups.
Matt Chaput
Changed implementation of field length storage. Changed term_count() to frequency(). Fixed scoring implementations. Fixed Searcher iteration.
Matt Chaput
Added convenience __setitem__ method that calls either set() or clear().
Matt Chaput
Simplified how And and Or work. Changed Phrase to work with either per-posting positions or a position vector.
Matt Chaput
Changed fields.TEXT to use per-posting positions instead of a vector.
Matt Chaput
Fixed bugs in multi-run merging.
Matt Chaput
Minor formatting, changed default block size.
Matt Chaput
Fixed directory deletion.
Matt Chaput
Fixed docstring. Changed Index.searcher() to pass keyword arguments to the Searcher constructor. Added Index.unlock() before cleaning old files when create=True.
Matt Chaput
Clarified docstring, removed obsolete attribute.
Matt Chaput
Docstring cleanup.
Matt Chaput
highlight.py: cleaned up some dumb decisions. qparser.py: clarified comment.
Matt Chaput
"Protected" methods need to be locked with an RLock, not a Lock.
Matt Chaput
Returned to a Lucene-like highlighting system in highlight.py. Removed code from passages.py.
Matt Chaput
Small changes for simplification and consistency.
Matt Chaput
Changed from_ back to iter_from.
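
Two of the messages above concern the same locking question: one says that "protected" methods need an RLock rather than a Lock, and a later one switches back to Lock after removing the recursive @protected decorator. The sketch below illustrates why the two go together. It is not the project's code: the `protected` decorator body, the `Table` class, and the `completes` helper are hypothetical names made up for this illustration. It assumes only that a protected method holds a per-object lock for the duration of the call.

```python
import threading
from functools import wraps


def protected(func):
    """Hypothetical decorator: run the method while holding self._lock."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return func(self, *args, **kwargs)
    return wrapper


class Table:
    """Hypothetical stand-in for an object with lock-protected methods."""

    def __init__(self, lock):
        self._lock = lock
        self.rows = {}

    @protected
    def set(self, key, value):
        self.rows[key] = value

    @protected
    def set_many(self, items):
        # Calls another protected method while already holding the lock.
        # This is the "recursive" case: it needs a reentrant lock.
        for key, value in items:
            self.set(key, value)


def completes(lock, timeout=1.0):
    """Return True if a nested protected call finishes within the timeout."""
    table = Table(lock)
    worker = threading.Thread(
        target=table.set_many, args=([("a", 1)],), daemon=True
    )
    worker.start()
    worker.join(timeout)
    return not worker.is_alive()


if __name__ == "__main__":
    print("RLock completes:", completes(threading.RLock()))
    print("Lock completes:", completes(threading.Lock(), timeout=0.2))
```

With a plain Lock, the nested call blocks on a lock its own thread already holds, so the worker never finishes. An RLock lets the owning thread re-acquire it. Once no protected method calls another protected method, the plain Lock is sufficient again, which is consistent with the later change.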