Commits

Author Commit Message Labels Comments Date
Matt Chaput
Bumped version and changed index format to account for changes due to index compression work.
Matt Chaput
Term index and vector files now use 16-bit codes instead of full field names. Changed architecture of term index and term vector reader/writers.
Matt Chaput
Result[] now returns a Hit object. Switched tf back to float in case it messed with quality optimizations. Added features to enron script.
Matt Chaput
Changes to reduce index size, see issue #47. Miscellaneous fixes and improvements. Fixes to posting compression. More space-efficient coding of term info. Write stored fields as a list instead of a dictionary. Fixed speed of StructFile.write_array() on little-endian machines. Added Reuters 21578 benchmark.
Matt Chaput
Added compression of filedb posting data. Fixed exception in update_document when the unique term did not exist. Fixed "docs/sec" printout in enron.py.
Matt Chaput
Added caching to FileWriter.update_document(). Increased version number.
Matt Chaput
Adding Eclipse .settings dir to .hgeclipse.
Matt Chaput
Added unit test for Facets object.
Matt Chaput
Fix for counts() and categorize() after changing array to a dict.
Matt Chaput
Changed interface of Facets object to take the Searcher at instantiation.
Matt Chaput
Check that the matcher is still active before calling _get_spans() in SpanWrappingMatcher. Fixes issue #44. Bumped version to 1.0.
Matt Chaput
Added spans() method to WrappingMatcher base class. Fixes issue #43. Bumped version number.
Matt Chaput
Fixed typo in Weighting compatibility class. Fixes issue #42.
Matt Chaput
Bumped version number.
Matt Chaput
Merging branches.
Matt Chaput
Added clustering functions to classify module for future use.
Matt Chaput
Made Searcher a context manager (to close itself). See issue #41.
Matt Chaput
Fixed missing spans() method from MultiMatcher.
Matt Chaput
Fixed term range parsing. Bumped version number.
Matt Chaput
Bumped version number.
Matt Chaput
Instead of simply sorting the collected heap in reverse, sort by reversed score and then by forward document number. This enforces a consistent ordering of documents with the same score. Fixes issue #39.
Matt Chaput
Implemented date range parsing.
Matt Chaput
Fixed up dateparse for simple dates (no ranges yet). Changed SimpleWeighting back to Weighting.
Matt Chaput
Shouldn't have used random.sample() since it only picks items from the list once.
Matt Chaput
Less obtuse implementation of unique_name() function.
Matt Chaput
Additional work on date query parsing.
Matt Chaput
Checking in initial infrastructure to support parsing date queries.
Matt Chaput
Changed scoring architecture, need to update docs.
Matt Chaput
Bumped version number.
Matt Chaput
Fixed up the enron indexing script a bit.
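
The issue #39 commit above describes ordering results by reversed score and then by forward document number. A minimal sketch of that ordering follows. It is not the project's actual code: it assumes hits are collected as hypothetical (score, docnum) tuples, and the function name order_hits is invented for illustration.

```python
def order_hits(hits):
    """Sort (score, docnum) pairs: highest score first, ties by ascending docnum."""
    # Negating the score reverses only the score ordering; the document
    # number stays in forward order, so equal scores sort consistently.
    return sorted(hits, key=lambda hit: (-hit[0], hit[1]))


# Hypothetical collected hits as (score, docnum) pairs.
collected = [(1.5, 7), (2.0, 3), (1.5, 2), (2.0, 9)]

ordered = order_hits(collected)

# For comparison: a plain reverse sort also reverses the document numbers
# within each group of equal scores.
naive = sorted(collected, reverse=True)

print(ordered)
print(naive)
```

With a plain reverse sort, documents with equal scores come out in descending document number; the two-part key keeps them in ascending order instead.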