Commits

Author Commit Message Labels Comments Date
Matt Chaput
Checking in experimental quality-based optimizations.
Branches
pre1
Matt Chaput
Checking in beginnings of new matcher architecture.
Branches
pre1
Matt Chaput
Removed rogue assert.
Branches
pre1
Matt Chaput
Changed byte order "<" back to "!".
Branches
pre1
Matt Chaput
Long list of changes to indexing and searching. Changed field length file to a list of approximated byte values that are read into memory. Pools must now accept any keyword argument in their __init__ method. Pools spool lengths to disk files and then collect them. Pools now use marshal instead of custom encode/decode functions. StructFile.read/write_array() now use array methods and byteswap if necessary. Changed scoring and Searcher interfaces…
Branches
pre1
Matt Chaput
Updates tests to current API.
Branches
pre1
Matt Chaput
Added missing add_field_length() method to pools. Added missing functionality to ramdb.
Branches
pre1
Matt Chaput
Changed types of on-disk structs to little-endian ("<") from network order ("!").
Branches
pre1
Matt Chaput
Merged with 48f7e10c0077
Branches
pre1
Matt Chaput
Added LengthWriter/LengthReader for storing field lengths by field.
Branches
pre1
Matt Chaput
Now store field lengths in a custom file instead of StructHash. OrderedHashReader now does binary searches on disk instead of in memory. Bug fixes in query module. Added Reader.max_field_length() method. Removed field_length method from Index interface. Renamed Fake* classes to List* in postings. Added util.now() function.
Branches
pre1
Matt Chaput
Added options to initializers in analysis module for maxsize parameter of StopFilter.
Branches
pre1
Matt Chaput
Added maxsize filter to StopFilter.
Branches
pre1
Matt Chaput
Added default argument to doc_field_length().
Branches
pre1
Matt Chaput
Fixed pool field lengths. Changed pools to store temp files in a directory.
Branches
pre1
Matt Chaput
Added code to MultiPool to delete temp files. Added Schema.clean() method. Sorted and expanded default stop list. Renamed HtmlFormatter.clear() to clean().
Branches
pre1
Matt Chaput
Fixed pools.MultiPool implementation to not deadlock, hopefully.
Branches
pre1
Matt Chaput
Removed read lock stuff, somehow broke multipool :(.
Branches
pre1
Matt Chaput
Changed JoinableQueue to plain Queue.
Branches
pre1
Matt Chaput
Fixed file naming in fileindex, file deleting in pools. Minor change in filewriting.
Branches
pre1
Matt Chaput
Cleaned up naming. Changed FileIndex to acquire read locks instead of opening a Searcher to lock files.
Branches
pre1
Matt Chaput
Minor test fixups.
Branches
pre1
Matt Chaput
Put back missing minscore logic in TopDocs.
Branches
pre1
Matt Chaput
Fixed typo in filereading. Fixed FixedHashReader.__contains__. Fixed query parser when analyzer returns no tokens.
Branches
pre1
Matt Chaput
Cleaned up naming conventions. Fixed bugs in SegmentWriter.
Branches
pre1
Matt Chaput
Renamed FileIndexWriter to SegmentWriter.
Branches
pre1
Matt Chaput
Refactored file pools and writers. Moved key and value encoding/decoding functions into new module filedb.misc. Added FileIndex.segment_count() method. Added sketch of Enron email corpus benchmark script, to be fixed up later.
Branches
pre1
Matt Chaput
Added FixedHashWriter and FixedHashReader.
Branches
pre1
Matt Chaput
Merge with 205f947313cf4a67c361e6ff85707893a951237c
Branches
pre1
Matt Chaput
Removed multi-merge code, added total number of postings to write_postings args.
Branches
pre1