searchr is a program that searches a collection of documents for the keywords you give it.
It has two main functions: indexer(...) and retriever(...).
The indexer(...) function takes 3 arguments:
1. the test collection directory (where the document collection is stored);
2. the name of a stop list (one comes with the program, called "english.stop");
3. the output directory for the index files (any directory name you like).
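The README does not say how the index is built. As a rough illustration only, the sketch below shows one common way an indexer of this kind works: read the stop list, tokenize every file in the collection directory, and count how often each remaining term occurs in each document (an inverted index). It is not searchr's actual code. The names `load_stoplist` and `build_inverted_index`, the one-word-per-line stop list format, and the regex tokenizer are all assumptions, and the sketch keeps the index in memory instead of writing index files to an output directory.

```python
import os
import re
from collections import defaultdict


def load_stoplist(path):
    """Read a stop list into a set (assumes one word per line)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def build_inverted_index(collection_dir, stopwords):
    """Illustrative sketch, not searchr's implementation.

    Returns {term: {document file name: term frequency}}, skipping stop words.
    """
    index = defaultdict(dict)
    for name in sorted(os.listdir(collection_dir)):
        path = os.path.join(collection_dir, name)
        if not os.path.isfile(path):
            continue  # ignore sub-directories
        with open(path, encoding="utf-8", errors="ignore") as f:
            # Assumed tokenizer: lowercase runs of letters and digits.
            tokens = re.findall(r"[a-z0-9]+", f.read().lower())
        for token in tokens:
            if token not in stopwords:
                index[token][name] = index[token].get(name, 0) + 1
    return dict(index)
```

Dropping stop words at indexing time keeps very common words such as "the" out of the index, which keeps the postings smaller and stops those words from dominating the ranking.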
The retriever(...) function takes at least 3 arguments:
1. the directory containing the index files;
2. the number of documents to be returned;
3. one or more keywords (all remaining arguments).
* How to run the program
From the command line, go to the directory containing searchr.py, then enter:
$ python -i searchr.py
You will then be in the Python shell with all the functions loaded and ready to use.
** How to invoke the indexer function
Once you are in the Python interpreter, enter:
>>> indexer(<test_collection_dir>, <stoplist>, <output_dir>)
<test_collection_dir> is the path to the test collection directory,
<stoplist> is the name of the stop list file; one is provided for you, called english.stop
<output_dir> is the path to the directory where you want to store the index files.
All 3 arguments must be strings, and any directory path must end with a trailing slash.
For example, if the test collection directory is test_collection/sci.spsace/,
the stop list is english.stop, and the desired output directory is postings/,
invoke the indexer with:
>>> indexer('test_collection/sci.spsace/', 'english.stop', 'postings/')
NOTE: a directory path must end with a trailing slash: "postings/", not "postings".
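Because a missing trailing slash is an easy mistake to make, a small helper can normalize directory arguments before they are passed in. `ensure_trailing_slash` is a hypothetical helper, not part of searchr.py, and it assumes forward-slash paths as in the examples above.

```python
def ensure_trailing_slash(path):
    """Return `path` ending in exactly one "/".

    Hypothetical helper, not part of searchr.py; assumes forward-slash paths.
    """
    # Strip any existing trailing slashes, then add exactly one back.
    return path.rstrip("/") + "/"
```

After defining it in the same shell session, you could call:
>>> indexer(ensure_trailing_slash('test_collection/sci.spsace'), 'english.stop', ensure_trailing_slash('postings'))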
** How to invoke the retriever
The retriever is invoked the same way as the indexer. For example, if the postings (index files) are
in the 'postings/' directory, you want the top 3 most relevant documents returned, and you
have 3 keywords (lore, digit and pigment), then in the Python shell enter:
>>> retriever('postings/', 3, 'lore', 'digit', 'pigment')
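The call above asks for the 3 documents that best match the keywords. The sketch below shows one common way such a ranking can be done, a tf-idf score summed over the keywords; searchr may well use a different scheme. It mirrors the retriever's argument order (index, number of documents, then any number of keywords via `*keywords`), but takes a hypothetical in-memory index of the form `{term: {document: term_frequency}}` instead of a directory of index files. The name `rank_documents` is hypothetical.

```python
import math


def rank_documents(index, n, *keywords):
    """Return up to n (document, score) pairs, best match first.

    Illustrative tf-idf sketch, not searchr's implementation. `index` is a
    hypothetical in-memory mapping {term: {document: term_frequency}}.
    """
    all_docs = {doc for postings in index.values() for doc in postings}
    total = len(all_docs)
    scores = {}
    for keyword in keywords:
        postings = index.get(keyword.lower(), {})
        if not postings:
            continue  # keyword never seen at indexing time
        # Smoothed idf: rarer terms weigh more, and the weight stays positive.
        idf = math.log(1 + total / len(postings))
        for doc, tf in postings.items():
            scores[doc] = scores.get(doc, 0.0) + tf * idf
    # Highest score first; ties broken by document name for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:n]
```

For instance, with index = {"lore": {"a.txt": 2, "b.txt": 1}, "digit": {"b.txt": 3}, "pigment": {"c.txt": 1}}, calling rank_documents(index, 3, "lore", "digit", "pigment") ranks b.txt first, then a.txt, then c.txt, because b.txt contains both "lore" and the rarer term "digit" three times.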
* Examples that come with the script
An example is included in searchr.py. To run it, uncomment lines 197 and 270 in the script,
then run the script again.