
patent-parsing-tools

System requirements:

sudo yum install python-devel libxslt-devel libxml2-devel

Python requirements:

pip install -r requirements.txt

Running:

Collecting and serializing data:

python -m patent_parsing_tools.supervisor [working_directory] [train_destination] [test_destination] [year_from] [year_to]

E.g.:

python -m patent_parsing_tools.supervisor patents/working_directory patents/train_destination patents/test_destination 2014 2015

Generating a dictionary from the train set:

python -m patent_parsing_tools.bow.dictionary_maker [train_directory] [max_parsed_patents] [dict_max_size] [dictionary_name]

E.g.:

python -m patent_parsing_tools.bow.dictionary_maker patents/train_destination 1000000000 4096 dictionary.txt
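To illustrate what a size-capped dictionary means here, the following is a minimal sketch, not the actual code of patent_parsing_tools.bow.dictionary_maker. It assumes the dictionary is simply the dict_max_size most frequent words in the train set. The function name build_dictionary and the regex tokenizer are hypothetical.

```python
import re
from collections import Counter


def build_dictionary(documents, dict_max_size):
    """Return up to dict_max_size of the most frequent lowercase words.

    Hypothetical helper for illustration; the real dictionary_maker
    may tokenize, filter, and rank words differently.
    """
    counts = Counter()
    for text in documents:
        # Assumed tokenization: runs of ASCII letters, lowercased.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return [word for word, _ in counts.most_common(dict_max_size)]


docs = [
    "A method for parsing patents",
    "A patent parsing method and system",
]
dictionary = build_dictionary(docs, 3)
```

With dict_max_size set to 3, only the three words that occur in both sample documents ("a", "method", "parsing") are kept; rarer words are dropped.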

Generating bag of words for the train set and the test set:

python -m patent_parsing_tools.bow.bag_of_words [directory_with_serialized_patents] [destination_directory] [dictionary.txt] [package_size > 1024]

E.g.:

python -m patent_parsing_tools.bow.bag_of_words patents/train_destination patents/final_dataset_train dictionary.txt 1048576
python -m patent_parsing_tools.bow.bag_of_words patents/test_destination patents/final_dataset_test dictionary.txt 1048576
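As a rough picture of what the bag-of-words step produces, here is a minimal sketch that turns one text into a count vector over a fixed dictionary. It is not the project's implementation: the function name bag_of_words, the tokenizer, and the plain-list output are assumptions, and the real module's output format and packaging may differ.

```python
import re


def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in text.

    Hypothetical helper for illustration. Words that are not in the
    dictionary are ignored, so the vector length always equals
    len(dictionary).
    """
    index = {word: i for i, word in enumerate(dictionary)}
    vector = [0] * len(dictionary)
    # Assumed tokenization: runs of ASCII letters, lowercased.
    for token in re.findall(r"[a-z]+", text.lower()):
        position = index.get(token)
        if position is not None:
            vector[position] += 1
    return vector


dictionary = ["method", "parsing", "patent"]
vector = bag_of_words(
    "A patent parsing method; the method parses a patent.", dictionary
)
```

The same dictionary is used for both the train set and the test set so that position i in every vector refers to the same word.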

Running tests:

python -m unittest discover .
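By default, unittest discovery only picks up files whose names match test*.py. The sketch below shows the shape of a test module that the command above would find; the file name test_tokenize.py and the tokenize helper are hypothetical and are not part of this repository.

```python
# Hypothetical file: test_tokenize.py
import re
import unittest


def tokenize(text):
    """Hypothetical helper: split text into lowercase ASCII words."""
    return re.findall(r"[a-z]+", text.lower())


class TokenizeTest(unittest.TestCase):
    def test_lowercases_and_splits(self):
        self.assertEqual(tokenize("Patent Parsing"), ["patent", "parsing"])

    def test_ignores_digits_and_punctuation(self):
        self.assertEqual(tokenize("US-2014, claim 1."), ["us", "claim"])
```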

Contributing:

python setup.py register
python setup.py bdist