Classify Software for NLP Homework on Sentiment Analysis
To use the software in this package, you need to add ./bin to your
PATH variable. One easy way to do this: go to the directory where you
unpacked the classify.tgz file (i.e., the directory that contains this
README) and run:
$ export CLASSIFY_DIR=`pwd`
$ export PATH=$PATH:$CLASSIFY_DIR/bin
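These exports only affect the current shell session. As a minimal sketch, assuming a bash-compatible shell, you can confirm that the bin directory actually ended up on your PATH like this. The check itself is not part of this package:

```shell
# Run from the directory that contains this README (assumes bash or another POSIX shell).
export CLASSIFY_DIR=`pwd`
export PATH=$PATH:$CLASSIFY_DIR/bin

# Wrap PATH in colons so that the first and last entries also match the pattern.
case ":$PATH:" in
  *":$CLASSIFY_DIR/bin:"*) echo "OK: $CLASSIFY_DIR/bin is on PATH" ;;
  *) echo "Missing: $CLASSIFY_DIR/bin is not on PATH" ;;
esac
```

To avoid retyping the exports in every new terminal, you could add the same two lines to your shell startup file (for example ~/.bashrc for bash). In that file, replace `pwd` with the absolute path of this directory, because `pwd` there is evaluated in whatever directory the shell starts in.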
This README describes the contents of this directory.
The files you will modify for this homework are:
The tennis, ppa, and hcr datasets are in:
classify/data/tennis: the tennis dataset, in the format required by tennis_cat.py
classify/data/ppa: the prepositional phrase attachment data, in the ppa format
classify/data/hcr: the health care reform Twitter dataset, in XML
There are other Python programs and modules that tennis_cat.py and
ppa_features.py depend on. YOU SHOULD NOT MODIFY THESE FILES.
classify/naivebayes.py: Jason Baldridge's implementation of naive Bayes for HW2.
classify/nlp_ppa_features.py: Jason Baldridge's features for the PPA task.
classify/classify_util.py: A module with useful utility functions that simplify tennis_cat.py and ppa_attach.py.
classify/twitter_util.py: A module with useful utility functions for working with tweets and scoring sentiment of tweets.
classify/score.py: Scores the predictions of a classifier against the gold standard.
Packages from other sources that are included here:
classify/BitVector.py: An implementation of bit vectors that can be used in ppa_features.py (Python Software Foundation License).
classify/porter_stemmer.py: An implementation of the Porter stemmer that can be used in ppa_features.py
classify/twokenize.py and classify/emoticons.py: Python scripts written by Brendan O'Connor to tokenize tweets (Apache Software License).