Classify Software for NLP Homework on Sentiment Analysis

http://nlp-s11.utcompling.com/assignments/sentiment-analysis

In order to use the software in this package, you need to add ./bin to your PATH variable. An easy way to do this is as follows. Go to the directory where you unpacked the classify.tgz file (i.e., the directory that contains this README) and do:

  $ export CLASSIFY_DIR=`pwd`
  $ export PATH=$PATH:$CLASSIFY_DIR/bin

This README describes the contents of this directory.

The files you will modify for this homework are:

The datasets are in:

  classify/data/tennis: the tennis dataset, in the format required by tennis_cat.py
  classify/data/ppa: the prepositional phrase attachment data, in the ppa format
  classify/data/hcr: the health care reform Twitter dataset, in XML

There are other Python programs and modules that tennis_cat.py and ppa_features.py depend on. YOU SHOULD NOT MODIFY THESE FILES.

  classify/naivebayes.py: Jason Baldridge's implementation of naive Bayes for HW2.
  classify/nlp_ppa_features.py: Jason Baldridge's features for the PPA task.
  classify/classify_util.py: A module with useful utility functions that simplify tennis_cat.py and ppa_attach.py.
  classify/twitter_util.py: A module with useful utility functions for working with tweets and scoring the sentiment of tweets.
  classify/score.py: Scores the predictions of a classifier against the gold standard.

Packages from other sources that are included here:

  classify/BitVector.py: An implementation of bit vectors that can be used in ppa_features.py (Python Software Foundation License).
  classify/porter_stemmer.py: An implementation of the Porter stemmer that can be used in ppa_features.py.
  classify/twokenize.py and classify/emoticons.py: Python scripts written by Brendan O'Connor to tokenize tweets (Apache Software License).
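This README describes classify/naivebayes.py as an implementation of naive Bayes and classify/score.py as scoring predictions against a gold standard, but it does not document their interfaces. The following is a minimal, hypothetical Python sketch of the same two ideas, not the code in those files: a naive Bayes classifier over categorical features with add-lambda smoothing, plus accuracy against gold labels. The toy data, the function names train, predict and accuracy, and the smoothing parameter lam are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data (NOT the contents of classify/data/tennis):
# each instance is (label, {feature_name: value}).
TRAIN = [
    ("No",  {"outlook": "sunny",    "wind": "weak"}),
    ("No",  {"outlook": "sunny",    "wind": "strong"}),
    ("Yes", {"outlook": "overcast", "wind": "weak"}),
    ("Yes", {"outlook": "rain",     "wind": "weak"}),
    ("No",  {"outlook": "rain",     "wind": "strong"}),
    ("Yes", {"outlook": "overcast", "wind": "strong"}),
]


def train(instances, lam=1.0):
    """Collect the counts needed for a smoothed naive Bayes model (hypothetical API)."""
    label_counts = Counter(label for label, _ in instances)
    feat_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    values = defaultdict(set)           # feature -> set of observed values
    for label, feats in instances:
        for f, v in feats.items():
            feat_counts[(label, f)][v] += 1
            values[f].add(v)
    return label_counts, feat_counts, values, lam


def predict(model, feats):
    """Return the label maximizing log P(label) + sum of log P(f=v | label)."""
    label_counts, feat_counts, values, lam = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for f, v in feats.items():
            # Add-lambda smoothing: unseen values still get nonzero probability.
            num = feat_counts[(label, f)][v] + lam
            den = n + lam * len(values[f])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels (hypothetical scorer)."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equal in length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    model = train(TRAIN)
    test = [("Yes", {"outlook": "overcast", "wind": "weak"}),
            ("No",  {"outlook": "sunny",    "wind": "strong"})]
    preds = [predict(model, feats) for _, feats in test]
    print(preds, accuracy([label for label, _ in test], preds))
```

Scoring is done in log space so that products of many small probabilities do not underflow; lam=1.0 corresponds to add-one (Laplace) smoothing.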