Commits

Robert Bu committed 9129934

add readme

  • Parent commits edc7794

Files changed (2)

+This is an experimental Python project that extracts IMDB reviews for a movie, classifies them, and generates a result HTML file
+The project is based on the programming assignments in **Udacity CSC 101 class** and **Stanford NLP class**
+The script is also used as a plugin in my term project for my Distributed System class this semester
+
+Usage:
+To Train:
+	python imdb.py -t LIST_FILE MAX_COMMENT_COUNT
+
+To Classify:
+	python imdb.py -c OUTPUT_HTML_PATH MOVIE_TITLE [MAX_COMMENT_COUNT]
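
The two invocations above could be dispatched roughly as follows. This is only a sketch of the argument handling implied by the usage lines, not the actual code in imdb.py; the function name `parse_args` and the dictionary keys are hypothetical.

```python
def parse_args(argv):
    """Interpret the arguments that follow the script name.

    Hypothetical helper: mirrors the two usage lines of the README,
    not the real parsing logic of imdb.py.
    """
    # Training: -t LIST_FILE MAX_COMMENT_COUNT
    if len(argv) == 3 and argv[0] == "-t":
        return {
            "mode": "train",
            "list_file": argv[1],
            "max_comments": int(argv[2]),
        }
    # Classification: -c OUTPUT_HTML_PATH MOVIE_TITLE [MAX_COMMENT_COUNT]
    if len(argv) in (3, 4) and argv[0] == "-c":
        return {
            "mode": "classify",
            "output_html": argv[1],
            "movie_title": argv[2],
            # The count is optional when classifying; None means "not given".
            "max_comments": int(argv[3]) if len(argv) == 4 else None,
        }
    raise ValueError("expected -t LIST_FILE MAX_COMMENT_COUNT "
                     "or -c OUTPUT_HTML_PATH MOVIE_TITLE [MAX_COMMENT_COUNT]")
```

In a script this would typically be called as `parse_args(sys.argv[1:])`, with a `ValueError` turned into a usage message.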
+
+The files in the lists directory are the movie lists I used to train the NaiveBayes classifier. They come from random titles in:
+	* IMDB Top 250 (http://www.imdb.com/chart/top)
+	* IMDB Bottom 100 (http://www.imdb.com/chart/bottom)
+	* New York Times The Best 1,000 Movies Ever Made (http://www.nytimes.com/ref/movies/1000best.html)
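
For readers unfamiliar with the technique, a Naive Bayes text classifier can be sketched as below. This is a generic multinomial variant with add-one smoothing and whitespace tokenization, written under the assumption that reviews are labeled with two or more classes; the class name, method names, and labels are illustrative and are not taken from this project's code.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts
        self.vocab = set()                       # all words seen during training

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior: fraction of training texts carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                # Smoothed log likelihood; unseen words count as zero.
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


nb = NaiveBayesSketch()
nb.train("great wonderful movie", "pos")
nb.train("terrible boring movie", "neg")
print(nb.classify("wonderful great"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.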
+
+Trained data is stored in trained.raw as a plain text file
+
+* Future Work:
+	Try other algorithms to improve the accuracy
+	Make an online version