Commits

dustin smith committed 2ef4e40

README update

  • Parent commits 4452bba

Files changed (1)

 
 This is a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml).  It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3GB of RAM and usually a few minutes of loading time), most applications will probably want to run it as a server.
 
-It requires [pexpect](http://www.noah.org/wiki/pexpect).  Included dependencies are [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).
+It requires [pexpect](http://www.noah.org/wiki/pexpect).  The repository includes and uses code from [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).
 
-There's not much to this script.  I decided to create it after facing difficulties using the alternative ways to get Python to talk to Stanford's dependency parser.  First, I had trouble initializing a JVM using JPypes on two different machines with [stanford-parser-python](http://projects.csail.mit.edu/spatial/Stanford_Parser), and Jython's lack of support for the Python modules I needed prevented a [Jython solution](http://blog.gnucom.cc/2010/using-the-stanford-parser-with-jython/). 
+There's not much to this script.  I decided to create it after having problems with other Python wrappers for Stanford's dependency parser. 
+First, the JPype approach used in [stanford-parser-python](http://projects.csail.mit.edu/spatial/Stanford_Parser) had trouble initializing a JVM on two separate computers.  Next, I found I could not use a 
+[Jython solution](http://blog.gnucom.cc/2010/using-the-stanford-parser-with-jython/) because the Python modules I needed did not work in Jython.
 
 It runs the Stanford CoreNLP jar in a separate process, communicates with the Java process through its command-line interface, and makes assumptions about the parser's output in order to parse it into a Python dict object and transfer it using JSON.  This wrapper will break if that output changes significantly.  I have only tested it with **CoreNLP tools version 1.0.2**, released 2010-11-12.
 
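 Assuming the server is already running and listening on its default address, a client can be constructed with the bundled jsonrpc module (a sketch: the host, port, and the `loads` import below are assumptions, so adjust them to your setup):
 
     # Hypothetical client setup; point addr at wherever the server is running
     import jsonrpc
     from simplejson import loads
     server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
                                  jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))
 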
     result = loads(server.parse("hello world"))
     print "Result", result
 
-Produces a list with a parsed dictionary for each sentence:
+That returns a list containing a dictionary for each sentence, with the keys `text`, `tuples` (the dependency relations), and `words`:
 
     Result [{'text': 'hello world', 
              'tuples': [['amod', 'world', 'hello']], 
              'words': [['hello', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': '5', 'CharacterOffsetBegin': '0', 'PartOfSpeech': 'JJ', 'Lemma': 'hello'}], 
                        ['world', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': '11', 'CharacterOffsetBegin': '6', 'PartOfSpeech': 'NN', 'Lemma': 'world'}]]}]
     
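 The result is plain Python data, so individual fields can be read directly, for example:
 
     # Dependency tuples of the first sentence, per the output above
     result[0]['tuples']   # [['amod', 'world', 'hello']]
 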
-To use it in a regular script or to edit/debug (since errors via RPC are opaque), load the module instead:
+To use it in a regular script or to edit/debug it (because errors via RPC are opaque), load the module instead:
 
     from corenlp import *
-    corenlp = StanfordCoreNLP() 
+    corenlp = StanfordCoreNLP()  # wait a few minutes...
     corenlp.parse("Parse an imperative sentence, damnit!")
 
 I added a function called `parse_imperative` that introduces a dummy pronoun to overcome the problems that dependency parsers have with imperative statements.
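 
 For example (illustrative; the output has the same format as `parse`):
 
     corenlp.parse_imperative("stop smoking")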
 
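 If you suspect a problem with this wrapper, first make sure the CoreNLP pipeline itself runs from the command line (adjust the jar names to match the version you downloaded):
 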
     java -cp stanford-corenlp-2010-11-12.jar:stanford-corenlp-models-2010-11-06.jar:xom-1.2.6.jar:xom.jar:jgraph.jar:jgrapht.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties
 
+Then, send me (Dustin Smith) a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).
 
# TODO