Commits

Anonymous committed 0f51765

updated README


Files changed (2)

 
 This is a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml).  It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (requiring 3GB RAM and usually a few minutes loading time), most applications will probably want to run it as a server.
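 
 For the module route, usage would look something like this (a sketch only; the `corenlp` module name, `StanfordCoreNLP` class, and `parse` method are my assumptions about the bundled Python source, so check it for the real interface):
 
     # Sketch only: module, class, and method names are assumptions,
     # not something this README documents.
     from corenlp import StanfordCoreNLP
 
     corenlp = StanfordCoreNLP()           # loads the models, so expect a wait
     print corenlp.parse("Hello world.")   # parse a sentence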
 
-It requires [pexpect](http://www.noah.org/wiki/pexpect) and uses [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), which are included. 
+It requires [pexpect](http://www.noah.org/wiki/pexpect).  Included dependencies are [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/).
 
-There's not much to this script.  I decided to create it after having trouble initializing the JVM through JPypes on two different machines. 
+There's not much to this script.  I decided to create it after having trouble initializing a JVM using JPypes on two different machines. 
 
 It runs the Stanford CoreNLP jar in a separate process, communicates with the Java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON.  The parser will break if the output changes significantly. I have only tested this on **Core NLP tools version 1.0.2**, released 2010-11-12.
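 
 As a rough illustration of that interaction (this is a sketch, not the wrapper's actual code; the jar name, classpath, and `NLP>` prompt string are assumptions here):
 
     import pexpect
 
     # Sketch: spawn the CoreNLP command-line shell and talk to it over
     # stdin/stdout.  The exact java command and prompt are assumptions --
     # see the bundled Python source for what is really run.
     child = pexpect.spawn("java -cp stanford-corenlp-2010-11-12.jar "
                           "edu.stanford.nlp.pipeline.StanfordCoreNLP")
     child.expect("NLP>", timeout=600)   # the models take a while to load
     child.sendline("hello world")       # hand the parser a sentence
     child.expect("NLP>", timeout=600)   # wait for the parse to finish
     output = child.before               # raw parser output, to be turned into a dict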
 
 
 You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package.  Then copy all of the python files from this repository into the `stanford-corenlp-2010-11-12` folder.
 
+In other words: 
+
+    sudo pip install pexpect
+    wget http://nlp.stanford.edu/software/stanford-corenlp-v1.0.2.tgz
+    tar xvfz stanford-corenlp-v1.0.2.tgz
+    cd stanford-corenlp-2010-11-12
+    git clone git://github.com/dasmith/stanford-corenlp-python.git
+    mv stanford-corenlp-python/* .
+
 Then, to launch a server:
 
     python server.py
 
-Optionally, specify a host or port:
+Optionally, you can specify a host or port:
 
     python server.py -H 0.0.0.0 -p 3456
 
-To run a public JSON-RPC server on port 3456.
+That will run a public JSON-RPC server on port 3456.
+
+Assuming you are running on port 8080, the code in `client.py` shows an example parse:
+
+    import jsonrpc
+    server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
+            jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))
+
+    result = server.parse("hello world")
+    print "Result", result
+
+
+Produces:
+
+    Result [{"text": "hello world", "tuples": [["amod", "world", "hello"]], "words": {"world": {"NamedEntityTag": "O", "CharacterOffsetEnd": "11", "Lemma": "world", "PartOfSpeech": "NN", "CharacterOffsetBegin": "6"}, "hello": {"NamedEntityTag": "O", "CharacterOffsetEnd": "5", "Lemma": "hello", "PartOfSpeech": "JJ", "CharacterOffsetBegin": "0"}}}]
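+
+If you want to work with that as Python objects rather than a raw JSON string (the double-quoted keys above suggest the result comes back as a string), decoding it is straightforward with the standard `json` module; a quick sketch:
+
+    import json
+
+    sentences = json.loads(result)                        # decode the JSON string
+    print sentences[0]["text"]                            # "hello world"
+    print sentences[0]["words"]["world"]["PartOfSpeech"]  # "NN"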
 
-See `client.py` for example of how to connect with a client.
 
 <!--
 ## Adding WordNet
+#!/usr/bin/env python
 """
 This is a Python interface to Stanford Core NLP tools.
 It can be imported as a module or run as a server.