Anonymous committed 1a64f9f

updated README

Files changed (2)

 
 This is a Python wrapper for Stanford University's NLP group's Java-based [CoreNLP tools](http://nlp.stanford.edu/software/corenlp.shtml).  It can either be imported as a module or run as a JSON-RPC server. Because it uses many large trained models (which require 3 GB of RAM and usually take a few minutes to load), most applications will probably want to run it as a server.
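When the wrapper runs as a JSON-RPC server, a client only needs to build request bodies and decode responses. The sketch below makes several assumptions that the diff does not confirm: the server speaks JSON-RPC 2.0 (the bundled `jsonrpc` library may be configured for 1.0), the wrapper's `parse()` method is exposed under the name `"parse"`, and the transport (socket, HTTP, etc.) is left out. `make_parse_request` and `decode_response` are hypothetical helper names. Because `parse()` already returns a JSON string, the `result` member is decoded a second time.

```python
import itertools
import json

# Monotonic request ids, so replies can be matched to requests.
_request_ids = itertools.count(1)


def make_parse_request(text, method="parse"):
    """Build a JSON-RPC 2.0 request body asking the server to parse `text`.

    The method name "parse" is an assumption based on the wrapper's parse().
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": [text],
        "id": next(_request_ids),
    })


def decode_response(body):
    """Return the parse result from a JSON-RPC 2.0 response body.

    The wrapper's parse() returns a JSON string, so the "result" member
    is itself JSON and is decoded a second time into a Python dict.
    """
    reply = json.loads(body)
    if reply.get("error") is not None:
        raise RuntimeError("JSON-RPC error: %r" % (reply["error"],))
    return json.loads(reply["result"])
```

A decoded result may still be the wrapper's own timeout dict (with `error` and `output` keys), so callers should check for an `error` key before using it.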
 
-There's not much to this script.
+It requires [pexpect](http://www.noah.org/wiki/pexpect) and uses [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), which are included in this repository.
 
-It requires `pexpect`.
-
-This uses [jsonrpc](http://www.simple-is-better.org/rpc/) and [python-progressbar](http://code.google.com/p/python-progressbar/), which are included in this repository.
+There's not much to this script.  I decided to create it after having trouble initializing the JVM through JPype on two different machines.
 
+It runs the Stanford CoreNLP jar in a separate process, communicates with that Java process through its command-line interface, and makes assumptions about the parser's output format in order to parse it into a Python dict and transfer it as JSON.  The wrapper's output parsing will break if that format changes significantly. I have only tested this on **CoreNLP tools version 1.0.2**, released 2010-11-12.
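The pattern of driving an interactive child process (send a line, then read until the next prompt) can be sketched with the standard library alone. This swaps `subprocess` plus `select` in for the `pexpect` the wrapper actually uses, so it is POSIX-only. `read_until`, `start_shell`, `ask`, and `FAKE_SHELL` are hypothetical names, and `FAKE_SHELL` is a stand-in for the Java program that imitates only its `Entering interactive shell.` banner and `NLP>` prompt.

```python
import os
import select
import subprocess
import sys
import time


def read_until(proc, marker, timeout):
    """Read proc's stdout until `marker` appears or `timeout` seconds pass.

    Returns (text, found). Uses select() on a pipe, so this is POSIX-only.
    """
    fd = proc.stdout.fileno()
    needle = marker.encode()
    deadline = time.time() + timeout
    buf = b""
    while needle not in buf:
        remaining = deadline - time.time()
        if remaining <= 0:
            break
        ready, _, _ = select.select([fd], [], [], remaining)
        if not ready:
            continue  # select timed out; the loop re-checks the deadline
        chunk = os.read(fd, 4096)
        if not chunk:
            break  # child closed its stdout (e.g. it exited or crashed)
        buf += chunk
    return buf.decode("utf-8", errors="replace"), needle in buf


# Hypothetical stand-in for the Java shell: prints a banner and an "NLP>"
# prompt, then answers each input line with an upper-cased echo.
FAKE_SHELL = r'''
import sys
sys.stdout.write("Entering interactive shell.\nNLP> ")
sys.stdout.flush()
for line in sys.stdin:
    sys.stdout.write(line.upper() + "\nNLP> ")
    sys.stdout.flush()
'''


def start_shell(argv, timeout=5.0):
    """Spawn argv and wait until it prints its first "NLP>" prompt."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    _, ready = read_until(proc, "NLP>", timeout)
    if not ready:
        proc.kill()
        proc.wait()
        raise RuntimeError("shell did not reach its prompt")
    return proc


def ask(proc, text, timeout=5.0):
    """Send one line and return (output, found) up to the next prompt."""
    proc.stdin.write(text.encode("utf-8") + b"\n")
    proc.stdin.flush()
    return read_until(proc, "\nNLP>", timeout)
```

Usage: `proc = start_shell([sys.executable, "-c", FAKE_SHELL])` followed by `reply, ok = ask(proc, "hello")`. Waiting for the prompt itself, rather than only for the banner, keeps the first prompt from being mistaken for the end of the first reply.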
 
 ## Download and Usage 
 
-You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's Core-NLP package.  Then copy all of the python files from this repository into the `stanford-corenlp-2010-11-12` folder.
+You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package.  Then copy all of the Python files from this repository into the `stanford-corenlp-2010-11-12` folder.
 
 Then, to launch a server:
 
 
 ## Questions 
 
-I have only tested this on **Core NLP tools version 1.0.2** released 2010-11-12.
-
 If you think there may be a problem with this wrapper, first ensure you can run the Java program:
 
     java -cp stanford-corenlp-2010-11-12.jar:stanford-corenlp-models-2010-11-06.jar:xom-1.2.6.jar:xom.jar:jgraph.jar:jgrapht.jar -Xmx3g edu.stanford.nlp.pipeline.StanfordCoreNLP -props default.properties
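The same command can be assembled from Python as an argument list. This is a minimal sketch: `build_command` is a hypothetical helper, and the jar names are copied from the command above. Unlike the wrapper, it raises `FileNotFoundError` listing every missing jar instead of calling `sys.exit(1)`, and it uses `os.pathsep` rather than a hard-coded `:` as the classpath separator.

```python
import os

CLASSNAME = "edu.stanford.nlp.pipeline.StanfordCoreNLP"

# Jar names copied from the java command above; adjust them for other releases.
JARS = (
    "stanford-corenlp-2010-11-12.jar",
    "stanford-corenlp-models-2010-11-06.jar",
    "xom-1.2.6.jar",
    "xom.jar",
    "jgraph.jar",
    "jgrapht.jar",
)


def build_command(jars=JARS, props="default.properties", heap="3g", java="java"):
    """Return an argv list that launches the CoreNLP interactive shell.

    Raises FileNotFoundError naming every missing jar, rather than exiting.
    """
    missing = [jar for jar in jars if not os.path.exists(jar)]
    if missing:
        raise FileNotFoundError("missing jar(s): " + ", ".join(missing))
    # os.pathsep is ":" on POSIX and ";" on Windows.
    return [java, "-Xmx" + heap, "-cp", os.pathsep.join(jars),
            CLASSNAME, "-props", props]
```

Passing the resulting list to `subprocess.Popen` avoids the shell-quoting problems that can arise when a single command string is built with `%` formatting, for example when a jar path contains spaces.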
 
         classname = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
         javapath = "java"
+        # include the properties file, so you can change defaults
+        # but any changes in output format will break parse_parser_results() 
+        props = "-props default.properties" 
 
         for jar in jars:
             if not os.path.exists(jar):
                 sys.exit(1)
         
         # spawn the server
-        self._server = pexpect.spawn("%s -Xmx3g -cp %s %s" % (javapath, ':'.join(jars), classname))
+        self._server = pexpect.spawn("%s -Xmx3g -cp %s %s %s" % (javapath, ':'.join(jars), classname, props))
         
         print "Starting the Stanford Core NLP parser."
         # show progress bar while loading the models
         pbar.update(5)
         self._server.expect("Entering interactive shell.")
         pbar.finish()
-        print self._server.before
+        print "Server loaded."
+        #print self._server.before
 
     def parse(self, text):
         """ 
         """
         print "Request", text
         print self._server.sendline(text)
-        max_expected_time = 2 + len(text) / 200.0
+        # How much time should we give the parser to parse it?
+        # Allow 2 s plus 1 s per 200 characters of input, capped at 5 s.
+        max_expected_time = min(5, 2 + len(text) / 200.0)
         print "Timeout", max_expected_time
         end_time = time.time() + max_expected_time 
         incoming = ""
             freshlen = len(ch)
             time.sleep (0.0001)
             incoming = incoming + ch
-            if "\nNLP>" in incoming or end_time - time.time() < 0:
+            if "\nNLP>" in incoming:
                 break
+            if end_time - time.time() < 0:
+                return dumps({'error': "timed out after %f seconds" %
+                    max_expected_time, 'output': incoming})
         results = parse_parser_results(incoming)
         print "Results", results
         # convert to JSON and return
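The timeout handling added to `parse()` can be sketched as two small, testable functions. The budget formula and the error dict mirror the diff; `time_budget`, `collect_reply`, and the `read_chunk` callable (a hypothetical non-blocking reader that returns `""` when no output is ready) are illustrative names, not part of the wrapper.

```python
import json
import time


def time_budget(text, base=2.0, chars_per_second=200.0, cap=5.0):
    """Seconds to wait for a reply: 2 s plus 1 s per 200 chars, at most 5 s."""
    return min(cap, base + len(text) / chars_per_second)


def collect_reply(read_chunk, budget, prompt="\nNLP>"):
    """Call read_chunk() until `prompt` appears or `budget` seconds elapse.

    Returns (output, None) on success, or (None, error_json) on timeout,
    where error_json has the same "error"/"output" keys as the wrapper's.
    """
    end_time = time.time() + budget
    incoming = ""
    while prompt not in incoming:
        if time.time() > end_time:
            error = {"error": "timed out after %f seconds" % budget,
                     "output": incoming}
            return None, json.dumps(error)
        chunk = read_chunk()
        if chunk:
            incoming += chunk
        else:
            time.sleep(0.001)  # nothing yet; avoid a hot spin
    return incoming, None
```

Because of the 5-second cap, a very long input returns the timeout error dict, including any partial output, rather than a parse result.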