Dustin Smith  committed d486ad2

Import now works with either simplejson or json. Removed parse_imperative() because the new Stanford Parser seems to handle imperatives well.

  • Parent commits 4f4edbd

Files changed (5)

+                   GNU LESSER GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+  This version of the GNU Lesser General Public License incorporates
+the terms and conditions of version 3 of the GNU General Public
+License, supplemented by the additional permissions listed below.
+
+  0. Additional Definitions.
+
+  As used herein, "this License" refers to version 3 of the GNU Lesser
+General Public License, and the "GNU GPL" refers to version 3 of the GNU
+General Public License.
+
+  "The Library" refers to a covered work governed by this License,
+other than an Application or a Combined Work as defined below.
+
+  An "Application" is any work that makes use of an interface provided
+by the Library, but which is not otherwise based on the Library.
+Defining a subclass of a class defined by the Library is deemed a mode
+of using an interface provided by the Library.
+
+  A "Combined Work" is a work produced by combining or linking an
+Application with the Library.  The particular version of the Library
+with which the Combined Work was made is also called the "Linked
+Version".
+
+  The "Minimal Corresponding Source" for a Combined Work means the
+Corresponding Source for the Combined Work, excluding any source code
+for portions of the Combined Work that, considered in isolation, are
+based on the Application, and not on the Linked Version.
+
+  The "Corresponding Application Code" for a Combined Work means the
+object code and/or source code for the Application, including any data
+and utility programs needed for reproducing the Combined Work from the
+Application, but excluding the System Libraries of the Combined Work.
+
+  1. Exception to Section 3 of the GNU GPL.
+
+  You may convey a covered work under sections 3 and 4 of this License
+without being bound by section 3 of the GNU GPL.
+
+  2. Conveying Modified Versions.
+
+  If you modify a copy of the Library, and, in your modifications, a
+facility refers to a function or data to be supplied by an Application
+that uses the facility (other than as an argument passed when the
+facility is invoked), then you may convey a copy of the modified
+version:
+
+   a) under this License, provided that you make a good faith effort to
+   ensure that, in the event an Application does not supply the
+   function or data, the facility still operates, and performs
+   whatever part of its purpose remains meaningful, or
+
+   b) under the GNU GPL, with none of the additional permissions of
+   this License applicable to that copy.
+
+  3. Object Code Incorporating Material from Library Header Files.
+
+  The object code form of an Application may incorporate material from
+a header file that is part of the Library.  You may convey such object
+code under terms of your choice, provided that, if the incorporated
+material is not limited to numerical parameters, data structure
+layouts and accessors, or small macros, inline functions and templates
+(ten or fewer lines in length), you do both of the following:
+
+   a) Give prominent notice with each copy of the object code that the
+   Library is used in it and that the Library and its use are
+   covered by this License.
+
+   b) Accompany the object code with a copy of the GNU GPL and this license
+   document.
+
+  4. Combined Works.
+
+  You may convey a Combined Work under terms of your choice that,
+taken together, effectively do not restrict modification of the
+portions of the Library contained in the Combined Work and reverse
+engineering for debugging such modifications, if you also do each of
+the following:
+
+   a) Give prominent notice with each copy of the Combined Work that
+   the Library is used in it and that the Library and its use are
+   covered by this License.
+
+   b) Accompany the Combined Work with a copy of the GNU GPL and this license
+   document.
+
+   c) For a Combined Work that displays copyright notices during
+   execution, include the copyright notice for the Library among
+   these notices, as well as a reference directing the user to the
+   copies of the GNU GPL and this license document.
+
+   d) Do one of the following:
+
+       0) Convey the Minimal Corresponding Source under the terms of this
+       License, and the Corresponding Application Code in a form
+       suitable for, and under terms that permit, the user to
+       recombine or relink the Application with a modified version of
+       the Linked Version to produce a modified Combined Work, in the
+       manner specified by section 6 of the GNU GPL for conveying
+       Corresponding Source.
+
+       1) Use a suitable shared library mechanism for linking with the
+       Library.  A suitable mechanism is one that (a) uses at run time
+       a copy of the Library already present on the user's computer
+       system, and (b) will operate properly with a modified version
+       of the Library that is interface-compatible with the Linked
+       Version.
+
+   e) Provide Installation Information, but only if you would otherwise
+   be required to provide such information under section 6 of the
+   GNU GPL, and only to the extent that such information is
+   necessary to install and execute a modified version of the
+   Combined Work produced by recombining or relinking the
+   Application with a modified version of the Linked Version. (If
+   you use option 4d0, the Installation Information must accompany
+   the Minimal Corresponding Source and Corresponding Application
+   Code. If you use option 4d1, you must provide the Installation
+   Information in the manner specified by section 6 of the GNU GPL
+   for conveying Corresponding Source.)
+
+  5. Combined Libraries.
+
+  You may place library facilities that are a work based on the
+Library side by side in a single library together with other library
+facilities that are not Applications and are not covered by this
+License, and convey such a combined library under terms of your
+choice, if you do both of the following:
+
+   a) Accompany the combined library with a copy of the same work based
+   on the Library, uncombined with any other library facilities,
+   conveyed under the terms of this License.
+
+   b) Give prominent notice with the combined library that part of it
+   is a work based on the Library, and explaining where to find the
+   accompanying uncombined form of the same work.
+
+  6. Revised Versions of the GNU Lesser General Public License.
+
+  The Free Software Foundation may publish revised and/or new versions
+of the GNU Lesser General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+  Each version is given a distinguishing version number. If the
+Library as you received it specifies that a certain numbered version
+of the GNU Lesser General Public License "or any later version"
+applies to it, you have the option of following the terms and
+conditions either of that published version or of any later version
+published by the Free Software Foundation. If the Library as you
+received it does not specify a version number of the GNU Lesser
+General Public License, you may choose any version of the GNU Lesser
+General Public License ever published by the Free Software Foundation.
+
+  If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
 First, the JPype approach used in [stanford-parser-python](http://projects.csail.mit.edu/spatial/Stanford_Parser) had trouble initializing a JVM on two separate computers.  Next, I discovered I could not use a 
 [Jython solution](http://blog.gnucom.cc/2010/using-the-stanford-parser-with-jython/) because the Python modules I needed did not work in Jython.
 
-It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON.  The parser will break if the output changes significantly. I have only tested this on **Core NLP tools version 1.2.0** released 2011-09-16.
+It runs the Stanford CoreNLP jar in a separate process, communicates with the java process using its command-line interface, and makes assumptions about the output of the parser in order to parse it into a Python dict object and transfer it using JSON.  The parser will break if the output changes significantly, but it has been tested on **Core NLP tools version 1.3.1** released 2012-04-09.
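The architecture described above (a child process driven through its command-line interface, whose text output is parsed into a dict and serialized as JSON) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses the standard `subprocess` module rather than `pexpect`, and the child is a hypothetical stand-in script that tags every token with `X`, not the CoreNLP jar. The names `CHILD_SOURCE` and `parse_in_subprocess` are invented for this sketch and do not appear in the repository.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the Java process: it tags every token with "X".
# The real wrapper launches the Stanford CoreNLP jar instead.
CHILD_SOURCE = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(' '.join(tok + '/X' for tok in line.split()))\n"
)


def parse_in_subprocess(text):
    """Send text to a child process and return its parsed output as JSON."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # communicate() writes the input, closes stdin, and collects stdout.
    raw_output, _ = proc.communicate(text + "\n")

    # This step depends entirely on the child's output format, which is
    # why a wrapper of this kind breaks when that format changes.
    words = []
    for pair in raw_output.split():
        token, tag = pair.rsplit("/", 1)
        words.append([token, {"PartOfSpeech": tag}])
    return json.dumps({"text": text, "words": words})
```

Calling `json.loads(parse_in_subprocess("hello world"))` gives back a dict with `text` and `words` keys, loosely mirroring the shape of the real wrapper's per-sentence output.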
 
 ## Download and Usage 
 
-You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package.  Then copy all of the python files from this repository into the `stanford-corenlp-2011-09-16` folder.
+You should have [downloaded](http://nlp.stanford.edu/software/corenlp.shtml#Download) and unpacked the tgz file containing Stanford's CoreNLP package.  By default, `corenlp.py` looks for the Stanford Core NLP folder as a subdirectory of where the script is being run.
 
 In other words: 
 
-    sudo pip install pexpect
-    wget http://nlp.stanford.edu/software/stanford-corenlp-v1.2.0.tgz
-    tar xvfz stanford-corenlp-v1.2.0.tgz
-    cd stanford-corenlp-2011-09-16
-    git clone git://github.com/dasmith/stanford-corenlp-python.git
-    mv stanford-corenlp-python/* .
+    sudo pip install pexpect unidecode   # unidecode is optional
+    git clone git://github.com/dasmith/stanford-corenlp-python.git
+    cd stanford-corenlp-python
+    wget http://nlp.stanford.edu/software/stanford-corenlp-2012-04-09.tgz
+    tar xvfz stanford-corenlp-2012-04-09.tgz
 
 Then, to launch a server:
 
 
 That returns a dictionary whose `sentences` key holds a list with one dictionary per sentence, with keys `parsetree`, `text`, `tuples` of the dependencies, and `words`:
 
-    Result [{'text': 'hello world', 
-             'tuples': [['amod', 'world', 'hello']], 
-             'words': [['hello', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': 5, 'CharacterOffsetBegin': 0, 'PartOfSpeech': 'JJ', 'Lemma': 'hello'}], 
-                       ['world', {'NamedEntityTag': 'O', 'CharacterOffsetEnd': 11, 'CharacterOffsetBegin': 6, 'PartOfSpeech': 'NN', 'Lemma': 'world'}]]}]
+    {u'sentences': [{u'parsetree': u'(ROOT (NP (JJ hello) (NN world)))',
+                     u'text': u'hello world',
+                     u'tuples': [[u'amod', u'world', u'hello'],
+                                 [u'root', u'ROOT', u'world']],
+                     u'words': [[u'hello', {u'NamedEntityTag': u'O',
+                                            u'CharacterOffsetEnd': u'5',
+                                            u'CharacterOffsetBegin': u'0',
+                                            u'PartOfSpeech': u'UH',
+                                            u'Lemma': u'hello'}],
+                                [u'world', {u'NamedEntityTag': u'O',
+                                            u'CharacterOffsetEnd': u'11',
+                                            u'CharacterOffsetBegin': u'6',
+                                            u'PartOfSpeech': u'NN',
+                                            u'Lemma': u'world'}]]}]}
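As a rough illustration of how a caller might walk the response shown above, the sketch below pulls out per-word tags and the dependency tuples. The `result` literal is copied from the sample output; the helper names `summarize_words` and `dependencies` are hypothetical and not part of the repository. Character offsets arrive as strings in this output, so the sketch converts them to integers.

```python
# Sample response copied from the README output for "hello world".
result = {
    u'sentences': [{
        u'parsetree': u'(ROOT (NP (JJ hello) (NN world)))',
        u'text': u'hello world',
        u'tuples': [[u'amod', u'world', u'hello'],
                    [u'root', u'ROOT', u'world']],
        u'words': [
            [u'hello', {u'NamedEntityTag': u'O',
                        u'CharacterOffsetEnd': u'5',
                        u'CharacterOffsetBegin': u'0',
                        u'PartOfSpeech': u'UH',
                        u'Lemma': u'hello'}],
            [u'world', {u'NamedEntityTag': u'O',
                        u'CharacterOffsetEnd': u'11',
                        u'CharacterOffsetBegin': u'6',
                        u'PartOfSpeech': u'NN',
                        u'Lemma': u'world'}],
        ],
    }]
}


def summarize_words(response):
    """Return (word, part-of-speech, start, end) for every word."""
    rows = []
    for sentence in response[u'sentences']:
        for word, attrs in sentence[u'words']:
            # Offsets are strings in the sample output, so convert them.
            start = int(attrs[u'CharacterOffsetBegin'])
            end = int(attrs[u'CharacterOffsetEnd'])
            rows.append((word, attrs[u'PartOfSpeech'], start, end))
    return rows


def dependencies(response):
    """Return every dependency as a (relation, head, dependent) tuple."""
    return [tuple(dep)
            for sentence in response[u'sentences']
            for dep in sentence[u'tuples']]
```

Because each sentence is its own dictionary, the same loops work unchanged when the input text contains several sentences.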
     
 To use it in a regular script or to edit/debug it (because errors via RPC are opaque), load the module instead:
 
 
 You can reach me, Dustin Smith, by sending a message on GitHub or through email (contact information is available [on my webpage](http://web.media.mit.edu/~dustin)).
 
-#  TODO
- 
-  - Mutex on parser
-  - Write test functions for parsing accuracy
-  - Calibrate parse-time prediction as function of sentence inputs
+# Contributors
+
 
 import jsonrpc
-from simplejson import loads
+try:
+    import json
+except ImportError:
+    import simplejson as json
+
 server = jsonrpc.ServerProxy(jsonrpc.JsonRpc20(),
         jsonrpc.TransportTcpIp(addr=("127.0.0.1", 8080)))
 
 # call a remote-procedure 
-result = loads(server.parse("hello world"))
+result = json.loads(server.parse("hello world"))
 print "Result", result
 
+result = json.loads(server.parse("stop smoking"))
+print "Result", result
 
+result = json.loads(server.parse("eat dinner"))
+print "Result", result
 #!/usr/bin/env python
-"""
-This is a Python interface to Stanford Core NLP tools.
-It can be imported as a module or run as a server.
+#
+# corenlp  - Python interface to Stanford Core NLP tools
+# Copyright (c) 2012 Dustin Smith
+#   https://github.com/dasmith/stanford-corenlp-python
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
 
-For more details:
-    https://github.com/dasmith/stanford-corenlp-python
-
-By Dustin Smith, 2011
-"""
-from simplejson import loads, dumps
+try:
+    import json
+except ImportError:
+    import simplejson as json
+    
 import optparse
 import sys
 import os
 import time
 import re
-from unidecode import unidecode
+import logging
 
-import pexpect
+try:
+    from unidecode import unidecode
+except ImportError:
+    logging.info("unidecode library not installed")
+    def unidecode(text):
+        return text
 
-import jsonrpc
 from progressbar import *
+import jsonrpc
+
+import pexpect
 
 
 def remove_id(word):
         Checks the location of the jar files.
         Spawns the server as a process.
         """
-
         jars = ["stanford-corenlp-2012-04-09.jar", 
                 "stanford-corenlp-2012-04-09-models.jar",
                 "joda-time.jar",
        
         # if CoreNLP libraries are in a different directory,
         # change the corenlp_path variable to point to them
-        corenlp_path = ""
+        corenlp_path = "stanford-corenlp-2012-04-09/"
+        
         java_path = "java"
         classname = "edu.stanford.nlp.pipeline.StanfordCoreNLP"
         # include the properties file, so you can change defaults
         if verbose: print "Request", text
         results = self._parse(text, verbose)
         if verbose: print "Results", results
-        return dumps(results)
-
-    def parse_imperative(self, text, verbose=True):
-        """
-        This is a hacky way to deal with imperative statements.
-
-        Takes an imperative, adds a personal pronoun, parses it,
-        and then removes it in the resulting parse.
-        
-        e.g. "open the door" gets parsed as "you open the door"
-        """
-        # find a pronoun that's not in the string already.
-        used_pronoun = None
-        pronouns = ["you","he", "she","i"]
-        for p in pronouns:
-            if text.startswith(p+" "):
-                # it's already an imperative!
-                used_pronoun = None
-                break
-            if p not in text:
-                # found one not in there already
-                used_pronoun = p
-                break
-        # if you can't find one, regress to original parse
-        if not used_pronoun:
-            return self.parse(text, verbose)
-  
-        # create text with pronoun and parse it
-        new_text = used_pronoun+" "+text.lstrip()
-        result = self._parse(new_text, verbose)
-        
-        if len(result) != 1:
-            print "Non-imperative sentence?  Multiple sentences found."
-
-        # remove the dummy pronoun
-        used_pronoun_offset = len(used_pronoun)+1
-        if result[0].has_key('text'):
-            result[0]['text'] = text
-            result[0]['tuples'] = filter(lambda x: not (x[1] == used_pronoun or x[2]
-                    == used_pronoun), result[0]['tuples'])
-            result[0]['words'] = result[0]['words'][1:]
-            # account for offset
-            ct = 0
-            for word, av in result[0]['words']:
-                for a,v in av.items():
-                    if a.startswith("CharacterOffset"):
-                        result[0]['words'][ct][1][a] = v-used_pronoun_offset
-                ct += 1
-            return dumps(result)
-        else:
-            # if there's a timeout error, just return it.
-            return dumps(result)
+        return json.dumps(results)
 
 
 if __name__ == '__main__':
+    """
+    This block is executed when the file is run directly as a script, not when it
+    is imported. 
+    
+    The code below starts a JSON-RPC server.
+    """
     parser = optparse.OptionParser(usage="%prog [OPTIONS]")
     parser.add_option(
         '-p', '--port', default='8080',
 
 import sys
 
+try:
+    import json
+except ImportError:
+    import simplejson as json
+
+
 #=========================================
 # errors
 
 #=========================================
 # data structure / serializer
 
-try:
-    import simplejson
-except ImportError, err:
-    print "FATAL: json-module 'simplejson' is missing (%s)" % (err)
-    sys.exit(1)
-
 #----------------------
 #
 def dictkeyclean(d):
     :SeeAlso:   JSON-RPC 1.0 specification
     :TODO:      catch simplejson.dumps not-serializable-exceptions
     """
-    def __init__(self, dumps=simplejson.dumps, loads=simplejson.loads):
+    def __init__(self, dumps=json.dumps, loads=json.loads):
         """init: set serializer to use
 
         :Parameters:
     :SeeAlso:   JSON-RPC 2.0 specification
     :TODO:      catch simplejson.dumps not-serializable-exceptions
     """
-    def __init__(self, dumps=simplejson.dumps, loads=simplejson.loads):
+    def __init__(self, dumps=json.dumps, loads=json.loads):
         """init: set serializer to use
 
         :Parameters: