Hiroyoshi Komatsu committed 33888c3

minor bugfix and add usage of raw_output option

  • Parent commits ab2b8a5

Files changed (3)

 # A Python wrapper for the Java Stanford Core NLP tools
 ---------------------------
 
-This is a fork of [stanford-corenlp-python](https://github.com/dasmith/stanford-corenlp-python)
+This is a fork of Dustin Smith's [stanford-corenlp-python](https://github.com/dasmith/stanford-corenlp-python), a Python interface to [Stanford CoreNLP](http://nlp.stanford.edu/software/corenlp.shtml). It can either be used as a Python package or run as a JSON-RPC server.
 
 ## Edited
    * Update to Stanford CoreNLP v3.2.0
     parsed = batch_parse(raw_text_directory, corenlp_dir)  # It returns a generator object
     print list(parsed)  #=> [{'coref': ..., 'sentences': ..., 'file_name': 'new_sample.txt'}]
 
+The function uses the XML output feature of Stanford CoreNLP, and you can retrieve all of the information with the `raw_output` option. If it is true, CoreNLP's XML is returned as a dictionary without converting the format.
+
+    parsed = batch_parse(raw_text_directory, corenlp_dir, raw_output=True)
+
+(Note: the function now requires xmltodict; install it with `sudo pip install xmltodict`.)
+
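Because `batch_parse` yields one result per file as of this commit, callers either iterate over it or materialize it with `list()`. The following is a minimal sketch of that usage pattern. It uses a hypothetical stand-in, `fake_batch_parse` (not part of corenlp-python), so that it runs without Java or CoreNLP installed; the dictionary keys mirror the README example above, and the values are placeholders.

```python
def fake_batch_parse(file_names):
    """Hypothetical stand-in for batch_parse: yields one result dict per file."""
    for name in file_names:
        yield {"file_name": name, "sentences": [], "coref": []}


parsed = fake_batch_parse(["a.txt", "b.txt"])

# A generator can be consumed only once, so collect the results
# if they are needed more than once.
results = list(parsed)
file_names = [r["file_name"] for r in results]

# A second pass over the exhausted generator yields nothing.
leftover = list(parsed)
```

If only a single pass is needed, looping directly over the generator avoids holding every parsed file in memory at once.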
 ## Developer
    * Hiroyoshi Komatsu [hiroyoshi.komat@gmail.com]
    * Johannes Castner [jac2130@columbia.edu]

File corenlp/corenlp.py

     call(command, shell=True)
 
     #reading in the raw xml file:
-    result = []
+    # result = []
     try:
         for output_file in os.listdir(xml_dir):
             with open(xml_dir+'/'+output_file, 'r') as xml:
                 # parsed = xml.read()
                 file_name = re.sub(r'\.xml$', '', os.path.basename(output_file))
-                result.append(parse_parser_xml_results(xml.read(), file_name,
-                                                       raw_output=raw_output))
+                # result.append(parse_parser_xml_results(xml.read(), file_name,
+                #                                        raw_output=raw_output))
+                yield parse_parser_xml_results(xml.read(), file_name,
+                                               raw_output=raw_output)
     finally:
         file_list.close()
         shutil.rmtree(xml_dir)
-    return result
+    # return result
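One consequence of yielding inside `try`/`finally` is that the cleanup in `finally` no longer runs when the function is called. It runs only once the generator is exhausted, closed, or garbage-collected. The following is a minimal sketch of that behavior, assuming a hypothetical `read_outputs` helper and a temporary directory in place of the real `xml_dir` (neither is part of corenlp.py).

```python
import os
import shutil
import tempfile


def read_outputs(xml_dir):
    """Hypothetical helper: yield (file_name, contents) pairs, then delete xml_dir."""
    try:
        for output_file in sorted(os.listdir(xml_dir)):
            with open(os.path.join(xml_dir, output_file), "r") as xml:
                yield output_file, xml.read()
    finally:
        # Runs once iteration finishes or the generator is closed.
        shutil.rmtree(xml_dir)


# Set up a throwaway directory with two placeholder XML files.
xml_dir = tempfile.mkdtemp()
for name in ("a.xml", "b.xml"):
    with open(os.path.join(xml_dir, name), "w") as f:
        f.write("<root/>")

gen = read_outputs(xml_dir)
exists_before = os.path.isdir(xml_dir)  # nothing has run yet, directory is present
first = next(gen)
exists_during = os.path.isdir(xml_dir)  # still present while iteration is in progress
gen.close()                             # raises GeneratorExit inside, running finally
exists_after = os.path.isdir(xml_dir)
```

A caller that stops iterating early can call `close()` on the generator so that the temporary directory is removed promptly.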
 
 class StanfordCoreNLP:
     """
 AUTHOR = "Hiroyoshi Komatsu"
 AUTHOR_EMAIL = "hiroyoshi.komat@gmail.com"
 URL = "https://bitbucket.org/torotoki/corenlp-python"
-VERSION = "3.2.0-0"
+VERSION = "3.2.0-1"
 
 # Utility function to read the README file.
 # Used for the long_description.  It's nice, because now 1) we have a top level