KeyError on "NamedEntityTag" and "Lemma" when parsing xml output

Issue #15 new
Anonymous created an issue

Hey there! Thanks for building this, made life much easier for a Reddit analysis side project I'm working on :)

I really just wanted to use the sentiment annotator, but batch_parse raised a KeyError on NamedEntityTag and Lemma, even though I hadn't specified the ner or lemma annotators. I worked around it by commenting those two fields out (diff below). I can post steps to reproduce the error if that would help.

index ecc6129..78afab1 100755
--- a/corenlp/corenlp.py
+++ b/corenlp/corenlp.py
@@ -315,19 +315,20 @@ def parse_parser_xml_results(xml, file_name="", raw_output=False):
             token = raw_sent_list[id]['tokens']['token']
             sent['words'] = [
                 [unicode(token['word']), OrderedDict([
-                    ('NamedEntityTag', str(token['NER'])),
+                    #('NamedEntityTag', str(token['NER'])),
                     ('CharacterOffsetEnd', str(token['CharacterOffsetEnd'])),
                     ('CharacterOffsetBegin', str(token['CharacterOffsetBegin'])),
                     ('PartOfSpeech', str(token['POS'])),
-                    ('Lemma', unicode(token['lemma']))])]
+                    #('Lemma', unicode(token['lemma']))
+                    ])]
             ]
         else:
             sent['words'] = [[unicode(token['word']), OrderedDict([
-                ('NamedEntityTag', str(token['NER'])),
+                #('NamedEntityTag', str(token['NER'])),
                 ('CharacterOffsetEnd', str(token['CharacterOffsetEnd'])),
                 ('CharacterOffsetBegin', str(token['CharacterOffsetBegin'])),
-                ('PartOfSpeech', str(token['POS'])),
-                ('Lemma', unicode(token['lemma']))])]
+                ('PartOfSpeech', str(token['POS'])),])]
+                #('Lemma', unicode(token['lemma']))
                 for token in raw_sent_list[id]['tokens']['token']]
 
             sent['dependencies'] = [[enforceList(dep['dep'])[i]['@type'],

diff --git a/corenlp/default.properties b/corenlp/default.properties
index 01e3cba..6fbabe6 100644
--- a/corenlp/default.properties
+++ b/corenlp/default.properties
@@ -1,4 +1,4 @@
-annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref
+annotators = tokenize, ssplit, parse, sentiment
 # annotators = tokenize, ssplit, pos, lemma, ner, parse, dcoref, sentiment
 
 # A true-casing annotator is also available (see below)
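
Commenting the fields out works for my case, but it would break anyone who does run the ner and lemma annotators. A gentler fix might be to emit only the fields that are actually present on each token. Below is a rough sketch of that idea, not a tested patch: it is written in Python 3 syntax (corenlp.py itself is Python 2), and `TOKEN_FIELDS` and `token_attributes` are names I made up for illustration, not anything that exists in the library.

```python
from collections import OrderedDict

# Hypothetical table: (output field name, key in the parsed XML token).
# The pairs mirror the ones hard-coded in parse_parser_xml_results.
TOKEN_FIELDS = [
    ("NamedEntityTag", "NER"),
    ("CharacterOffsetEnd", "CharacterOffsetEnd"),
    ("CharacterOffsetBegin", "CharacterOffsetBegin"),
    ("PartOfSpeech", "POS"),
    ("Lemma", "lemma"),
]


def token_attributes(token):
    """Build the per-word attribute dict, skipping keys the token lacks."""
    return OrderedDict(
        (name, str(token[key])) for name, key in TOKEN_FIELDS if key in token
    )


# A token as it might look when only tokenize, ssplit, parse and sentiment
# ran: there is no 'NER' or 'lemma' key, so those fields are simply omitted.
token = {
    "word": "great",
    "CharacterOffsetBegin": "0",
    "CharacterOffsetEnd": "5",
    "POS": "JJ",
}
print(list(token_attributes(token)))
# ['CharacterOffsetEnd', 'CharacterOffsetBegin', 'PartOfSpeech']
```

With something like this, the full annotator list would still produce all five fields, and a reduced list would no longer raise a KeyError.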
