
Kiva Editor's Assistant / tagger.py

Author Commit Message Labels Comments Date
david_walker
Token.pos was a single Penn Treebank token type, such as 'NN'. With this checkin, it becomes a list of PosTag namedtuple objects, each of which has a token type and a probability value. In most cases there will be only a single entry in the list, but there can be three or more. This change is necessary because the parser fails to parse some sentences given only the highest-probability part-of-speech tag for each token, but succeeds if lower-probability alternatives are present.
Branches: parse
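A minimal sketch of the data structure this commit describes. The commit says each PosTag holds a token type and a probability but does not name the fields, so `tag` and `prob` are assumptions. `Token` here is a simplified stand-in, and `best_tag` and `tags_above` are hypothetical helpers, not the project's API.

```python
from collections import namedtuple
from typing import List

# Assumed field names: the commit only says "a token type and a probability value".
PosTag = namedtuple("PosTag", ["tag", "prob"])


class Token:
    """Simplified stand-in for the project's Token class (assumed shape)."""

    def __init__(self, text: str, pos: List[PosTag]):
        self.text = text
        # Keep candidate tags ordered from most to least probable.
        self.pos = sorted(pos, key=lambda p: p.prob, reverse=True)

    def best_tag(self) -> str:
        """Hypothetical helper: the single highest-probability Penn Treebank tag."""
        return self.pos[0].tag

    def tags_above(self, threshold: float) -> List[str]:
        """Hypothetical helper: every candidate tag whose probability meets threshold."""
        return [p.tag for p in self.pos if p.prob >= threshold]


tok = Token("saw", [PosTag("NN", 0.3), PosTag("VBD", 0.6), PosTag("VBN", 0.1)])
print(tok.best_tag())        # VBD
print(tok.tags_above(0.25))  # ['VBD', 'NN']
```

Keeping the whole list, rather than only `best_tag()`, is what lets a parser retry with `NN` for "saw" if a parse using `VBD` fails.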
david_walker
changes needed to support YearOldRule, which depends on parse trees
Branches: parse
david_walker
starting support for tracking the character begin and end offsets of tokens in the original text
Branches: parse
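A minimal sketch of tracking character begin and end offsets for each token, assuming a simple regex tokenizer (runs of word characters, or single punctuation marks). `TokenSpan` and `tokenize_with_offsets` are hypothetical names, and the project's actual tokenizer may split text differently. The key invariant is that slicing the original text with `[begin:end]` returns the token's surface text.

```python
import re
from typing import List, NamedTuple


class TokenSpan(NamedTuple):
    """Hypothetical token record: surface text plus [begin, end) character offsets."""
    text: str
    begin: int  # index of the token's first character in the original text
    end: int    # index one past the token's last character


# Assumed tokenization rule: a run of word characters, or one punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_with_offsets(original: str) -> List[TokenSpan]:
    """Tokenize `original`, recording where each token sits in it."""
    return [TokenSpan(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(original)]


text = "Maria is 32 years old."
tokens = tokenize_with_offsets(text)
print(tokens[-2:])
# [TokenSpan(text='old', begin=18, end=21), TokenSpan(text='.', begin=21, end=22)]
```

Because `re.finditer` reports `start()` and `end()` for each match, the offsets come for free and stay correct even when the same word appears more than once in the text.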