Commits

Author Commit Message Labels Comments Date
david_walker
add only a single space after sentence-final punctuation
david_walker
don't add space before closing quote; strip spaces at end of line; don't add newlines just before eof
david_walker
merge
david_walker
Append an s to currency names unless the next word is "loan"
david_walker
don't split at apostrophes, because the simple approach of keeping a list of words that can contain them ("you'll" etc.) fails to account for names, some of which contain multiple apostrophes (e.g. "Ng'ang'a"). disable SpellDigitsRule because it has so many exceptions where it shouldn't be applied that it causes more work than it saves.
da...@david-office.Bubka
add ll to list of apostrophe endings
david_walker
prevent pycountry logging complaint by adding null handler
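A minimal sketch of the null-handler technique this commit message names, using only the standard `logging` module. The helper name `silence_library_logger` and the logger name `"pycountry.db"` are assumptions for illustration; the commit does not say which logger was involved or how the handler was attached.

```python
import logging


def silence_library_logger(name):
    """Attach a do-nothing handler to the named logger (hypothetical helper).

    With a NullHandler attached, the logger has at least one handler, so
    logging does not complain about a missing handler when the library
    emits a record.
    """
    logger = logging.getLogger(name)
    logger.addHandler(logging.NullHandler())
    return logger


# The logger name below is an assumption, not taken from the commit.
library_logger = silence_library_logger("pycountry.db")
```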
david_walker
put log file in temp directory instead of current dir
da...@david-office
add two spaces after sentence-final period
da...@david-office.Bubka
add a.m. and p.m. as abbreviations
david_walker
one acre fund template cleanup
david_walker
new regexes for One Acre Fund
david_walker
initial progress report
Branches
parse
david_walker
completed draft of first progress report
Branches
parse
david_walker
BibTeX bibliography file
Branches
parse
david_walker
update token.cend for merging tokens in "years old" type expressions; look for any direct ancestor with hspec or hspechc instead of only grandparent for nn years old rule
Branches
parse
david_walker
checkpoint
Branches
parse
david_walker
handle quoted parenthesis; add has_parent method to node class
Branches
parse
david_walker
implementation progress report, initial (incomplete) version
Branches
parse
david_walker
rename parser and token modules; expand handling of nn-years-old type expressions
Branches
parse
david_walker
converting pos from simple string to PosContainer object
Branches
parse
david_walker
Token.pos was a single Penn Treebank token type, such as 'NN'. With this checkin, it becomes a list of PosTag namedtuple objects, each of which has a token type and a probability value. In most cases there will be only a single entry in the list, but there can be three or more. This change is necessary because the parser fails to parse some sentences given only the highest-probability part-of-speech tag for each token, but succeeds if lower-probability altern…
Branches
parse
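The Token.pos change described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names `pos` and `prob`, the `Token` constructor, the `best_pos` method, and the sample probabilities are hypothetical and are not taken from the repository.

```python
from collections import namedtuple

# Hypothetical field names: a Penn Treebank tag and its probability.
PosTag = namedtuple("PosTag", ["pos", "prob"])


class Token:
    """Simplified stand-in for the project's token class."""

    def __init__(self, text, pos_tags):
        self.text = text
        # Keep all candidate tags, highest probability first, so a parser
        # can fall back to lower-probability alternatives if needed.
        self.pos = sorted(pos_tags, key=lambda tag: tag.prob, reverse=True)

    def best_pos(self):
        """Return the highest-probability tag, e.g. 'NN'."""
        return self.pos[0].pos


# Made-up probabilities, for illustration only.
token = Token("loan", [PosTag("VB", 0.2), PosTag("NN", 0.8)])
```

Keeping the alternatives on the token, rather than only the top tag, is what lets a caller retry a failed parse with the next candidate.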
david_walker
rename parser.py to myparser.py to avoid conflict with system module; rename test_yearold.py to yearold.py
Branches
parse
david_walker
improve handling of punctuation characters
Branches
parse
david_walker
add test for "nn-year-old" type expressions
Branches
parse
david_walker
changes needed to support YearOldRule, which depends on parse trees
Branches
parse
david_walker
launch cheap as xml-rpc server if not already running; process cheap output to produce tree structure with embedded token objects
Branches
parse
david_walker
code migrated into rules.py
Branches
parse
david_walker
passing unit tests, except improve/expand. that can be re-enabled once it is possible to search for a noun phrase
Branches
parse
david_walker
work in progress: removing transforms and making rules directly change tokens
Branches
parse