Commits

david_walker
add only single space after sentence final punctuation
david_walker
don't add space before closing quote; strip spaces at end of line; don't add newlines just before EOF
david_walker
merge
david_walker
Append an s to currency names unless the next word is "loan"
david_walker
don't split at apostrophes, because the simple approach of keeping a list of words that can contain them ("you'll", etc.) fails to account for names, some of which contain multiple apostrophes (e.g. "Ng'ang'a"). Disable SpellDigitsRule because there are so many exceptions where it shouldn't be applied that it causes more work than it saves.
da...@david-office.Bubka
add 'll to list of apostrophe endings
david_walker
prevent pycountry logging complaint by adding null handler
david_walker
put log file in temp directory instead of current dir
da...@david-office
add two spaces after sentence-final period
da...@david-office.Bubka
add a.m. and p.m. as abbreviations
david_walker
One Acre Fund template cleanup
david_walker
new regexes for One Acre Fund
david_walker
merge the unicode logging fix to transforms.py from the dev branch
david_walker
change debug log file name from the full path to logfile.txt to just 'kea.log'
david_walker
fix unicode error in debug logging with helper function token_strings()
Branches
parse
david_walker
rename kea2.py to kea.py. From now on, there will be only a single kea.py. The production version lives in the default branch, and development versions have their own branches, whose changes will be merged into the default branch once stable.
david_walker
delete obsolete kea.py
david_walker
preparation for branching; add .emacs.desktop to .hgignore; minor changes to kea2.py and rules.py; additions to samples.txt
david_walker
prepare for sentence delimiter token; prevent split of decimal numbers in AlphaNumSplitRule
david_walker
make ':' a non-spacing punctuation character
david_walker
add checks for token.is_URL to split rules
david_walker
don't split contractions at apostrophe
david_walker
convert X USD to $X
david_walker
force abbreviations to normal form
david_walker
more IndexSplitTransform fixes
david_walker
fix bugs in punct and alphanum splits
david_walker
fix IndexSplitTransform bug
david_walker
improve debug output
david_walker
add splitting of alphanumeric tokens
david_walker
proper spacing on output text generation