Commits

Author Commit Message Labels Comments Date
david_walker
revamp of rules.py continues
Branches
parse
david_walker
debug tokensearch.py
Branches
parse
david_walker
checkpoint before massive change--about to merge transform code directly into rules, and have rules apply themselves directly rather than in a two-phase rule/transform process.
Branches
parse
david_walker
tokensearch.py: finish get_levenshtein_dist and modify_tokens
Branches
parse
david_walker
in-development code to search and replace tokens
Branches
parse
david_walker
make repr more selective about what it displays
Branches
parse
david_walker
fix right_token.cbegin in ParagraphTransform
Branches
parse
david_walker
first incomplete code to interface to PET HPSG parser binary 'cheap'
Branches
parse
david_walker
finish adding code to update cbegin and cend attributes of tokens
Branches
parse
david_walker
modify get_transforms methods of rules to skip non_printing tokens
Branches
parse
david_walker
make eof token have string *EOF*
Branches
parse
david_walker
split Token class out of base into mytoken.py (token.py conflicts with standard module)
Branches
parse
david_walker
starting support for tracking character begin and end of tokens in original text
Branches
parse
david_walker
merge the unicode logging fix to transforms.py from the dev branch
david_walker
change debug log file from full path to logfile.txt to just 'kea.log'
david_walker
fix unicode error in debug logging with helper function token_strings()
Branches
parse
david_walker
rename kea2.py to kea.py. from now on, there will only be a single kea.py. the production version lives in the default branch, and development versions have their own branches, whose changes will be merged into the default branch once stable.
david_walker
delete obsolete kea.py
david_walker
preparation for branching
david_walker
prepare for sentence delimiter token
david_walker
make ':' a non-spacing punctuation character
david_walker
add checks for token.is_URL to split rules
david_walker
don't split contractions at apostrophe
david_walker
convert X USD to $X
david_walker
force abbreviations to normal form
david_walker
more indexsplittransform fixes
david_walker
fix bugs in punct and alphanum splits
david_walker
fix IndexSplitTransform bug
david_walker
improve debug output
david_walker
add splitting of alphanumeric tokens