Commits

Author / Commit Message
Jacob Perkins
tagset argument for train_chunker, remove babelfish references
Jacob Perkins
include all nltk_trainer packages
Jacob Perkins
cleanup
Jacob Perkins
merge
Jacob Perkins
Merge pull request #26 from lababidi/master: Pickled Classifier Naming
Mahmoud Lababidi
Cleaner method: use os.path.split instead of a regex to strip the corpus filename; os.path.split keeps the final folder name of the corpus
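As an illustration of the path handling described in these commits, here is a minimal sketch, not the project's actual code; the helper name is hypothetical:

```python
import os.path

def corpus_basename(corpus):
    # os.path.split keeps only the final folder name, so a relative path like
    # '../some_dir/corpus' still yields 'corpus' for the pickle filename
    # instead of dragging the leading directories along.
    return os.path.split(corpus.rstrip(os.path.sep))[1]

print(corpus_basename('../some_dir/corpus'))  # corpus
print(corpus_basename('movie_reviews'))       # movie_reviews
```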
Mahmoud Lababidi
fixed file output bug: if corpus is ../some_dir/corpus, the pickled classifier ends up in the wrong folder
Mahmoud Lababidi
fixed file output bug: if corpus is ../some_dir/corpus, the pickled classifier ends up in the wrong folder
Jacob Perkins
links to text-processing.com & NLTK 3 cookbook
Jacob Perkins
megam link
Jacob Perkins
py3 classification updates
Jacob Perkins
merge
Jacob Perkins
node label for ieer
Jacob Perkins
merge
Jacob Perkins
Merge pull request #19 from kecaps/master: roundup tests for train_classifier
Space
added a test case for multi-category classification
Space
added tests for word_count and using max_feats
Space
add another line for most informative test and test for passing parameters to gradient boosting classifier
Space
add import of sys for call to sys.exit
Space
add tracing output to test
Jacob Perkins
Merge pull request #17 from kecaps/master: A few bug fixes and enhancements for evaluating classifiers on large datasets
Space
separated out cross-fold execution path and added raising an exception if trying to cross-fold a multi-binary classifier
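A sketch of the kind of guard this commit describes; the argument names are assumptions, not the script's actual options:

```python
def check_cross_fold(cross_fold, multi, binary):
    # Hypothetical guard: the multi-binary path trains one binary classifier
    # per label, which the cross-fold loop does not handle, so fail early.
    if cross_fold and multi and binary:
        raise ValueError('cross-fold validation is not supported for multi-binary classifiers')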
Space
Changed so that regardless of whether args.multi and args.binary are set, the same order of operations happens:
1. the corpus is read into test_instances and train_instances;
2. if feature selection is enabled (args.score_fn and args.max_feat), words per category are extracted from train_instances and score_fn is applied to select features and define featx;
3. features are extracted from the instances by applying featx.
This order ensures that the corpus is only read once and that feature selection is done only over the training set (the classifier can be biased if feature selection is done over the entire corpus). The corpus is read differently depending on whether args.multi and args.binary are set, so test_instances and train_instances may be different data structures. category_words is defined base…
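A hedged sketch of the feature-selection step this commit describes, using NLTK's FreqDist/ConditionalFreqDist and a chi-squared score function; variable names follow the commit message, but the implementation details are assumptions rather than the script's actual code:

```python
from nltk.probability import FreqDist, ConditionalFreqDist
from nltk.metrics import BigramAssocMeasures

def make_featx(train_instances, score_fn=BigramAssocMeasures.chi_sq, max_feats=1000):
    # Score words using only the training instances, so feature selection
    # cannot leak information from the test set into the classifier.
    word_fd = FreqDist()
    label_word_fd = ConditionalFreqDist()
    for words, label in train_instances:
        for word in words:
            word_fd[word] += 1
            label_word_fd[label][word] += 1

    total = word_fd.N()
    word_scores = {}
    for word, freq in word_fd.items():
        word_scores[word] = sum(
            score_fn(label_word_fd[label][word], (freq, label_word_fd[label].N()), total)
            for label in label_word_fd.conditions())

    best = set(sorted(word_scores, key=word_scores.get, reverse=True)[:max_feats])
    # featx maps an instance's words to a feature dict using only the selected words.
    return lambda words: {word: True for word in words if word in best}
```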
Space
make norm_words a default argument to extract_text. handle score_fn not being set
Space
add words to FreqDist, not tuple
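The fix here contrasts two easy-to-confuse patterns; a small sketch (the exact buggy line isn't shown in the log):

```python
from nltk.probability import FreqDist

words = ['great', 'plot', 'great', 'acting']

# Buggy pattern: the whole tuple of words is counted as a single sample.
bad_fd = FreqDist()
bad_fd[tuple(words)] += 1

# Intended pattern: each word is counted individually.
good_fd = FreqDist(words)
print(good_fd['great'])  # 2
```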
Space
GradientBoostingClassifier uses 'learning_rate' not 'learn_rate'
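A minimal example with the corrected keyword, which is what scikit-learn's GradientBoostingClassifier accepts:

```python
from sklearn.ensemble import GradientBoostingClassifier

# The old 'learn_rate' keyword was renamed; scikit-learn expects 'learning_rate'.
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100)
```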
Space
re-factored train_classifier to pull out functions for re-use
Space
added missing tracing
Space
split data into training and testing datasets before calculating word scores
Space
ignore compiled python files