Documentation for yc-pyutils

A collection of handy utility scripts for NLP with Python (mainly data processing related).

For more information, refer to the documentation at http://skylander.bitbucket.org/yc-pyutils


Version 1.0 (development)

  • Restructured the ycutils/ folder. NLP-related modules now go into the nlp/ folder, and tsvio moved into the io/ folder.
  • Fixed documentation to reflect the update.
  • Added tokenizer module which uses a new paradigm for tokenization.

Version 0.2 (development)


  • Reworked tokenization module. Many bugs found.
  • Added scripts tokenize-docs.py and build-vocab.py.
  • Added filter_rare_terms method to BOW class.

Version 0.1

  • Initial version. Very much work in progress.


yc-pyutils is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

yc-pyutils is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with yc-pyutils. If not, see http://www.gnu.org/licenses/.
