yanchuan sim committed 655e8bf

filter stopwords by default

Comments (0)

Files changed (1)

ycutils/tokenize.py

   return filter(lambda w: w not in my_stopwords, tokens)
 #end def
 
-def ngram_tokens(tokens, n, sep_char='_', filter_stopwords=set()):
+def ngram_tokens(tokens, n, sep_char='_', filter_stopwords=STOPWORDS):
   """
   Generate list of n-grams from a sequence of tokens. N-gram tokens are not formed across stopwords defined in :attr:`filter_stopwords`.
 
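The diff cuts off before the function body, so the following is only a minimal sketch of the behaviour the docstring describes, not the committed implementation. It assumes that "not formed across stopwords" means n-grams are built only within runs of consecutive non-stopword tokens. The `STOPWORDS` contents here are a hypothetical stand-in, since the real set is not shown in this commit.

```python
# Hypothetical stand-in: the module's real STOPWORDS set is not shown in the diff.
STOPWORDS = frozenset({"the", "a", "of", "and"})


def ngram_tokens(tokens, n, sep_char='_', filter_stopwords=STOPWORDS):
    """Sketch: build n-grams only within runs of consecutive non-stopword tokens."""
    ngrams = []
    run = []  # current run of consecutive non-stopword tokens
    for tok in list(tokens) + [None]:  # None sentinel flushes the final run
        if tok is None or tok in filter_stopwords:
            # Emit every n-gram inside the finished run, then start a new run.
            ngrams.extend(sep_char.join(run[i:i + n])
                          for i in range(len(run) - n + 1))
            run = []
        else:
            run.append(tok)
    return ngrams


tokens = ["state", "of", "the", "art", "language", "model"]
print(ngram_tokens(tokens, 2))
# ['art_language', 'language_model']
print(ngram_tokens(tokens, 2, filter_stopwords=set()))
# ['state_of', 'of_the', 'the_art', 'art_language', 'language_model']
```

Under this reading, callers who relied on the old default can pass `filter_stopwords=set()` explicitly to keep n-grams that span stopwords.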