Constructs the letter n-gram representation of documents.
The letter n-grams widget represents documents by their letter n-grams, which are sequences of n consecutive letters that appear in the text. As in the bag of words widget, the text features (in this case letter n-grams) are added to the documents as meta attributes. The value of each meta attribute is the frequency of the corresponding letter n-gram in that document. The Ngram size box sets the number of consecutive letters used as a feature: two, three, or four. The number of distinct letter n-grams in the entire collection is shown at the bottom of the widget.
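The representation described above can be illustrated with a minimal Python sketch. It is not the widget's own code: the function name letter_ngrams and the sample documents are hypothetical, the sketch treats every character (including spaces and punctuation) as a letter, and it reports raw counts, since the text does not say whether the widget normalizes frequencies or filters characters.

```python
from collections import Counter


def letter_ngrams(text, n=3):
    """Count every run of n consecutive characters in text.

    Hypothetical helper for illustration; the widget's handling of
    case, whitespace, and punctuation is not specified and may differ.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n over the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Two made-up documents, represented with letter 2-grams.
docs = ["banana", "bandana"]
features = [letter_ngrams(doc, n=2) for doc in docs]

# Distinct n-grams across the whole collection, analogous to the
# number the widget shows at the bottom.
vocabulary = set().union(*features)
```

Here features[0] maps each 2-gram of "banana" to its count in that document, and len(vocabulary) gives the number of different 2-grams in the collection.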
Below is a simple example of how to use this widget. The input comes directly from the Text file widget, and the output is sent to the Feature selection widget.