Preprocesses text according to the language in which it is written.
The Preprocess widget lemmatizes texts, converts them to lower case, and removes stop words, for a number of languages. Each of these steps can be turned on or off in the Options box. The Language box selects the language in which the text is written; both lemmatization and the list of stop words depend on this choice. The Info box displays the number of documents in the collection and the name of the text attribute.
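The three options can be illustrated with a minimal Python sketch. This is not the widget's implementation: the function name `preprocess`, the tiny `STOP_WORDS` and `LEMMAS` tables, and the order in which the steps run are all assumptions made for illustration. A real tool would load complete, language-specific resources.

```python
import re

# Hypothetical, deliberately tiny per-language resources (assumptions for this sketch).
STOP_WORDS = {
    "english": {"the", "a", "an", "is", "are", "in", "of", "and", "to"},
}
LEMMAS = {
    "english": {"cats": "cat", "running": "run", "mice": "mouse"},
}


def preprocess(text, language="english", lowercase=True,
               remove_stop_words=True, lemmatize=True):
    """Return a list of tokens after applying the enabled steps."""
    # Split the text into word tokens.
    tokens = re.findall(r"\w+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    if remove_stop_words:
        # The stop word list depends on the chosen language.
        stop = STOP_WORDS.get(language, set())
        tokens = [t for t in tokens if t.lower() not in stop]
    if lemmatize:
        # The lemma lookup also depends on the chosen language;
        # words missing from the table are left unchanged.
        lemmas = LEMMAS.get(language, {})
        tokens = [lemmas.get(t.lower(), t) for t in tokens]
    return tokens


tokens = preprocess("The cats are running in the garden")
```

Each keyword argument mirrors one switch in the Options box, so turning an option off simply skips that step.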
Below is a simple example of how to use this widget. The input comes directly from the Text File widget, and the output is passed to the Bag of Words widget.