Linguistic Processor
===================

The main file is LingusticProcessor.py. It handles all the questions of the
first part of the exercise. Each of the other files corresponds to one question.
main.py executes, in order, all the questions that have been implemented so
far.

* The check_tokenization file contains tests.
* To run main.py, place the texts in a folder named wikipedia in the same
directory as the source files.
* The output of each step is written to a corresponding file identified by its
extension, such as .tokenized, .analyzed, etc.
* CountLemma is used in two cases: it can count the lemmata in a single file
and build that file's inverted index, or build an inverted index over a
collection of files. The output formats of the two cases differ slightly (or
not so slightly).
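
The sketch below shows why the single-file and collection outputs of an inverted index naturally differ. It is a minimal illustration under stated assumptions, not the actual CountLemma code: it assumes lemmata are already available as Python lists (for example, read from the .analyzed output), and the function names `count_lemmata`, `file_inverted_index` and `collection_inverted_index` are hypothetical. Within one file, a lemma maps to the positions where it occurs. Across a collection, a lemma maps to the documents that contain it and its frequency in each.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def count_lemmata(lemmata: Iterable[str]) -> Counter:
    """Count how often each lemma occurs in a single file (hypothetical helper)."""
    return Counter(lemmata)


def file_inverted_index(lemmata: List[str]) -> Dict[str, List[int]]:
    """Single-file mode (assumed format): lemma -> list of token positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for position, lemma in enumerate(lemmata):
        index[lemma].append(position)
    return dict(index)


def collection_inverted_index(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """Collection mode (assumed format): lemma -> {document name: frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, lemmata in docs.items():
        # Count each document once, then record its frequency under every lemma it contains.
        for lemma, freq in Counter(lemmata).items():
            index[lemma][name] = freq
    return dict(index)


if __name__ == "__main__":
    print(file_inverted_index(["cat", "sit", "cat"]))
    # {'cat': [0, 2], 'sit': [1]}
    print(collection_inverted_index({"a": ["cat", "sit", "cat"], "b": ["dog", "sit"]}))
    # {'cat': {'a': 2}, 'sit': {'a': 1, 'b': 1}, 'dog': {'b': 1}}
```

Storing positions in the single-file case keeps enough detail to locate each occurrence. In the collection case, per-document frequencies keep the index compact while still showing which files contain a lemma.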
