
Chunker for Russian texts

Welcome to the homepage of Chunker for Russian texts, a project about text chunking. Its scripts create training data from SynTagRus, and the project attempts to build a chunker trained on that data. A chunker is a program that divides Russian text into syntactically correlated groups of words (chunks). The system detects the following chunk types: NP, VP, ADVP, ADJP, PP, NUMP, CONJP, PRT and INTJ.
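A common way to store chunk annotation as training data is the CoNLL-2000 style, where every token gets a B-X tag (first token of a chunk of type X), an I-X tag (any later token of that chunk) or O (outside every chunk). This README does not say which format the project's scripts actually emit, so the sketch below only assumes this B-/I-/O representation. The function name `chunks_to_bio`, the span tuple layout and the example annotation are hypothetical illustrations, not output of the project's scripts.

```python
from typing import List, Tuple

# Hypothetical representation: (label, start, end), covering tokens[start:end].
Chunk = Tuple[str, int, int]

# Chunk types listed in this README.
CHUNK_LABELS = {"NP", "VP", "ADVP", "ADJP", "PP", "NUMP", "CONJP", "PRT", "INTJ"}


def chunks_to_bio(n_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Encode non-overlapping chunk spans as one B-/I-/O tag per token."""
    tags = ["O"] * n_tokens  # tokens outside every chunk stay "O"
    for label, start, end in chunks:
        if label not in CHUNK_LABELS:
            raise ValueError(f"unknown chunk label: {label}")
        if not 0 <= start < end <= n_tokens:
            raise ValueError(f"invalid span: {start}..{end}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {start}..{end}")
        tags[start] = "B-" + label  # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label  # remaining tokens of the chunk
    return tags


# Hypothetical hand-made annotation of "Мальчик читает интересную книгу."
# ("The boy is reading an interesting book.")
tokens = ["Мальчик", "читает", "интересную", "книгу", "."]
tags = chunks_to_bio(len(tokens), [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 4)])
for token, tag in zip(tokens, tags):
    print(token, tag)  # one "token tag" pair per line
```

Here "интересную книгу" becomes `B-NP I-NP`, while the final period stays `O`. Real CoNLL-2000 files also carry a part-of-speech column between the token and the chunk tag; it is left out to keep the sketch short.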

What is Chunking?

Chunking, the identification of syntactically correlated groups of words in a sentence, is a significant task in NLP research. It is an intermediate step towards full parsing, and it was the shared task of CoNLL-2000.
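The CoNLL-2000 shared task scored systems at the chunk level: a predicted chunk counts as correct only if both its type and its exact token span match the gold annotation, and results are reported as precision, recall and F1. The sketch below is a simplified re-implementation of that idea, assuming B-/I-/O tags; it is not the official `conlleval` script, and the names `bio_to_chunks` and `chunk_prf` are hypothetical. Like `conlleval`, it treats an I-X tag that does not continue an open X chunk as the start of a new chunk.

```python
from typing import List, Optional, Set, Tuple

Chunk = Tuple[str, int, int]  # (label, start, end), end exclusive


def bio_to_chunks(tags: List[str]) -> Set[Chunk]:
    """Decode B-/I-/O tags into a set of chunk spans."""
    chunks: Set[Chunk] = set()
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last chunk
        prefix, _, cur = tag.partition("-")
        # Close the open chunk unless this tag continues it.
        if label is not None and not (prefix == "I" and cur == label):
            chunks.add((label, start, i))
            label = None
        # Open a new chunk on B-X, or on an I-X that did not continue one.
        if prefix in ("B", "I") and label is None:
            label, start = cur, i
    return chunks


def chunk_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Chunk-level precision, recall and F1 for one tagged sentence."""
    if len(gold) != len(pred):
        raise ValueError("tag sequences must have equal length")
    g, p = bio_to_chunks(gold), bio_to_chunks(pred)
    correct = len(g & p)  # same label and same span
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


gold = ["B-NP", "B-VP", "B-NP", "I-NP", "O"]
pred = ["B-NP", "B-VP", "B-ADJP", "B-NP", "O"]  # object NP split in two
print([round(x, 3) for x in chunk_prf(gold, pred)])  # prints [0.5, 0.667, 0.571]
```

The prediction finds 2 of the 3 gold chunks (recall 2/3) but proposes 4 chunks (precision 1/2). For a whole corpus, the chunk counts would be summed over all sentences before computing the scores, rather than averaging per-sentence values.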