This README is constantly being updated and made more informative... really... honestly! ;-)


Natural Language Processing, Open Information Extraction, Part-of-Speech tags, Relation Extraction, PhD thesis

What do we have here?

Essentially, it is a system for detecting and extracting arbitrary verb-based relations, along with their arguments, from Spanish-language texts.

In fact, what we have in the ./src folder is a chaotic collection of files accompanying my PhD thesis...
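
To give a feel for what "a verb-based relation along with its arguments" means, here is a minimal sketch. It is not the code from ./src: the (word, tag) input representation, the one-letter tag classes, the pattern, and the names tag_class and extract are all assumptions made up for illustration. The only thing it takes from EAGLES is the first letters of the tags (V verb, N noun, D determiner, A adjective, SP preposition).

```python
import re

def tag_class(tag):
    """Collapse an EAGLES tag to a one-letter class (coarse, illustrative only)."""
    if tag.startswith("SP"):
        return "P"  # preposition
    if tag[:1] in ("V", "N", "D", "A"):
        return tag[0]  # verb, noun, determiner, adjective
    return "O"  # everything else

# Hypothetical pattern: noun phrase, verb group with optional preposition, noun phrase.
NP = r"D?A*N+A*"
RELATION = re.compile(rf"({NP})(V+P?)({NP})")

def extract(tagged):
    """Return (argument 1, relation, argument 2) triples from a list of (word, tag) pairs."""
    # One character per token, so regex offsets are also token offsets.
    classes = "".join(tag_class(tag) for _, tag in tagged)
    words = [word for word, _ in tagged]
    triples = []
    for m in RELATION.finditer(classes):
        triples.append(tuple(" ".join(words[m.start(g):m.end(g)]) for g in (1, 2, 3)))
    return triples

sentence = [("El", "DA0MS0"), ("presidente", "NCMS000"), ("visitó", "VMIS3S0"),
            ("la", "DA0FS0"), ("ciudad", "NCFS000")]
print(extract(sentence))  # [('El presidente', 'visitó', 'la ciudad')]
```

Mapping every token to a single character keeps the pattern short and readable; a real extractor would need far richer patterns than this toy one.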

The ./data folder contains a couple of files for testing:

  • news-v6.es-en.300.es - 300 sentences from the Reuters news corpus
  • news-v6.es-en.300.es.pos - the same 300 sentences, POS-tagged with the EAGLES POS tag set. The tagging was done with the Freeling-2.2 POS tagger
  • output.extr - what to expect as output

How to run it?

If your texts are already POS-tagged with EAGLES POS tags and formatted one sentence per line, all you need to do is run:

python ./src/fact_extr_regexp4.py ./src/facts_extr.config ./data/your_input.pos > ./output/your_output.extr

However, if you do need to POS-tag your dataset first:

  • download Freeling
  • to get an idea of how I did it, check ./extraction_pipeline.bat ... yes, it's a .bat.

Who do I talk to?

  • talk to me