The readme is constantly being updated and made more informative... really... honest! ;-)
Natural Language Processing, Open Information Extraction, Part-of-Speech tags, Relation Extraction, PhD thesis
What do we have here?
Essentially, it is a system for detecting and extracting arbitrary verb-based relations, along with their arguments, from Spanish-language texts.
In fact, the ./src folder holds a chaotic collection of files accompanying my PhD thesis...
The ./data folder contains a couple of files for testing:
- news-v6.es-en.300.es - 300 sentences from the Reuters news corpus
- news-v6.es-en.300.es.pos - the same 300 sentences, POS-tagged with the EAGLES POS tag set. POS tagging was done with the Freeling-2.2 POS tagger
- output.extr - what to expect as output
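To give a feel for what "verb-based relations along with their arguments" means, here is a toy sketch of pattern matching over POS tags. It is not the code in ./src, and it rests on assumptions: the input format (space-separated `word/TAG` tokens), the function names, and the noun-phrase pattern are all made up for illustration. The only thing taken from EAGLES is that the first letter of a tag names the part of speech (N noun, V verb, A adjective, D determiner, S preposition).

```python
import re

# ASSUMPTION: tokens look like "word/TAG", separated by spaces, one sentence per line.
# The real .pos files in ./data may be formatted differently.
def parse_sentence(line):
    tokens = []
    for item in line.split():
        word, sep, tag = item.rpartition("/")
        if not sep:  # no slash found: treat the whole item as an untagged word
            word, tag = item, ""
        tokens.append((word, tag))
    return tokens

KNOWN = {"N", "V", "A", "D", "S"}

def category(tag):
    # The first letter of an EAGLES tag names the part of speech;
    # everything this sketch does not care about collapses to "O".
    first = tag[:1].upper()
    return first if first in KNOWN else "O"

# Hypothetical patterns: a simple noun phrase, and
# "noun phrase, verb(s) plus optional preposition, noun phrase".
NP = r"D?A*N+A*"
RELATION = re.compile(r"(?P<arg1>%s)(?P<rel>V+S?)(?P<arg2>%s)" % (NP, NP))

def span_words(tokens, match, group):
    # One code character per token, so regex offsets are token offsets.
    start, end = match.span(group)
    return " ".join(word for word, _ in tokens[start:end])

def extract(line):
    tokens = parse_sentence(line)
    codes = "".join(category(tag) for _, tag in tokens)
    return [
        tuple(span_words(tokens, m, g) for g in ("arg1", "rel", "arg2"))
        for m in RELATION.finditer(codes)
    ]

print(extract("El/DA0MS0 gobierno/NCMS000 aprobó/VMIS3S0 la/DA0FS0 reforma/NCFS000 ./Fp"))
# → [('El gobierno', 'aprobó', 'la reforma')]
```

Collapsing each token to a single character keeps the regular expression short and lets match offsets map straight back to words. A real extractor needs far richer patterns than this.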
How to run it?
If your texts are already POS-tagged with EAGLES POS tags and formatted one sentence per line, all you need to do is:
python ./src/fact_extr_regexp4.py ./src/facts_extr.config ./data/your_input.pos > ./output/your_output.extr
However, if you do need to POS-tag your dataset first:
- download Freeling
- to get an idea of how I did it, check ./extraction_pipeline.bat ... yes, it's a .bat.
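Before pointing the extractor at your own data, it may be worth a sanity check against the bundled sample. The helper below is only a sketch: it assumes that ./data/output.extr was produced from ./data/news-v6.es-en.300.es.pos with the same config, that it is run from the repository root, and that `python` on your PATH is a version the extractor script accepts. The function names are hypothetical, and the helper itself needs Python 3.5 or newer.

```python
import subprocess

SCRIPT = "./src/fact_extr_regexp4.py"
CONFIG = "./src/facts_extr.config"
SAMPLE = "./data/news-v6.es-en.300.es.pos"
EXPECTED = "./data/output.extr"  # ASSUMPTION: generated from SAMPLE

def build_command(pos_file, script=SCRIPT, config=CONFIG, interpreter="python"):
    # Same argument order as the command line shown in this readme.
    return [interpreter, script, config, pos_file]

def first_difference(expected, actual):
    # Index of the first line that differs, or None if the sequences are equal.
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return None

def check_sample():
    # Compare raw bytes so no guess about the file encoding is needed.
    result = subprocess.run(build_command(SAMPLE), stdout=subprocess.PIPE, check=True)
    actual = result.stdout.splitlines()
    with open(EXPECTED, "rb") as handle:
        expected = handle.read().splitlines()
    where = first_difference(expected, actual)
    if where is None:
        print("output matches %s" % EXPECTED)
    else:
        print("first difference at line %d" % (where + 1))
    return where
```

Call `check_sample()` from the repository root. A reported difference does not necessarily mean something is broken, since the assumption about how output.extr was generated may simply not hold.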
Who do I talk to?
- talk to me