README

Code for the feedforward neural network model described in our EMNLP 2014 paper:

  • Word Translation Prediction for Morphologically Rich Languages with Bilingual Neural Networks. Ke Tran, Arianna Bisazza and Christof Monz

Coming soon: Log-bilinear and Convolutional neural network models.

For any questions regarding the software, please contact Ke Tran <m.k.tran AT uva DOT nl>.

Requirements

Torch7 is the only requirement for running this program.

Data Format

Source and target sentences are tokenized and lowercased. The target file has the following format for each sentence:

word1|tag1|lemma1 word2|tag2|lemma2 ...

where the tags are obtained from a supervised or unsupervised morphological tagger.

Example of target file with snowball segmentation:

чтобы|ы|чтоб восстановить|ить|восстанов поддержку|у|поддержк латинской|ой|латинск америки|и|америк –|NULL|– и|NULL|и понизить|ить|пониз популярность|ость|популярн чавеса|а|чавес –|NULL|– администрации|ии|администрац буша|а|буш нужно|о|нужн гораздо|о|горазд больше|е|больш ,|NULL|, чем|NULL|чем короткий|ий|коротк визит|ит|виз .|NULL|.
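Before running the experiment, it can help to check that every token in a target file really has the three `|`-separated fields described above. The following is a minimal sketch in Python and is not part of this repository; the names `Token` and `parse_target_line` are hypothetical, and it assumes that tokens are separated by whitespace and that none of the three fields contains a `|` character.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One target-side token in word|tag|lemma form (hypothetical helper type)."""
    word: str
    tag: str
    lemma: str


def parse_target_line(line: str) -> List[Token]:
    """Split one target sentence into (word, tag, lemma) triples.

    Assumes whitespace-separated tokens and exactly three non-empty
    fields per token; raises ValueError on anything else.
    """
    tokens = []
    for position, item in enumerate(line.split()):
        fields = item.split("|")
        if len(fields) != 3 or not all(fields):
            raise ValueError(f"token {position} is not word|tag|lemma: {item!r}")
        tokens.append(Token(*fields))
    return tokens


# A shortened version of the example sentence above.
sample = "чтобы|ы|чтоб восстановить|ить|восстанов –|NULL|– .|NULL|."
parsed = parse_target_line(sample)
print([t.tag for t in parsed])  # ['ы', 'ить', 'NULL', 'NULL']
```

Running a check like this over each line of the training and test target files catches malformed tokens early, before they reach prepare.lua or train.lua.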

Run Experiment

Prepare data

$ th prepare.lua -tf ../toydata/train/train.ru.sb -sd ../toydata/train.en.dictF5 -prune ../toydata/lexicon.en-ru.pruned -save morph.lex.sb

Training

Use -mode tag to train the suffix model and -mode lemma to train the stem model.

Example of training the suffix model:

$ th train.lua -src ../toydata/train/train.en -trg ../toydata/train/train.ru.sb -a ../toydata/train/train.align -morph morph.lex.sb -niter 10 -m models/model.tag.t7 -mode tag -train

Testing

$ th test.lua -src ../toydata/test/test.en -trg ../toydata/test/test.ru.sb -a ../toydata/test/test.align -morph morph.lex.sb -m models/model.tag.t7.it2 -mode tag -topn 1