README for Moses Job Scripts
moses_job_scripts is a toolkit for the basic tasks involved in training and developing SMT systems. It contains:
- A testbench to automate Moses-based SMT system training, tuning and evaluation (phrase-based and factored)
- Batch training, tuning and evaluation of Moses models and extraction of results
Preparing your SMT environment
- The first task is to install the following software:
- Moses 0.91+
- IRSTLM [optional]
- METEOR or meteor_indic (an adaptation of the METEOR tool for Indian languages) [optional]
- TER [optional]
moses_job_scripts requires Moses and the related software to be installed according to the directory layout described in the section 'Directory layout for SMT software installation'.
Each of these tools has its own prerequisites; please check the README of each tool. You can use the following as a guide for installing all of this software:
NOTE: As you can see, installation of these is pretty complicated and a script to automate the installation of the entire system would be desirable. If you end up writing one, sharing that would be appreciated. Please mail me at email@example.com.
Edit the file and set the variables
SMT_SYSTEM_DIR: The path to the
SMT_METRICS_DIR: The path to the
Directory layout for SMT software installation
|--- giza-pp (compiled giza++ source code)
|--- bin (contains giza++ binaries - giza-pp/mkcls-v2/mkcls giza-pp/GIZA++-v2/GIZA++ giza-pp/GIZA++-v2/snt2cooc.out )
|--- moses_job_scripts (contains scripts to run the entire SMT workflow. It is the directory containing this README )
|--- mosesdecoder (moses decoder)
|--- srilm (srilm)
|--- irstlm (irstlm) [optional]
|--- meteor (Meteor/meteor_indic)
|--- ter (TER)
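Before running any experiment, it may help to confirm that the expected directories are actually in place. The following is a minimal sketch, not part of moses_job_scripts: check_dirs is a hypothetical helper, and the example call assumes that the entries listed above are direct children of a single installation root (here the made-up variable SMT_ROOT), which this README does not state explicitly.

```shell
# check_dirs ROOT NAME...
# Hypothetical helper: prints one line for every NAME that is not a
# directory under ROOT. Returns 0 if all are present, 1 otherwise.
check_dirs() {
    root="$1"
    shift
    missing=0
    for name in "$@"; do
        if [ ! -d "$root/$name" ]; then
            echo "missing: $root/$name"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (SMT_ROOT is an assumed variable, not defined by this toolkit):
#   check_dirs "$SMT_ROOT" giza-pp moses_job_scripts mosesdecoder srilm
```

Components marked [optional] in the layout can simply be left out of the argument list.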
Using the testbench to train and evaluate a translation system
Once the SMT environment is ready, the testbench can be used to run an experiment that trains and evaluates a translation system.
- Create the parallel corpus files to be used for the experiment. The files must be in a single directory and must be named as follows:
train.<src_lang> e.g. train.en
- Create a configuration file that specifies the experimental settings. A sample configuration file can be found here:
- Run the following command:
moses_run.sh <config_file> [notune|notrain]
notune: if provided, tuning and evaluation of the tuned model are skipped.
notrain: if provided, training and evaluation of the trained model are skipped. Only tuning is done, and the workspace is assumed to contain all model files generated by an earlier training run.
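The command line described above can be wrapped in a small pre-flight check, sketched here under stated assumptions. run_experiment and the MOSES_RUN variable are hypothetical names introduced for illustration; moses_run.sh may do its own argument validation, and the only facts taken from this README are the argument order and the two option names.

```shell
# run_experiment CONFIG [notune|notrain]
# Hypothetical wrapper: checks the arguments, then calls moses_run.sh.
# MOSES_RUN (assumed name) can point at the script if it is not on PATH.
run_experiment() {
    config="$1"
    mode="${2:-}"
    if [ ! -f "$config" ]; then
        echo "config file not found: $config" >&2
        return 1
    fi
    case "$mode" in
        ""|notune|notrain) ;;   # the only values described in this README
        *)
            echo "unknown option: $mode (expected notune or notrain)" >&2
            return 2
            ;;
    esac
    if [ -n "$mode" ]; then
        "${MOSES_RUN:-moses_run.sh}" "$config" "$mode"
    else
        "${MOSES_RUN:-moses_run.sh}" "$config"
    fi
}

# Example calls (my_experiment.conf is a made-up file name):
#   run_experiment my_experiment.conf           # train, tune and evaluate
#   run_experiment my_experiment.conf notune    # skip tuning
#   run_experiment my_experiment.conf notrain   # reuse an earlier training run
```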
The intermediate and final outputs are written to the $WORKSPACE directory.
The workspace will contain the following directories:
log: contains various log files
cleaned: the cleaned-up corpus
lm: the target-side language model
moses_data: The intermediate files and model output after training
tuning: The intermediate files and output generated after tuning
evaluation: evaluation results
run_params.conf: A copy of the config file for the experiment
The important files for observing the output are:
evaluation/test_no_tun.<tgt_lang>: output from untuned model
evaluation/test.<tgt_lang>: output from tuned model
evaluation/results_wo_tuning/summary.txt: evaluation results without tuning
evaluation/results_with_tuning/summary.txt: evaluation results with tuned model
moses_data/model/moses.ini: untuned model file
tuning/moses.ini: tuned model file
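To compare the untuned and tuned results quickly, the two summary files listed above can be printed one after the other. This is only a sketch: show_results is a hypothetical helper, and it assumes nothing about the contents of summary.txt beyond it being a text file.

```shell
# show_results WORKSPACE
# Hypothetical helper: prints both evaluation summaries from a workspace,
# noting any that are absent (for example after a run with notune).
show_results() {
    ws="$1"
    for f in evaluation/results_wo_tuning/summary.txt \
             evaluation/results_with_tuning/summary.txt; do
        if [ -f "$ws/$f" ]; then
            echo "== $f"
            cat "$ws/$f"
        else
            echo "== $f (not found)"
        fi
    done
}

# Example call, assuming WORKSPACE holds the workspace path of the experiment:
#   show_results "$WORKSPACE"
```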
To be documented
Anoop Kunchukuttan ( firstname.lastname@example.org )
1.0 : Stable release
Copyright Anoop Kunchukuttan 2013 - present
Moses Job Scripts is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Moses Job Scripts is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with Moses Job Scripts. If not, see http://www.gnu.org/licenses/.