funtag

Author: Grzegorz Chrupała <g.chrupala@uvt.nl>
Date: 2013-03-05
Version: 0.2

This package contains the function tagger described in [RANLP_2007].

Installation

The package contains precompiled binaries for 64-bit Linux. If they work for you, you do not need to compile anything. Just make sure you have the Ruby interpreter installed.

If you need to recompile, make sure you have a C++ compiler as well as the Haskell Platform. First compile the LIBSVM executables:

cd lib/libsvm-3.16
make
cp svm-train svm-predict ../../bin/
cd ../..

Then compile the funtag executable:

cabal update
cabal install --prefix=`pwd`
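
Before recompiling, it can help to confirm that the tools mentioned above are on your PATH. The following is a minimal sketch, not part of the package: the function name check_tools is hypothetical, and the command names (ruby, make, g++, ghc, cabal) are assumptions about how these tools are installed on your system.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns the number of missing commands (0 means everything was found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Command names are assumptions; e.g. your C++ compiler may be clang++ instead of g++.
check_tools ruby make g++ ghc cabal || echo "install the missing tools before recompiling"
```

If you only use the precompiled binaries, checking for ruby alone should be enough.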

Usage

Export the environment variable FUNTAG_HOME to point to the funtag directory. The wrapper script for running the function labeler for English is funtag/bin/english. If you run it with no arguments, it prints a usage message:

./bin/english
USAGE:
english train-reparsed   PARSED-TRAIN-FILE GOLD-TRAIN-FILE
          to train a model on parser output
english train-gold     GOLD-TRAIN-FILE
          to train a model only using gold training trees
english predict MODEL-PREFIX TEST-FILE
          to process TEST-FILE using model files MODEL-PREFIX.*
english eval GOLD-TEST-FILE LABELED-FILE
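
Setting FUNTAG_HOME can be sketched as follows. The location $HOME/funtag is only an assumption for illustration; use the directory where you unpacked the package.

```shell
# Hypothetical install location; replace with the directory where funtag was unpacked.
FUNTAG_HOME="$HOME/funtag"
export FUNTAG_HOME

# Run the wrapper with no arguments only if it is present at that location.
if [ -x "$FUNTAG_HOME/bin/english" ]; then
  "$FUNTAG_HOME/bin/english"
else
  echo "wrapper not found at $FUNTAG_HOME/bin/english" >&2
fi
```

Adding the export line to your shell startup file makes the setting persistent across sessions.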

You can train a model using the method described in the [RANLP_2007] paper (train-reparsed) or train using only the treebank trees (train-gold). For the first method you need the gold trees file (GOLD-TRAIN-FILE) and, in addition, the same trees parsed (PARSED-TRAIN-FILE) by the same parsing model that you will use to parse raw text (cross-training here might be even better).
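
The two training modes might be invoked as sketched below. The file names are hypothetical, and the run helper is not part of funtag: it only prints the command when ./bin/english is not available, so the sketch can be tried outside the funtag directory. The argument order follows the usage message above.

```shell
# Hypothetical file names: the gold treebank trees and the same sentences re-parsed.
GOLD=wsj02_21.gold
PARSED=wsj02_21.parsed

# Hypothetical helper: run the command if the wrapper exists, otherwise just print it.
run() {
  if [ -x ./bin/english ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Method from [RANLP_2007]: parser output first, then the gold trees.
run ./bin/english train-reparsed "$PARSED" "$GOLD"

# Alternative: train on the gold treebank trees only.
run ./bin/english train-gold "$GOLD"
```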

The model is stored in a number of files named *.model and *.map. Once you have those, you can use the predict command to label trees. A model trained using the reparsing method is included in funtag/data/wsj02_21.parsed001.*. It was trained, with no cross-training, on WSJ sections 2-21 parsed with the vanilla version of Charniak's parser (no reranking), as well as the gold trees from the same sections. So if you don't want to retrain and just want to label some trees, you can run the program as follows:

./bin/english predict data/wsj02_21.parsed001 trees > trees.labeled

For this to work, the trees must be in the same format as output by Charniak's parser (e.g. the root node is called S1).
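
A quick sanity check of the input format could look like the sketch below. It assumes one bracketed tree per line, which is an assumption about your input file rather than something funtag enforces; the sample sentence is made up.

```shell
# Write a one-tree sample with an S1 root to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
(S1 (S (NP (DT The) (NN cat)) (VP (VBD sat)) (. .)))
EOF

# Count non-empty lines that do not start with "(S1 "; 0 means every tree has the expected root.
bad=$(awk 'NF && !/^\(S1 /{n++} END{print n+0}' "$sample")
echo "trees without an S1 root: $bad"

rm -f "$sample"
```

Running the same awk command on your own tree file before calling predict shows how many lines would need to be fixed.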

[RANLP_2007] Grzegorz Chrupała, Nicolas Stroppa, Josef van Genabith and Georgiana Dinu. 2007. Better Training for Function Labeling. RANLP. http://grzegorz.chrupala.me/papers/chrupala-et-al-2007/paper.pdf