Questimate / Feature Extraction

The following types of features can be extracted directly from the command line:

There is also a special category of features, called Label features. These are the classes or labels to be predicted, based on the other features.

To extract these features, the tool can load several kinds of resources, such as n-best lists, n-gram language models, lattices and IBM1 scores. It also uses external tools such as POS taggers and language model creation tools. Some of these tools are written in Java, so they can be called directly from the API; others are invoked via shell scripts.

All currently supported features are global or sentence-level 'dense' (as opposed to 'sparse') features.

In most cases, each feature has the following variants:

  • Value on the source side
  • Value on the target side
  • Normalized values for the source and the target side
  • Ratio features (source to target)
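As a rough illustration of how the four variants relate to one another, the sketch below derives them from a single base feature. Everything in it is an assumption made for the example and not part of the Questimate API: the class `FeatureVariants`, the choice of punctuation count as the base feature, normalization by token count, and the convention that a zero denominator yields 0.0.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Questimate code base.
public class FeatureVariants {

    // Assumed base feature: number of punctuation characters in the sentence.
    static double punctuationCount(String sentence) {
        int count = 0;
        for (char c : sentence.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && !Character.isWhitespace(c)) {
                count++;
            }
        }
        return count;
    }

    // Number of whitespace-separated tokens, used here as the normalizer.
    static double tokenCount(String sentence) {
        String trimmed = sentence.trim();
        return trimmed.isEmpty() ? 0.0 : trimmed.split("\\s+").length;
    }

    // Assumed convention: dividing by zero yields 0.0 instead of NaN or infinity.
    static double safeDivide(double numerator, double denominator) {
        return denominator == 0.0 ? 0.0 : numerator / denominator;
    }

    // Builds the source, target, normalized and ratio variants of the base feature.
    static Map<String, Double> variants(String source, String target) {
        double src = punctuationCount(source);
        double tgt = punctuationCount(target);

        Map<String, Double> features = new LinkedHashMap<>();
        features.put("punct_src", src);
        features.put("punct_tgt", tgt);
        features.put("punct_src_norm", safeDivide(src, tokenCount(source)));
        features.put("punct_tgt_norm", safeDivide(tgt, tokenCount(target)));
        features.put("punct_ratio", safeDivide(src, tgt));
        return features;
    }

    public static void main(String[] args) {
        // Prints one "name = value" line per variant.
        variants("Hello , world !", "Hallo Welt !")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

The actual normalizer and the handling of empty sentences in the tool may differ; the point is only that one base measurement yields five related dense values.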

The Java API is flexible enough to make it easy to add code for extracting many other kinds of features.
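The page does not show the API itself, so the following is only a guess at what such an extension point could look like. The interface `SentenceFeature`, the class `DenseFeatureSketch` and the example feature are all hypothetical names; the sketch simply shows one way a new sentence-level feature could plug into a fixed-length dense feature vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a pluggable feature design; not the actual Questimate API.
public class DenseFeatureSketch {

    // Assumed contract: every feature maps a sentence pair to one number.
    interface SentenceFeature {
        String name();
        double extract(String source, String target);
    }

    // Example feature: difference in character length between source and target.
    static SentenceFeature lengthDifference() {
        return new SentenceFeature() {
            @Override
            public String name() {
                return "char_length_diff";
            }

            @Override
            public double extract(String source, String target) {
                return source.length() - target.length();
            }
        };
    }

    // A dense vector has exactly one slot per registered feature, in registration order.
    static double[] extractAll(List<SentenceFeature> features, String source, String target) {
        double[] vector = new double[features.size()];
        for (int i = 0; i < features.size(); i++) {
            vector[i] = features.get(i).extract(source, target);
        }
        return vector;
    }

    public static void main(String[] args) {
        List<SentenceFeature> features = new ArrayList<>();
        features.add(lengthDifference());
        double[] vector = extractAll(features, "a short source", "a target");
        for (int i = 0; i < vector.length; i++) {
            System.out.println(features.get(i).name() + " = " + vector[i]);
        }
    }
}
```

Under this design, adding a feature means writing one more `SentenceFeature` implementation and registering it; the vector grows by one slot and no existing feature is affected.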
