Questimate / Word Translation or Association Features

These features are derived from parallel data, i.e., pairs of source and target sentences. They measure the degree of co-occurrence or association between two words and, based on that, assign a score to the sentence pair.

The following features are currently implemented:

  • IBM1Score: The co-occurrence-based score for a pair of sentences. It is the product of the IBM1 scores of all possible source-target word pairs. Since this score is asymmetric, both the source-to-target and the target-to-source versions are calculated.
  • AvgNumTrans02: Based on word-level IBM1 scores, the average number of possible translations per word in the source sentence, counting only translations whose IBM1 score (translation probability) is greater than 0.02.
  • AvgNumTrans001: The same as above, except that the probability threshold is 0.001.
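
The feature definitions above can be illustrated with a minimal Python sketch. It is not Questimate's actual code: the function names, the toy table `T_TABLE`, and the `floor` value used for unseen word pairs are all assumptions made for illustration. The product is accumulated in log space, which is also a choice made here to avoid numerical underflow on long sentences.

```python
import math

# Hypothetical word-level table: T_TABLE[source_word][target_word] = p(target | source).
# The values are made up for illustration only.
T_TABLE = {
    "das": {"the": 0.6, "that": 0.3, "this": 0.05},
    "haus": {"house": 0.8, "home": 0.15, "building": 0.01},
}


def ibm1_log_score(src_words, tgt_words, t_table, floor=1e-9):
    """Log of the product of word-level scores over every source-target word pair.

    `floor` is an assumed fallback probability for pairs missing from the table.
    """
    log_score = 0.0
    for s in src_words:
        for t in tgt_words:
            log_score += math.log(t_table.get(s, {}).get(t, floor))
    return log_score


def avg_num_trans(src_words, t_table, threshold):
    """Average number of translations per source word with probability > threshold."""
    if not src_words:
        return 0.0
    counts = [
        sum(1 for p in t_table.get(s, {}).values() if p > threshold)
        for s in src_words
    ]
    return sum(counts) / len(src_words)


source = ["das", "haus"]
target = ["the", "house"]

print(ibm1_log_score(source, target, T_TABLE))
print(avg_num_trans(source, T_TABLE, 0.02))   # 2.5
print(avg_num_trans(source, T_TABLE, 0.001))  # 3.0
```

The target-to-source version of the sentence score would call the same function with the arguments swapped and a table estimated in the opposite direction.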

These features require a resource, namely IBM1 scores calculated from a sufficiently large parallel corpus. Currently, pre-calculated word-level scores (word-pair translation probabilities) are loaded to produce the sentence-level scores.
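
As a rough sketch of how such a pre-calculated resource might be loaded, the snippet below parses a plain-text table into a nested dictionary. The file layout (one `source target probability` triple per line, separated by whitespace) and the function name `load_word_scores` are assumptions for illustration; the format Questimate actually expects may differ.

```python
def load_word_scores(lines):
    """Parse 'source target probability' lines into table[source][target] = probability.

    The three-column layout is an assumed format, not a documented one.
    Lines that do not have exactly three fields are skipped.
    """
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        src, tgt, prob = parts
        table.setdefault(src, {})[tgt] = float(prob)
    return table


# Any iterable of lines works, including an open file object.
sample_lines = [
    "das the 0.6",
    "das that 0.3",
    "haus house 0.8",
    "",
]
word_scores = load_word_scores(sample_lines)
print(word_scores["das"]["the"])  # 0.6
```

Keeping the table keyed by source word makes the per-word translation counts a single dictionary lookup.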
