Questimate / Word Translation or Association Features
These features are derived from parallel data, that is, from pairs of source and target sentences. They measure the degree of co-occurrence or association between two words and, based on that, assign a score to the sentence pair.
The following features of this kind are currently implemented:
- IBM1Score: The co-occurrence-based score for a pair of sentences. This is simply the product of the IBM1 scores of all possible source-target word pairs. Since this score is asymmetric, both the source-to-target and the target-to-source versions are calculated.
- AvgNumTrans02: Based on word-level IBM1 scores, the average number of possible translations per word in the source sentence, counting only translations whose IBM1 score (translation probability) is greater than 0.02.
- AvgNumTrans001: The same as above, except that the probability threshold is 0.001.
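The feature definitions above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Questimate implementation: the function names, the nested-dictionary table layout, the toy probabilities, and the `floor` value used for word pairs missing from the table are all hypothetical. The product is accumulated in log space to avoid numerical underflow on long sentences.

```python
import math

# Hypothetical toy table: table[source_word][target_word] = p(target | source).
# The words and probabilities are made up for illustration.
TOY_TABLE = {
    "das": {"the": 0.7, "that": 0.25, "this": 0.015},
    "haus": {"house": 0.9, "home": 0.05, "building": 0.0005},
}


def ibm1_log_score(src_words, tgt_words, table, floor=1e-10):
    """Log of the product of word-level scores over all source-target pairs.

    Pairs missing from the table get the probability `floor`
    (an assumption; the original text does not say how they are handled).
    """
    total = 0.0
    for s in src_words:
        for t in tgt_words:
            total += math.log(table.get(s, {}).get(t, floor))
    return total


def avg_num_trans(src_words, table, threshold):
    """Average, over source words, of the number of table entries
    whose probability is strictly greater than `threshold`.

    Source words absent from the table contribute a count of 0 (an assumption).
    """
    if not src_words:
        return 0.0
    counts = [
        sum(1 for p in table.get(s, {}).values() if p > threshold)
        for s in src_words
    ]
    return sum(counts) / len(counts)


source = ["das", "haus"]
target = ["the", "house"]
score = ibm1_log_score(source, target, TOY_TABLE)
avg_02 = avg_num_trans(source, TOY_TABLE, 0.02)    # AvgNumTrans02
avg_001 = avg_num_trans(source, TOY_TABLE, 0.001)  # AvgNumTrans001
```

The target-to-source version of the sentence score would use the same function with the arguments swapped and a separate table of p(source | target) values.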
These features require a resource, namely IBM1 scores calculated from a sufficiently large parallel corpus. Currently, pre-calculated word-level scores (word-pair translation probabilities) are loaded to produce the sentence-level scores.
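As a rough illustration of the loading step, the sketch below reads word-pair probabilities into a nested dictionary. The file format is an assumption made for this example (one whitespace-separated `source target probability` triple per line); the actual format of the resource is not specified here, and `load_lexical_table` is a hypothetical name.

```python
import io
from collections import defaultdict


def load_lexical_table(lines):
    """Build table[source][target] = probability from an iterable of lines.

    Assumes each line holds "source target probability"; lines with a
    different number of fields (including blank lines) are skipped.
    """
    table = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        src, tgt, prob = parts
        table[src][tgt] = float(prob)
    return dict(table)


# Usage with an in-memory stand-in for a file; the entries are made up.
sample = io.StringIO("das the 0.7\ndas that 0.25\n\nhaus house 0.9\n")
table = load_lexical_table(sample)
```

In practice the same function could be passed an open file handle, since it only iterates over lines.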