Questimate / POS Counts

These are sentence-length-like features computed from POS-tagged data. The sentence is first POS tagged, and the tags are then mapped to broad syntactic categories. Each feature is the number of words in the sentence tagged with one of these categories:

  • VerbCount: Number of verbs in the sentence
  • NounCount: Number of nouns in the sentence
  • PrepositionCount: Number of prepositions in the sentence
  • PronounCount: Number of pronouns in the sentence
  • ModifierCount: Number of modifiers (adjectives, adverbs, quantifiers, qualifiers) in the sentence
  • NumberWordCount: Number of number words in the sentence
  • SymbolCount: Number of symbol words in the sentence
  • FuntionWordCount: Number of function words in the sentence (every word not assigned to any of the categories above is treated as a function word)
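
The counting step described above can be sketched as follows. This is a minimal illustration, not the Questimate implementation: it assumes the sentence has already been tagged and is available as (word, tag) pairs, and the tag-to-category table uses Penn-Treebank-style tags chosen only for the example. The names `TAG_TO_CATEGORY`, `CATEGORIES` and `pos_counts`, as well as the lowercase result keys, are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from Penn-Treebank-style tags to broad categories.
# The real mapping depends on the tagger and tagset actually in use.
TAG_TO_CATEGORY = {
    "VB": "verb", "VBD": "verb", "VBG": "verb",
    "VBN": "verb", "VBP": "verb", "VBZ": "verb",
    "NN": "noun", "NNS": "noun", "NNP": "noun", "NNPS": "noun",
    "IN": "preposition",
    "PRP": "pronoun", "PRP$": "pronoun",
    "JJ": "modifier", "JJR": "modifier", "JJS": "modifier",
    "RB": "modifier", "RBR": "modifier", "RBS": "modifier",
    "CD": "number_word",
    "SYM": "symbol",
}

# Fixed category order so every sentence yields the same set of features.
CATEGORIES = [
    "verb", "noun", "preposition", "pronoun",
    "modifier", "number_word", "symbol", "function_word",
]


def pos_counts(tagged_sentence):
    """Count words per broad category for one tagged sentence.

    tagged_sentence is a list of (word, tag) pairs. Any tag missing from
    TAG_TO_CATEGORY falls back to "function_word".
    """
    counts = Counter(
        TAG_TO_CATEGORY.get(tag, "function_word") for _, tag in tagged_sentence
    )
    # Counter returns 0 for categories that never occurred.
    return {category: counts[category] for category in CATEGORIES}


# Hand-tagged example sentence (tags assumed, not produced by a tagger here).
example = [
    ("She", "PRP"), ("bought", "VBD"), ("the", "DT"), ("3", "CD"),
    ("red", "JJ"), ("apples", "NNS"), ("in", "IN"), ("Paris", "NNP"),
]
print(pos_counts(example))
```

Because unmapped tags fall through to the function-word category, a tagset with no symbol or number tag would simply leave those two counts at zero in this sketch.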

Note that these features depend heavily on the tagger and the tagset (the tagger currently used is TreeTagger). For example, the tagset for some languages may have no tag for symbol words, and the same may be true for number words.
