Tagging: tags with an ‘extra’ word in between

Issue #2 new
Anton Kolechkin created an issue

This one might be tricky. It’s better understood with an example. Consider the phrase: “Raglan cap flutter sleeves”

All three modifiers, ‘Raglan’, ‘cap’, and ‘flutter’, are sleeve types (‘Sleeve_type’).

We have both ‘Raglan’ and ‘Raglan sleeve’ in the dictionary, so there is no problem there: ‘Raglan’ gets tagged.

There are many variations of ‘flutter sleeve’ in the dictionary, so that gets tagged as well. The problem is ‘cap’. We have both ‘cap sleeve’ and ‘capped sleeve’ in our dictionary under ‘Sleeve_type’, but we also have ‘caper’, ‘capes’, ‘caped’, ‘capable’, and ‘cape’ under ‘Style’ (all of which stem to ‘cap’).
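To make the collision concrete, here is a minimal Python sketch of a stem-keyed dictionary lookup over contiguous n-grams. Everything in it is an assumption for illustration, not our tagger's code. `toy_stem` is a hypothetical suffix stripper written so that the ‘cap*’ words collapse to ‘cap’ as described above; a real stemmer (Porter, Snowball, ...) may treat individual words differently. `DICTIONARY`, `build_index`, and `tag_contiguous` are also hypothetical names.

```python
from collections import defaultdict

# Hypothetical toy stemmer (assumption): strips one suffix from a short list,
# then collapses a trailing doubled consonant, so 'cape', 'capes', 'caped',
# 'caper', 'capable' and 'capped' all become 'cap'.
SUFFIXES = ("able", "er", "es", "ed", "e", "s")


def toy_stem(word):
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            w = w[: -len(suf)]
            break
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
        w = w[:-1]  # 'capp' -> 'cap', 'flutt' -> 'flut'
    return w


# Hypothetical excerpt of the tag dictionary: phrase -> tag.
DICTIONARY = {
    "raglan": "Sleeve_type",
    "raglan sleeve": "Sleeve_type",
    "flutter sleeve": "Sleeve_type",
    "cap sleeve": "Sleeve_type",
    "capped sleeve": "Sleeve_type",
    "cape": "Style",
    "capes": "Style",
    "caped": "Style",
    "caper": "Style",
    "capable": "Style",
}


def build_index(dictionary):
    """Map each entry's tuple of stems to the set of tags it carries."""
    index = defaultdict(set)
    for phrase, tag in dictionary.items():
        index[tuple(toy_stem(t) for t in phrase.split())].add(tag)
    return index


def tag_contiguous(text, index, max_n=3):
    """Look up every contiguous n-gram (n = 1..max_n) of the text."""
    tokens = text.split()
    stems = [toy_stem(t) for t in tokens]
    hits = []
    for n in range(1, max_n + 1):
        for i in range(len(stems) - n + 1):
            for tag in sorted(index.get(tuple(stems[i : i + n]), ())):
                hits.append((" ".join(tokens[i : i + n]), tag))
    return hits


INDEX = build_index(DICTIONARY)
for hit in tag_contiguous("Raglan cap flutter sleeves", INDEX):
    print(hit)
# ('Raglan', 'Sleeve_type')
# ('cap', 'Style')
# ('flutter sleeves', 'Sleeve_type')
```

The stemmed unigram ‘cap’ only ever meets the ‘Style’ entries, because the ‘Sleeve_type’ key ('cap', 'sleev') would need ‘cap’ and ‘sleeves’ to be adjacent in the description.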

In the example above, ‘cap’ is being tagged as ‘Style’ when it should be ‘Sleeve_type’. Could we consider skip-grams, for example the bigram formed by deleting the middle word of a trigram? That would help here, since one of the candidate bigrams would be ‘cap sleeves’ (‘cap flutter sleeves’ with ‘flutter’ removed). However, it is going to be tough to tie such a match back to the original description… Let’s think about this!
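One possible way to handle both the skip-gram idea and the tie-back problem is to generate candidates as tuples of token positions rather than as strings. The Python sketch below is only that, a sketch under stated assumptions: `toy_stem` is a hypothetical suffix stripper that collapses the ‘cap*’ words to ‘cap’, the dictionary is a made-up excerpt, and `candidate_spans`/`tag_tokens` are hypothetical names. It generates contiguous n-grams plus the gapped bigram (i, i + 2) from each trigram, and resolves conflicts with an assumed rule: a match covering more tokens wins, and on a tie a contiguous match beats a gapped one.

```python
from collections import defaultdict

# Hypothetical toy stemmer (assumption): 'cape', 'capes', 'caped', 'caper',
# 'capable' and 'capped' all become 'cap'; 'sleeve(s)' becomes 'sleev'.
SUFFIXES = ("able", "er", "es", "ed", "e", "s")


def toy_stem(word):
    w = word.lower()
    for suf in SUFFIXES:
        if w.endswith(suf) and len(w) - len(suf) >= 3:
            w = w[: -len(suf)]
            break
    if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
        w = w[:-1]  # 'capp' -> 'cap', 'flutt' -> 'flut'
    return w


# Hypothetical dictionary excerpt: phrase -> tag.
DICTIONARY = {
    "raglan": "Sleeve_type",
    "raglan sleeve": "Sleeve_type",
    "flutter sleeve": "Sleeve_type",
    "cap sleeve": "Sleeve_type",
    "capped sleeve": "Sleeve_type",
    "cape": "Style",
    "capes": "Style",
    "caped": "Style",
    "caper": "Style",
    "capable": "Style",
}


def build_index(dictionary):
    """Map each entry's tuple of stems to the set of tags it carries."""
    index = defaultdict(set)
    for phrase, tag in dictionary.items():
        index[tuple(toy_stem(t) for t in phrase.split())].add(tag)
    return index


def candidate_spans(n_tokens, max_n=3):
    """Token-position tuples: contiguous n-grams (n = 1..max_n), plus the
    gapped bigram (i, i + 2) left after deleting each trigram's middle word."""
    for n in range(1, max_n + 1):
        for i in range(n_tokens - n + 1):
            yield tuple(range(i, i + n))
    for i in range(n_tokens - 2):
        yield (i, i + 2)


def tag_tokens(text, index):
    """Return (token, tag, matched phrase) for every token that got a tag."""
    tokens = text.split()
    stems = [toy_stem(t) for t in tokens]
    best = {}  # token position -> (score, tag, matched positions)
    for pos in candidate_spans(len(tokens)):
        for tag in sorted(index.get(tuple(stems[p] for p in pos), ())):
            contiguous = pos[-1] - pos[0] == len(pos) - 1
            score = (len(pos), contiguous)  # longer first, then contiguous
            for p in pos:
                if p not in best or score > best[p][0]:
                    best[p] = (score, tag, pos)
    # The stored positions tie each match back to the original words.
    return [
        (tokens[p], best[p][1], " ".join(tokens[q] for q in best[p][2]))
        for p in sorted(best)
    ]


INDEX = build_index(DICTIONARY)
for row in tag_tokens("Raglan cap flutter sleeves", INDEX):
    print(row)
# ('Raglan', 'Sleeve_type', 'Raglan')
# ('cap', 'Sleeve_type', 'cap sleeves')
# ('flutter', 'Sleeve_type', 'flutter sleeves')
# ('sleeves', 'Sleeve_type', 'flutter sleeves')
```

Because each match carries its positions, ‘cap sleeves’ points back to tokens 1 and 3 of the original description even though they are not adjacent. One caveat: in longer descriptions, gapped bigrams can also pair up unrelated words, so it may be worth allowing them only for certain tags such as ‘Sleeve_type’.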
