BRAT tagging not working after encountering punctuation

Issue #13 resolved
Tomas Rojo Hernandez created an issue

It seems we broke something when we moved to the new tokenizer. As soon as a description contains punctuation, every tag after that point has the wrong character range.

Example:

Original description: "Blue and multicolor Creatures of Comfort Natasha sleeveless shirt dress with collar, dual pockets at bust, belt at waist, plaid print throughout and button closures at front."

BRAT tags:

T1 Color 0 4 Blue
T2 Color 9 19 multicolor
T3 Occasion 33 40 Comfort
T4 Sleeve_Type 49 59 sleeveless
T5 Style 60 71 shirt dress
T6 Neckline 77 83 collar
T7 Embellishment 90 97 pockets
T8 Style 107 111 belt
T9 Pattern 122 127 plaid
T10 Pattern 128 133 print
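One quick way to catch this kind of drift is to check that each annotation's offsets actually select its own surface text in the .txt file. This is a minimal sketch, assuming single-span text-bound lines in BRAT's tab-separated standoff form ("T1<TAB>Color 0 4<TAB>Blue"). The helper names are hypothetical, and discontinuous spans such as "0 4;5 8" are not handled. It also assumes, for illustration only, that the text written to .txt is the description with "," and "." deleted.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """A single-span BRAT text-bound annotation (hypothetical helper type)."""
    tid: str
    label: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    surface: str


def parse_textbound(line: str) -> TextBound:
    # Expected form: "T1\tColor 0 4\tBlue" -> ID, "type start end", covered text
    tid, type_and_span, surface = line.rstrip("\n").split("\t")
    label, start, end = type_and_span.split(" ")
    return TextBound(tid, label, int(start), int(end), surface)


def misaligned(text: str, ann_lines: list[str]) -> list[TextBound]:
    """Return annotations whose offsets do not select their own surface text."""
    bad = []
    for line in ann_lines:
        tb = parse_textbound(line)
        if text[tb.start:tb.end] != tb.surface:
            bad.append(tb)
    return bad


if __name__ == "__main__":
    original = (
        "Blue and multicolor Creatures of Comfort Natasha sleeveless shirt dress "
        "with collar, dual pockets at bust, belt at waist, plaid print throughout "
        "and button closures at front."
    )
    # Assumption: the text written to .txt simply drops "," and "."
    stripped = original.replace(",", "").replace(".", "")
    tags = [
        "T1\tColor 0 4\tBlue",
        "T2\tColor 9 19\tmulticolor",
        "T3\tOccasion 33 40\tComfort",
        "T4\tSleeve_Type 49 59\tsleeveless",
        "T5\tStyle 60 71\tshirt dress",
        "T6\tNeckline 77 83\tcollar",
        "T7\tEmbellishment 90 97\tpockets",
        "T8\tStyle 107 111\tbelt",
        "T9\tPattern 122 127\tplaid",
        "T10\tPattern 128 133\tprint",
    ]
    print([tb.tid for tb in misaligned(original, tags)])  # []
    print([tb.tid for tb in misaligned(stripped, tags)])  # ['T7', 'T8', 'T9', 'T10']
```

Against the original sentence every tag lines up; against the punctuation-free text, everything after "collar" fails the check, which matches the symptom above.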

Every tag after "collar" is off. The offsets are computed against the original sentence, where the word "pockets" spans characters 90-97. However, the .txt file we write contains the sentence with the punctuation removed, and in that text "pockets" spans characters 89-96. Each punctuation mark that is dropped shifts every later tag one more character to the left.
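One possible fix is to derive the offsets from the exact string that gets written to the .txt, rather than from the sentence the tagger originally saw. This is a minimal sketch under stated assumptions: the tagger's output is modeled as a hypothetical ordered list of (label, phrase) pairs, and "to_brat" is a hypothetical helper, because the issue does not show the real tokenizer's output format. Using offsets reported directly by the tokenizer, if it provides them, would be more robust than searching for phrases.

```python
import re


def to_brat(text: str, tagged: list[tuple[str, str]]) -> list[str]:
    """Build BRAT text-bound lines whose offsets index into `text`.

    `tagged` is a hypothetical list of (label, phrase) pairs, given in the
    order the phrases occur in `text`.
    """
    lines = []
    cursor = 0
    for i, (label, phrase) in enumerate(tagged, start=1):
        # \b keeps short phrases (e.g. "at") from matching inside longer words
        match = re.compile(rf"\b{re.escape(phrase)}\b").search(text, cursor)
        if match is None:
            raise ValueError(f"{phrase!r} not found in text after offset {cursor}")
        lines.append(f"T{i}\t{label} {match.start()} {match.end()}\t{phrase}")
        # Continue after this match so repeated phrases are assigned in order
        cursor = match.end()
    return lines


if __name__ == "__main__":
    original = (
        "Blue and multicolor Creatures of Comfort Natasha sleeveless shirt dress "
        "with collar, dual pockets at bust, belt at waist, plaid print throughout "
        "and button closures at front."
    )
    # Assumption: this is the punctuation-free text that is written to .txt
    stripped = original.replace(",", "").replace(".", "")
    for line in to_brat(stripped, [("Embellishment", "pockets"), ("Style", "belt")]):
        print(line)
```

Run against the punctuation-free text, "pockets" gets 89 96 and "belt" gets 105 109. Run against the original sentence, the same call gives 90 97 for "pockets". Either way, the offsets agree with whichever string actually ends up in the .txt file.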
