Word count causes an index out of bounds error for French or Italian locales when a word ends with an apostrophe
Issue #1135
new
If the segment text contains a word that ends with an apostrophe, the method below throws an index out of bounds error. To avoid this, the tokenizer should check whether the token consists only of an apostrophe before splitting it.
public List<Token> apostrophe(Token token, LocaleId locale) {
    List<Token> tokens;
    Matcher matcher = APOSTROPHE.matcher(token.getValue());
    matcher.find();
    int s = token.getRange().start;
    int e = token.getRange().end;
    tokens = new ArrayList<>();
    String[] words = APOSTROPHE.split(token.getValue());
    String value = words[0]; // Index out of Bounds here!!
    String name = Tokens.getTokenName(token.getId());
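A likely reason `words[0]` fails: `Pattern.split(CharSequence)` discards trailing empty strings, so splitting a token that is nothing but an apostrophe yields an empty array. The small demo below shows this with plain strings. The real `APOSTROPHE` regex is not shown in the issue, so the single-quote pattern here is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class ApostropheSplitDemo {
    // Assumed stand-in for the APOSTROPHE pattern used by the word counter;
    // the actual regex is not part of this issue.
    static final Pattern APOSTROPHE = Pattern.compile("'");

    static String[] split(String value) {
        // Pattern.split(input) behaves like split(input, 0):
        // trailing empty strings are removed from the result.
        return APOSTROPHE.split(value);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("l'amico")));
        System.out.println(Arrays.toString(split("word'")));
        // An apostrophe-only token splits into ["", ""], and both trailing
        // empties are dropped, leaving an array of length 0.
        System.out.println(split("'").length);
        // Reading split("'")[0] therefore throws
        // ArrayIndexOutOfBoundsException, like `String value = words[0];`.
    }
}
```

Running it prints `[l, amico]`, `[word]` and `0`.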
Locale: it-it
Segment example: questo word' un esempio per causare errore (Italian, roughly "this word' an example to cause an error")
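The check proposed in this issue could look roughly like the following. This is a simplified sketch under stated assumptions: it works on plain strings rather than Okapi `Token` objects with ranges, the `APOSTROPHE` and `APOSTROPHES_ONLY` patterns are hypothetical, and `splitToken` is an illustrative helper, not the library's method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

public class GuardedApostropheSplit {
    // Hypothetical patterns; the real APOSTROPHE regex is not shown in the issue.
    static final Pattern APOSTROPHE = Pattern.compile("'");
    static final Pattern APOSTROPHES_ONLY = Pattern.compile("'+");

    static List<String> splitToken(String value) {
        // Guard proposed in the issue: leave an apostrophe-only token unsplit
        // instead of indexing into an empty split result.
        if (APOSTROPHES_ONLY.matcher(value).matches()) {
            return Collections.singletonList(value);
        }
        List<String> parts = new ArrayList<>();
        for (String part : APOSTROPHE.split(value)) {
            // Skip empty pieces such as the leading "" produced by "'tis".
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical tokens; how the real tokenizer splits the example
        // segment is not described in the issue.
        for (String token : new String[] {"questo", "word'", "'", "l'esempio"}) {
            System.out.println(token + " -> " + splitToken(token));
        }
    }
}
```

Checking the token before splitting keeps the rest of the method unchanged. A caller can then decide whether a lone apostrophe counts as a word.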
Comments (2)
- changed milestone to 1.44.0
- assigned issue to
Duplicated in #1136