Word count causes Index out of Bounds for French or Italian locales if a word ends with an apostrophe

Issue #1135 new
Former user created an issue

If the segment text contains an apostrophe at the end of a word, the method below throws an Index out of Bounds error. To avoid this, the tokenization process should check whether the token consists only of an apostrophe before splitting it.

public List<Token> apostrophe(Token token, LocaleId locale) {
        List<Token> tokens;
        Matcher matcher = APOSTROPHE.matcher(token.getValue());
        int s = token.getRange().start;
        int e = token.getRange().end;

        tokens = new ArrayList<>();
        String[] words = APOSTROPHE.split(token.getValue());

        String value = words[0]; // Index out of Bounds here!!
        String name = Tokens.getTokenName(token.getId());

Locale: it-it

Segment example: questo word' un esempio per causare errore
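
The suggested check could look roughly like the sketch below. It is not the project's actual fix. It only illustrates why the array can be empty and one possible guard. The class name, the helper splitOnApostrophe, and the APOSTROPHE pattern are assumptions made for illustration, since the real pattern is not shown in this report. The underlying behaviour is standard Java: Pattern.split drops trailing empty strings, so a value made up only of the delimiter yields a zero-length array.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ApostropheSplitSketch {
    // Assumption: the real APOSTROPHE pattern is not shown in the report.
    // A straight or typographic apostrophe is used here for illustration.
    static final Pattern APOSTROPHE = Pattern.compile("['\u2019]");

    // Hypothetical helper: returns the pieces of a token value around
    // apostrophes, falling back to the whole value when split() is empty.
    static List<String> splitOnApostrophe(String value) {
        String[] words = APOSTROPHE.split(value);
        List<String> parts = new ArrayList<>();
        if (words.length == 0) {
            // The value consisted only of apostrophes: split() removed the
            // trailing empty strings, so words[0] would throw here.
            parts.add(value);
            return parts;
        }
        for (String w : words) {
            // Skip the empty leading piece produced by a value like "'word".
            if (!w.isEmpty()) {
                parts.add(w);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(APOSTROPHE.split("'").length);   // prints 0
        System.out.println(splitOnApostrophe("'"));         // prints [']
        System.out.println(splitOnApostrophe("word'"));     // prints [word]
        System.out.println(splitOnApostrophe("l'esempio")); // prints [l, esempio]
    }
}
```

In the reported method, the equivalent guard would be to return the token unsplit when words.length is 0, before reading words[0].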
