Okapi validator incorrectly recognizes double words in several Indic languages
Issue #594
resolved
Steps to reproduce:
- Start CheckMate
- Validate the attached file
The validator reports "ल ल" as a double word, although these are actually two characters from two neighboring words.
Similar issues occur in several other languages:
- Bengali (bn), e.g. the string "চালাবেন না!" is reported as containing the double word "ন ন"
- Gujarati (gu), e.g. the string "લિંક કૉપિ" is reported as containing the double word "ક ક"
- Kannada (kn), e.g. the string "ಒಳಬರುವ ವೀಡಿಯೊ ಕರೆ" is reported as containing the double word "ವ ವ"
- Marathi (mr), e.g. the string "साइन इन करा" is reported as containing the double word "इन इन"
- Nepali (ne), e.g. the string "ड्राइभबाट टिम" is reported as containing the double word "ट ट"
- Punjabi (pa), e.g. the string "ਕਲਿੱਕ ਕੀਤੀ" is reported as containing the double word "ਕ ਕ"
- Sinhalese (si), e.g. the string "ඔබ ඔබේ" is reported as containing the double word "ඔබ ඔබ"
Expected behavior: No double-word issue should be reported.
Comments (3)
Okapi falsely recognizes "ल" as a complete word because the character next to it (e.g. ॉ) is classified as a "Mark", not a "Letter", in Unicode.
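The effect can be reproduced outside Okapi with a minimal Python sketch. This is not Okapi's code (Okapi is written in Java), and the helper names `is_word_char`, `tokenize` and `double_words` are hypothetical. It assumes a tokenizer that treats only Unicode "Letter" characters (general category L*) as word characters, and compares it with one that also accepts "Mark" characters (M*), using two of the strings reported above.

```python
import unicodedata


def is_word_char(ch, include_marks):
    """True if ch is a Unicode Letter (L*), or a Mark (M*) when include_marks is set."""
    cat = unicodedata.category(ch)
    return cat.startswith("L") or (include_marks and cat.startswith("M"))


def tokenize(text, include_marks):
    """Return (start, end, token) for each maximal run of word characters."""
    tokens = []
    start = None
    for i, ch in enumerate(text):
        if is_word_char(ch, include_marks):
            if start is None:
                start = i
        elif start is not None:
            tokens.append((start, i, text[start:i]))
            start = None
    if start is not None:
        tokens.append((start, len(text), text[start:]))
    return tokens


def double_words(text, include_marks):
    """Report adjacent identical tokens that are separated only by whitespace."""
    tokens = tokenize(text, include_marks)
    hits = []
    for (s1, e1, w1), (s2, e2, w2) in zip(tokens, tokens[1:]):
        gap = text[e1:s2]
        if w1 == w2 and gap and gap.isspace():
            hits.append(text[s1:e1] + " " + text[s2:e2])
    return hits


bengali = "চালাবেন না!"
sinhalese = "ඔබ ඔබේ"

# Letters only: vowel signs (Marks) cut the words apart, giving false positives.
print(double_words(bengali, include_marks=False))    # ['ন ন']
print(double_words(sinhalese, include_marks=False))  # ['ඔබ ඔබ']

# Letters and Marks: whole words are compared, so nothing is reported.
print(double_words(bengali, include_marks=True))     # []
print(double_words(sinhalese, include_marks=True))   # []
```

With Marks accepted as word characters, a genuinely repeated word such as "না না" is still reported, so the check itself keeps working.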
- changed status to resolved
Fix issue #594: Okapi validator incorrectly recognizes double words in several Indic languages → <<cset 9e9e9b8ecf7e>>