- changed title to HTML Filter: RTL (Arabic) texts are not encoded to UTF-8 in the merged file
- edited description
HTML Filter: RTL (Arabic) texts are not encoded to UTF-8 in the merged file
Is this the expected behavior for Okapi, or for HTML in general?
I don’t have deep knowledge of RTL in HTML anyway.
Comments (6)
-
reporter -
First of all, if you open the document in a browser, you’ll see that the Arabic renders fine. Numeric entity escaping is a valid notation.
The escaping is happening because somehow the meta charset tag in your file switched to US-ASCII. I am not sure what did this, but it is not the default behavior of the HTML filter.
Using the attached XLIFF with dummy (machine translation) Arabic targets and the default HTML config, I can merge a target file with tikal, and the result is in UTF-8.
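To illustrate why the escaped output is still valid, here is a minimal Python sketch of the same effect. It is not Okapi code, and the sample string is made up rather than taken from the attached files. Writing Arabic text to US-ASCII forces each letter into a numeric character reference, and resolving those references gives back the original text.

```python
import html

# Sample Arabic text (an assumption for illustration, not from the attachment).
arabic = "مرحبا بالعالم"

# US-ASCII cannot represent Arabic letters directly, so each one is written
# as a decimal numeric character reference of the form &#NNNN;.
escaped = arabic.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(escaped)

# A browser (or html.unescape here) resolves the references back to the
# original characters, so no information is lost.
restored = html.unescape(escaped)
print(restored == arabic)
```

So the escaped file and the UTF-8 file should display identically; only the byte-level representation differs.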
-
reporter Is this because of the custom HTML config (although that’s highly unlikely)?
Please have a look at the custom config. -
reporter - attached okf_htmlcustom_html.fprm
-
I don’t think so. I merged with tikal using your FPRM and still got the same UTF-8 result as before. However, I can reproduce your output if I run tikal with
-oe US-ASCII
to force the output encoding. How are you calling Okapi? -
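A quick way to check whether a merged file was written as US-ASCII is to look at the charset it declares and at whether it contains any non-ASCII bytes. The following is a rough Python sketch under that assumption. The helper name `describe_encoding` and the sample markup are hypothetical, and the regular expression is a simplification rather than a full HTML parser.

```python
import re
from typing import Optional, Tuple


def describe_encoding(data: bytes) -> Tuple[Optional[str], bool]:
    """Return (declared charset or None, True if every byte is ASCII)."""
    # Look near the top of the file for <meta charset="..."> or the older
    # http-equiv form (content="text/html; charset=...").
    head = data[:4096].decode("ascii", errors="replace")
    match = re.search(r'charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE)
    declared = match.group(1) if match else None
    return declared, data.isascii()


# Hypothetical merged output with escaped Arabic and a US-ASCII declaration.
sample = b'<html><head><meta charset="US-ASCII"></head><body>&#1605;</body></html>'
print(describe_encoding(sample))
```

A result with a US-ASCII declaration and only ASCII bytes would be consistent with the output encoding having been forced, whether by `-oe` or by something else in the calling setup.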
reporter - changed status to resolved