Emojis disappear when converting from XLIFF to DOCX
I have a document (.docx) that contains emojis. After converting it to XLIFF, the emojis are present, but after converting that XLIFF back to DOCX, the emojis disappear. In this branch: https://bitbucket.org/okapiframework/okapi/branch/xliff2-improvment everything works correctly. Is it possible to merge the changes from this branch into master? The file with emojis is attached. I look forward to your reply, and thank you for the help!
Comments (12)
- removed milestone
I suspect XmlInputStreamReader may be the problem; this was working up until M33. This is one we would like to fix before the M36 release.
I like both that theory and the prioritization.
I have a feature branch that may fix this and would appreciate a code review (branch feature/Issue_ #680). Note that on that same branch I had to disable a markdown test because it was failing.
I can take a look.
@jimhargrave That code looks good to me, and I can see it fixes Igor's original case. Do you want to open a formal PR, or can I just merge it? I'm going to add the markdown test back in; I think it's a Windows-only failure. I will talk to Kuro about it.
I created a PR. No problem with the markdown test; we can add that back in.
Yves would like to replace the raw emoji characters with Unicode escape sequences to protect against encoding differences across platforms.
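Yves's suggestion can be illustrated with a small Java sketch (Java because Okapi is a Java project; the class and constant names are hypothetical, not taken from the Okapi test suite). Writing 🎷🐈 as `\uD83C\uDFB7` and `\uD83D\uDC08` keeps the source file pure ASCII, so the string no longer depends on which encoding the compiler assumes. Both characters lie outside the Basic Multilingual Plane, so each one is a surrogate pair of two UTF-16 units.

```java
final class EmojiEscapes {
    // U+1F3B7 SAXOPHONE and U+1F408 CAT are supplementary characters,
    // so each is stored in a Java String as a surrogate pair (two chars).
    static final String SAX_ESCAPED = "\uD83C\uDFB7";
    static final String CAT_ESCAPED = "\uD83D\uDC08";

    public static void main(String[] args) {
        String escaped = SAX_ESCAPED + CAT_ESCAPED;

        // Same text built from code points, independent of source-file encoding.
        String fromCodePoints = new String(Character.toChars(0x1F3B7))
                + new String(Character.toChars(0x1F408));

        System.out.println(escaped.equals(fromCodePoints));               // true
        System.out.println(escaped.length());                             // 4 (UTF-16 units)
        System.out.println(escaped.codePointCount(0, escaped.length()));  // 2 (code points)
    }
}
```

Asserting on the code point count rather than `length()` is what distinguishes a correctly decoded emoji from two stray surrogate halves.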
- changed milestone to M36
- changed status to resolved
🎷🐈
@jhargrave Good idea, I can do it quickly.
Hi Igor, thanks for the report and the easy test case. I can confirm this on 0.36-SNAPSHOT: the extracted XLIFF looks fine, but we lose the emoji during the merge.
The tikal output during merge is suspicious:
I have a suspicion we're not reading the emoji correctly when we parse the XLIFF back in, but that needs to be confirmed.
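The thread does not establish the root cause beyond the XmlInputStreamReader suspicion. One common way Java code loses emoji while reading XML, offered here only as a hypothesis, is an invalid-character filter that checks each UTF-16 `char` against the XML 1.0 `Char` production. Surrogate halves fall in 0xD800-0xDFFF, which that production excludes, so both halves of every emoji are dropped. The sketch below uses a hypothetical class `XmlCharFilter` (not Okapi code) to contrast that behavior with a code-point-aware filter.

```java
final class XmlCharFilter {
    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Buggy variant: tests each UTF-16 unit on its own. Surrogate halves
    // (0xD800-0xDFFF) fail the test, so every emoji is silently removed.
    static String filterPerChar(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isXmlChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Fixed variant: walks the string by code point, so a surrogate pair is
    // tested as one supplementary character (e.g. 0x1F3B7) and kept.
    static String filterPerCodePoint(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().filter(XmlCharFilter::isXmlChar).forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "Hello \uD83C\uDFB7\uD83D\uDC08!";
        System.out.println(filterPerChar(input).equals("Hello !"));  // true: emoji lost
        System.out.println(filterPerCodePoint(input).equals(input)); // true: emoji kept
    }
}
```

Iterating with `String.codePoints()` (Java 8+) is the simplest fix; an alternative is to detect `Character.isHighSurrogate` and consume the following low surrogate together with it.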
I'm clearing the milestone field (that's to indicate what version the fix occurs in).