OpenXML: brackets are reversed in Arabic PPTX

Issue #1127 new
mzeid created an issue

Hi Okapi team,

It seems there is a regression when it comes to brackets and some punctuation marks. For some reason, the new updated OpenXML filter is now reversing most brackets. I understand that reversal could happen when there is mixed Arabic and English, but right now, even within a complete Arabic string, this can happen.

Please see attached screen shot.

Comments (31)

  1. Manuel Souto Pico

    @mzeid Could you please attach a sample file where this happens (both the English source file and the Arabic target file)? Thanks (شكرا).

  2. mzeid reporter

    Hi @Manuel Souto Pico ,

    I think this is related to the same issue here.

    As you can see in the screen shot, when there is English text between Arabic text, the space that goes after the English word gets misplaced.

    Also, when a sentence ends with an English word, the period gets reversed and misplaced.

    Are these related to the same issue here?

    I would be grateful if you let me know once you have a fix. I can help with testing.

    Thanks for your support.

  3. Denis Konovalyenko

    @mzeid thanks for reporting!

    Do you think you would be able to add example documents one more time to this ticket (they are not available on Mediafire)?

    I assume this is related to run merging on extraction, where two consecutive texts with the same styles are combined into one… This improves the segmentation quality.

    I think a solution might be the distinguishing of punctuation marks (need to decide which) at the time of merging and writing them as separate runs…

    @Manuel Souto Pico , it looks like this is a bit different from issue #933
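
The idea from comment 3 (merge consecutive runs with the same styles, but write certain punctuation marks as separate runs) could look roughly like the sketch below. This is not the Okapi implementation: the `Run` type, the `merge_runs` and `split_punctuation` helpers, and the `SEPARATE` set are hypothetical names, and which marks to separate is exactly the question the comment leaves open.

```python
from dataclasses import dataclass

# Hypothetical set of marks to keep apart; the thread leaves this undecided.
SEPARATE = set("()[]{}.,:;!?")


@dataclass
class Run:
    text: str
    style: str  # stand-in for the run properties (a:rPr)


def split_punctuation(run):
    """Split one run so that each mark in SEPARATE becomes its own run."""
    pieces, buffer = [], ""
    for ch in run.text:
        if ch in SEPARATE:
            if buffer:
                pieces.append(Run(buffer, run.style))
                buffer = ""
            pieces.append(Run(ch, run.style))
        else:
            buffer += ch
    if buffer:
        pieces.append(Run(buffer, run.style))
    return pieces


def merge_runs(runs):
    """Merge consecutive same-style runs, but never across a punctuation run."""
    merged = []
    for run in runs:
        for piece in split_punctuation(run):
            is_mark = piece.text in SEPARATE
            prev = merged[-1] if merged else None
            if (prev is not None and not is_mark
                    and prev.text not in SEPARATE
                    and prev.style == piece.style):
                prev.text += piece.text  # same style, plain text: merge
            else:
                merged.append(piece)
    return merged
```

With this sketch, four same-style runs `(`, `مرحبا`, ` بكم`, `)` come back as three runs: the two text runs are merged for segmentation, while each bracket stays in a run of its own.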

  4. Denis Konovalyenko

    @mzeid thank you for your openness and support!

    I have added your document to this issue.

  5. Manuel Souto Pico

    Hi there,

    I think my reply comes a bit late, but here it comes in case it’s useful (I haven’t read the whole thread).

    This is a common issue when mixing Arabic text with Latin text, digits, punctuation, tags, or all of them at the same time. To get text directionality right, you normally use Unicode bidirectional marks and embeddings. They were added to OmegaT some years ago (sponsored by my company), but other CAT tools should have something similar.

    Cheers, Manuel
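
For readers unfamiliar with the Unicode marks and embeddings mentioned in comment 5, here is a minimal Python sketch using only the standard `unicodedata` module. It shows why brackets are sensitive to context (they are direction-neutral and mirrored) and how a left-to-right fragment can be wrapped in an embedding. The helper names `bidi_classes` and `embed_ltr` are made up for illustration; this is not how OmegaT or Okapi handle it.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (closes LRE/RLE)


def bidi_classes(text):
    """Return (char, bidi class, mirrored flag) for every character."""
    return [(ch, unicodedata.bidirectional(ch), unicodedata.mirrored(ch))
            for ch in text]


def embed_ltr(fragment):
    """Wrap a fragment so it is laid out left-to-right inside RTL text."""
    return f"{LRE}{fragment}{PDF}"


if __name__ == "__main__":
    for ch, cls, mirrored in bidi_classes("(نص)"):
        print(repr(ch), cls, mirrored)
```

The brackets are reported with class `ON` (other neutral) and a mirrored flag of 1, while the Arabic letters have class `AL`. A neutral, mirrored character takes its direction and glyph from its surroundings, which is why a wrong base direction or a stray control character can flip it.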

  6. mzeid reporter

    Hi @Manuel Souto Pico

    Thanks for your reply.

    I am not sure about the claim above. If you look at the first screenshot above (left side), you will see that Okapi reverses brackets even when the text is Arabic only, which shouldn’t be the case. I believe there is something wrong with the logic implemented in the latest fix, and it is now causing this regression.

    In SDL Trados Studio, it is handled correctly as you can see in the screenshot below (PPT file saved from within Trados Studio).

  7. mzeid reporter

    Hi @Denis Konovalyenko

    Hope all is well with you.

    Is there any update on this show-stopper issue?

    Thanks

  8. Manuel Souto Pico

    Dear @mzeid : I would like to have a look at the source (original) file, which I think is in English, assuming this is an English to Arabic translation. I think I never got that. Could you please share it?

  9. mzeid reporter

    Dear @Manuel Souto Pico ,

    Thanks for your kind follow-up. The issue appears in any PPTX file translated into Arabic. Denis uploaded the sample file above, but here is another file, so you can test it at your end.

    https://1drv.ms/p/s!AoXUd1M3NEtkiOIyvYs_qeuZ5t9LrA?e=DREfhQ

    I hope you can figure this one out, and have a solution for this show-stopper, Manuel. Thanks a million again for your support on this. Please let me know if you need any help with testing, or more examples.

    Thanks,

    Mohamed

  10. Manuel Souto Pico

    Thanks, Mohamed. This seems to be a different file. Could you please share a screenshot of the part of the target document that shows the problem? (And please quote the source text, so that I can easily find it.) Thanks.

  11. mzeid reporter

    Dear @Manuel Souto Pico ,

    It will be visible in any file translated into Arabic, unfortunately. I used Rainbow ver. 1.42. I tried the latest version, but it keeps throwing a Java INI error on my Windows 10 machine.

    Please download the translated Arabic file.

    https://1drv.ms/p/s!AoXUd1M3NEtkiORL9tJptFl_huO9Rw?e=8xZ9UW

    Here are also some screenshots of the issues, just in case.

    The issue is visible not only when there is a mix of English and Arabic, but also with pure Arabic text, as shown in this screen shot.

    As you can see, this is really a show-stopper.

    It seems a Unicode control character is inserted around brackets, numbers, etc., and it causes this reversing effect in Arabic.

    Thanks again for your support.

    Please let me know if you have any further questions.

    Thanks
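
The suspicion in comment 11 that a Unicode control character is inserted around brackets and numbers can be checked directly on a translated file. The sketch below opens a PPTX as a zip archive, reads the text runs (`a:t`) of each slide part, and lists any bidirectional control characters it finds. It uses only the Python standard library; `find_bidi_controls` and `scan_pptx` are illustrative names, and the sketch assumes slides live under `ppt/slides/slideN.xml`, which is the usual layout but is not enforced by the format.

```python
import re
import unicodedata
import zipfile
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Marks, embeddings/overrides, and isolates used by the Unicode bidi algorithm.
BIDI_CONTROLS = set("\u061c\u200e\u200f"
                    "\u202a\u202b\u202c\u202d\u202e"
                    "\u2066\u2067\u2068\u2069")


def find_bidi_controls(text):
    """Return (index, character name) for each bidi control character."""
    return [(i, unicodedata.name(ch)) for i, ch in enumerate(text)
            if ch in BIDI_CONTROLS]


def scan_pptx(pptx):
    """Map slide part name -> list of (run text, controls found in it)."""
    report = {}
    with zipfile.ZipFile(pptx) as archive:
        for name in archive.namelist():
            # Assumption: slides follow the conventional part naming.
            if not re.fullmatch(r"ppt/slides/slide\d+\.xml", name):
                continue
            root = ET.fromstring(archive.read(name))
            hits = []
            for t in root.iter(f"{{{A_NS}}}t"):
                found = find_bidi_controls(t.text or "")
                if found:
                    hits.append((t.text, found))
            if hits:
                report[name] = hits
    return report
```

Calling `scan_pptx("translated.pptx")` (a placeholder file name) returns an empty dictionary when no run contains such characters, which would point away from control characters and towards run properties as the cause.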

  12. Denis Konovalyenko

    @mzeid it looks like the root cause of this issue is connected with the lang attribute in the run properties.

    So, adding lang="ar-AE" to either a:defRPr or a:rPr should fix this case.

    Below are the differences between the original and the translated 7th slide.

    I will add a reduced document for the round-trip testing in my next message.
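
To try the observation from comment 12 by hand, the following sketch sets the `lang` attribute on the run properties (`a:rPr`) of every text run in a slide part. It is a manual experiment under stated assumptions, not the fix that went into Okapi: `set_run_language` is a hypothetical helper, it overwrites any existing `lang` value, and it only touches `a:rPr` (not `a:defRPr`).

```python
import xml.etree.ElementTree as ET

A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Keep the conventional prefixes when the tree is serialized again.
ET.register_namespace("a", A_NS)
ET.register_namespace("p", P_NS)


def set_run_language(slide_xml, lang="ar-AE"):
    """Return slide XML in which every a:r has an a:rPr with the given lang."""
    root = ET.fromstring(slide_xml)
    for run in root.iter(f"{{{A_NS}}}r"):
        rpr = run.find(f"{{{A_NS}}}rPr")
        if rpr is None:
            # a:rPr has to come before a:t inside a run.
            rpr = ET.Element(f"{{{A_NS}}}rPr")
            run.insert(0, rpr)
        rpr.set("lang", lang)  # overwrites a source-language value, if any
    return ET.tostring(root, encoding="unicode")
```

The returned XML can be written back into a copy of the PPTX archive to compare how PowerPoint renders brackets with and without the attribute.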

  13. mzeid reporter

    Hi @Denis Konovalyenko , this is amazing news. Thanks a million. I looked at the sample PPTX file, and it looks perfect. Any chance you could translate the Study_USA.pptx file using this fix to confirm? I can review it. This is very promising.

    Any rough date when this fix will be available in Okapi?

    Thanks again, guys. This is great.

  14. mzeid reporter

    Hi @Denis Konovalyenko ,

    If you can push this fix to Okapi Rainbow, I can test it on the same PPTX file, and let you know the final results. What do you think?

    Thanks,

    Mohamed

  15. Denis Konovalyenko

    @mzeid , thank you for your patience! I hope to get to this next month - I will write a message when a pull request is ready.

  16. Manuel Souto Pico

    I will test this in the filters plugin for OmegaT as soon as it’s included, I suppose in version 1.13-1.45.0… Thanks a lot, Denis!

  17. mzeid reporter

    Hi @Denis Konovalyenko , I am happy to see there is progress 👍

    I would be grateful if you let me know in this issue when an updated version of Okapi Rainbow is ready to test.
