OpenXML: brackets are reversed in Arabic PPTX

Issue #1127 new
mzeid created an issue

Hi Okapi team,

There seems to be a regression affecting brackets and some punctuation marks: the updated OpenXML filter now reverses most brackets. I understand that reversal can happen when Arabic and English are mixed, but right now it happens even within a string that is entirely Arabic.

Please see the attached screenshot.

Comments (11)

  1. Manuel Souto Pico

    @mzeid Could you please attach a sample file where this happens? (both source file in English and target file in Arabic). شكرا

  2. mzeid reporter

    Hi @Manuel Souto Pico ,

    I think this is related to the same issue here.

    As you can see in the screenshot, when English text appears between Arabic text, the space that follows the English word gets misplaced.

    Also, when a sentence ends with an English word, the period gets reversed and misplaced.

    Are these related to the same issue here?

    I would be grateful if you could let me know once you have a fix. I can help with testing.

    Thanks for your support.

  3. Denis Konovalyenko

    @mzeid thanks for reporting!

    Do you think you would be able to add example documents one more time to this ticket (they are not available on Mediafire)?

    I assume this is related to run merging on extraction, where two consecutive text runs with the same styles are combined into one… This improves segmentation quality.

    I think a solution might be to distinguish punctuation marks (we need to decide which) at merge time and write them as separate runs…

    @Manuel Souto Pico , it looks like this is a bit different from issue #933
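
The merging idea described in this comment can be sketched roughly as follows. This is only an illustration of the proposal, not Okapi's actual implementation: the `Run` class, its `style` key, and the choice to treat every Unicode punctuation character (general category `P*`) as "distinguished" are all assumptions made for the example.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for a text run: its text plus a style key."""
    text: str
    style: str


def is_punctuation_only(text: str) -> bool:
    """True if the text has at least one non-space character and all of
    its non-space characters are Unicode punctuation (category P*)."""
    chars = [ch for ch in text if not ch.isspace()]
    return bool(chars) and all(
        unicodedata.category(ch).startswith("P") for ch in chars
    )


def merge_runs(runs):
    """Merge consecutive runs that share a style, but keep runs that
    contain only punctuation as separate runs."""
    merged = []
    for run in runs:
        prev = merged[-1] if merged else None
        can_merge = (
            prev is not None
            and prev.style == run.style
            and not is_punctuation_only(prev.text)
            and not is_punctuation_only(run.text)
        )
        if can_merge:
            merged[-1] = Run(prev.text + run.text, prev.style)
        else:
            merged.append(Run(run.text, run.style))
    return merged


runs = [
    Run("مرحبا ", "s1"),
    Run("بالعالم", "s1"),
    Run(")", "s1"),   # punctuation-only run, kept on its own
    Run("نص", "s2"),  # different style, never merged
]
result = merge_runs(runs)
```

With this input, the first two runs are combined while the closing bracket stays in its own run, so a writer could still emit it separately. Which punctuation marks actually need this treatment is left open, as in the comment above.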

  4. Denis Konovalyenko

    @mzeid thank you for your openness and support!

    I have added your document to this issue.

  5. Manuel Souto Pico

    Hi there,

    I think my reply comes a bit late, but here it comes in case it’s useful (I haven’t read the whole thread).

    This is a common issue when mixing Arabic text with Latin text, digits, punctuation, tags, or all of them at the same time. To get text directionality right, you normally use Unicode bidirectional marks and embeddings. They were added to OmegaT some years ago (sponsored by my company), but other CAT tools should have something similar.

    Cheers, Manuel
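
As a small illustration of the Unicode bidirectional controls mentioned in this comment, the sketch below uses Python's standard `unicodedata` module to inspect the bidirectional class of a few characters and to wrap a string in a right-to-left embedding. The helper names `bidi_classes` and `embed_rtl` are made up for this example, and whether such marks would resolve the bracket reversal reported in this issue is not established in the thread.

```python
import unicodedata

# Unicode directional formatting characters.
RLM = "\u200F"  # RIGHT-TO-LEFT MARK
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def bidi_classes(text: str):
    """Return (character, bidirectional class) pairs for each character."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


def embed_rtl(text: str) -> str:
    """Wrap text in a right-to-left embedding (RLE ... PDF)."""
    return f"{RLE}{text}{PDF}"


# Brackets are neutral ("ON") and mirrored, so the glyph that is displayed
# depends on the direction resolved from the surrounding text.
classes = bidi_classes("(ب)A")
bracket_is_mirrored = bool(unicodedata.mirrored("("))

wrapped = embed_rtl("(نص)")
```

Here `classes` shows the brackets as neutral characters next to a strong Arabic letter and a strong Latin letter, which is why their displayed orientation can change with context.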

  6. mzeid reporter

    Hi @Manuel Souto Pico

    Thanks for your reply.

    I am not sure about the claim above. If you look at the first screenshot above (left side), you will see that Okapi reverses brackets even when the text is Arabic only, which shouldn’t be the case. I believe something is wrong with the logic implemented in the latest fix, and that it is now causing this regression.

    In SDL Trados Studio, it is handled correctly as you can see in the screenshot below (PPT file saved from within Trados Studio).
