OpenXML: brackets are reversed in Arabic PPTX
Hi Okapi team,
It seems there is a regression when it comes to brackets and some punctuation marks. For some reason, the new updated OpenXML filter is now reversing most brackets. I understand that reversal could happen when there is mixed Arabic and English, but right now, even within a complete Arabic string, this can happen.
Please see attached screen shot.
Comments (31)
-
reporter -
@mzeid Could you please attach a sample file where this happens? (both source file in English and target file in Arabic). شكرا
-
Related to ticket 933?
-
reporter Hi @Manuel Souto Pico ,
Please find a sample PPTX file attached. In the first slide, I put English. The second slide contains the corresponding Arabic. It seems all types of brackets get wrongly reversed.
https://www.mediafire.com/file/rzysokp7xzjrlnw/presentation2_reversed_brackets.pptx/file
I hope this helps.
Thanks,
Mohamed
-
reporter Hi @Manuel Souto Pico ,
I think this is related to the same issue here.
As you can see in the screen shot, when there is English text between Arabic, the space that goes after the English word gets misplaced.
Also, when a sentence ends with an English word, the period gets reversed and misplaced.
Are these related to the same issue here?
I would be grateful if you let me know once you have a fix. I can help with testing.
Thanks for your support.
-
@mzeid thanks for reporting!
Do you think you would be able to add example documents one more time to this ticket (they are not available on Mediafire)?
I assume this is related to the merging of runs on extraction, when two consecutive texts with the same styles are combined into one… This improves the segmentation quality.
I think a solution might be to distinguish punctuation marks (we need to decide which) at the time of merging and to write them as separate runs…
@Manuel Souto Pico , it looks like this is a bit different from issue #933…
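The idea of keeping punctuation marks in their own runs during merging can be sketched as follows. This is only an illustration and not Okapi's implementation: the Run class, the merge_runs function, and the marks listed in SEPARATE_MARKS are all hypothetical (the comment above leaves open which marks to distinguish).

```python
from dataclasses import dataclass

# Hypothetical choice of marks to keep apart; the thread leaves this undecided.
SEPARATE_MARKS = set("()[]{}")


@dataclass
class Run:
    text: str
    style: str  # stand-in for the run properties compared during merging


def is_mark_only(run):
    """True if the run consists solely of marks that must stay separate."""
    return bool(run.text) and all(ch in SEPARATE_MARKS for ch in run.text)


def merge_runs(runs):
    """Merge adjacent runs with equal styles, except mark-only runs."""
    merged = []
    for run in runs:
        prev = merged[-1] if merged else None
        can_merge = (
            prev is not None
            and prev.style == run.style
            and not is_mark_only(prev)
            and not is_mark_only(run)
        )
        if can_merge:
            merged[-1] = Run(prev.text + run.text, prev.style)
        else:
            merged.append(Run(run.text, run.style))
    return merged
```

With this sketch, two plain text runs sharing a style are combined, while a run that contains only a bracket is written out on its own even when its style matches its neighbours.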
-
reporter Hi @Denis Konovalyenko ,
I am so happy to hear from you. I hope all is well with you and loved ones.
I have uploaded a new version. Please download it from here.
https://www.mediafire.com/file/j82vcamab4to97k/presentation2_reversed_brackets.pptx/file
I hope you will have the time to fix this show-stopper.
Thanks again!
Mohamed
-
- attached presentation2_reversed_brackets.pptx
-
@mzeid thank you for your openness and support!
I have added your document to this issue.
-
Hi there,
I think my reply comes a bit late, but here it comes in case it’s useful (I haven’t read the whole thread).
This is a common issue when mixing Arabic text with Latin text, digits, punctuation, tags, or all of them at the same time. To get text directionality right, you normally use Unicode bidirectionality markers and embeddings. They were added to OmegaT some years ago (sponsored by my company), but other CAT tools should have something similar.
Cheers, Manuel
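As a rough illustration of the Unicode markers and embeddings mentioned above, the sketch below wraps Latin-script fragments inside a string in LEFT-TO-RIGHT EMBEDDING ... POP DIRECTIONAL FORMATTING. The function name wrap_latin_runs and the regular expression are assumptions made for this example; it does not show how OmegaT or Okapi handle directionality.

```python
import re
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING

# Hypothetical pattern: a Latin word, optionally followed by more
# space-separated Latin words or numbers.
LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9]*(?: [A-Za-z0-9]+)*")


def wrap_latin_runs(text):
    """Wrap each Latin-script fragment in an explicit LTR embedding."""
    return LATIN_RUN.sub(lambda m: f"{LRE}{m.group(0)}{PDF}", text)


def strip_embeddings(text):
    """Remove the embedding characters added by wrap_latin_runs."""
    return text.replace(LRE, "").replace(PDF, "")


if __name__ == "__main__":
    sample = "\u0627\u0633\u062a\u062e\u062f\u0645 Okapi Rainbow \u0627\u0644\u0622\u0646"
    wrapped = wrap_latin_runs(sample)
    # Print the Unicode name of every formatting character that was added.
    for ch in wrapped:
        if ch in (LRE, PDF):
            print(unicodedata.name(ch))
```

Whether a given renderer needs such characters depends on the paragraph direction and on the surrounding text, so this is a way to experiment rather than a fix for the issue reported here.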
-
reporter Hi @Manuel Souto Pico
Thanks for your reply.
I am not sure about the claim above. If you look at the first screenshot above (left side), you will see that Okapi reverses brackets even when it is only Arabic text, which shouldn’t be the case. I believe there is something wrong with the implemented logic in the latest fix that is now causing this regression.
In SDL Trados Studio, it is handled correctly as you can see in the screenshot below (PPT file saved from within Trados Studio).
-
reporter Hi @Denis Konovalyenko
Hope all is well with you.
Is there any update on this show-stopper issue?
Thanks
-
Dear @mzeid : I would like to have a look at the source (original) file, which I think is in English, assuming this is an English to Arabic translation. I think I never got that. Could you please share it?
-
reporter Dear @Manuel Souto Pico ,
Thanks for your kind follow-up. The issue appears in any PPTX file translated into Arabic. Denis uploaded the sample file above, but here is another file, so you can test it at your end.
https://1drv.ms/p/s!AoXUd1M3NEtkiOIyvYs_qeuZ5t9LrA?e=DREfhQ
I hope you can figure this one out, and have a solution for this show-stopper, Manuel. Thanks a million again for your support on this. Please let me know if you need any help with testing, or more examples.
Thanks,
Mohamed
-
Thanks, Mohamed. This seems to be a different file. Could you please share a screenshot of the part of the target document that shows the problem? (And please quote the source text, so that I can easily find it.) Thanks.
-
reporter Dear @Manuel Souto Pico ,
It will be visible in any file translated into Arabic, unfortunately. I used Rainbow ver. 1.42. I tried the latest version, but it keeps throwing a Java INI error on my Windows 10 machine.
Please download the translated Arabic file.
https://1drv.ms/p/s!AoXUd1M3NEtkiORL9tJptFl_huO9Rw?e=8xZ9UW
Here are also some screenshots of the issues, just in case.
The issue is visible not only when there is a mix of English and Arabic, but also with pure Arabic text, as shown in this screen shot.
As you can see, this is really a show-stopper.
It seems a Unicode control character is inserted around brackets, numbers, etc., and it causes this reversing effect in Arabic.
Thanks again for your support.
Please let me know if you have any further questions.
Thanks
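One way to check the suspicion above, that a Unicode control character is being inserted around brackets and numbers, is to scan the extracted text for bidirectional formatting characters. The sketch below is an assumption-laden helper (the name find_bidi_controls is made up for this example); it only reports what is in a string and says nothing about where Okapi might add such characters.

```python
import unicodedata

# Explicit bidirectional formatting characters defined by Unicode:
# marks, embeddings/overrides, and isolates.
BIDI_CONTROLS = {
    "\u200e", "\u200f", "\u061c",                      # LRM, RLM, ALM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def find_bidi_controls(text):
    """Return (index, Unicode name) for each bidi control character in text."""
    return [
        (index, unicodedata.name(ch))
        for index, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


if __name__ == "__main__":
    # Hypothetical sample: a bracketed Arabic word with an RLM before it.
    sample = "\u200f(\u0645\u062b\u0627\u0644)"
    for index, name in find_bidi_controls(sample):
        print(index, name)
```

Running this over the text of a source slide and of the corresponding translated slide would show whether any such characters appear only in the output.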
-
reporter One more note. I used one of your MT Connectors to machine-translate that PPTX file.
-
- attached Study_USA.pptx
Another input document.
-
- attached Study_USA.out.pptx
And the output one.
-
@mzeid it looks like the root cause of this issue is connected with the lang attribute in the run properties. So, adding lang-ar-AE to either a:defRPr or a:rPr should fix this case.
Below are the differences between the original and the translated 7th slide.
I will add a reduced document for the round-trip testing in my next message.
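To make the lang attribute observation concrete, here is a minimal sketch that sets a language on the run properties of a DrawingML fragment. It assumes that "lang-ar-AE" above refers to a lang attribute with the value ar-AE, and the function name set_run_language is hypothetical. It uses only Python's standard xml.etree.ElementTree and is not the change made in Okapi.

```python
import xml.etree.ElementTree as ET

# DrawingML main namespace, bound to the "a" prefix in slide XML.
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("a", A_NS)


def set_run_language(xml_text, lang="ar-AE"):
    """Set (or overwrite) lang on every a:rPr and a:defRPr element.

    The default value "ar-AE" is an assumption taken from the comment above.
    """
    root = ET.fromstring(xml_text)
    for tag in ("rPr", "defRPr"):
        for props in root.iter(f"{{{A_NS}}}{tag}"):
            props.set("lang", lang)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paragraph fragment whose run still carries the source language.
    fragment = (
        f'<a:p xmlns:a="{A_NS}">'
        '<a:r><a:rPr lang="en-US"/><a:t>(\u0645\u062b\u0627\u0644)</a:t></a:r>'
        "</a:p>"
    )
    print(set_run_language(fragment))
```

Applying something like this to a slide part taken from an unzipped copy of the PPTX is one way to try the suggestion by hand before a fixed build is available.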
-
- attached 1127-1.pptx
-
The UI:
-
reporter Hi @Denis Konovalyenko , this is amazing news. Thanks a million. I looked at the sample PPTX file, and it looks perfect. Any chance you could translate the Study_USA.pptx file using this fix to confirm? I can review it. This is very promising.
Any rough date when this fix will be available in Okapi?
Thanks again, guys. This is great.
-
reporter Hi @Denis Konovalyenko ,
If you can push this fix to Okapi Rainbow, I can test it on the same PPTX file, and let you know the final results. What do you think?
Thanks,
Mohamed
-
@mzeid , thank you for your patience! I hope to get to this next month - I will write a message when a pull request is ready.
-
A related pull request #664 was opened.
-
I will test this in the filters plugin for OmegaT as soon as it’s included, I suppose in version 1.13-1.45.0… Thanks a lot, Denis!
-
- changed milestone to 1.45.0
-
assigned issue to
-
reporter Hi @Denis Konovalyenko , I am happy to see there is progress.
I would be grateful if you let me know in this issue when an updated version of Okapi Rainbow is ready to test.
-
Pull request #664 was merged.
@mzeid the snapshots should be available soon I believe. I would appreciate hearing your feedback.
-
reporter Hi @Denis Konovalyenko , any update on this? Thanks!
-
Hi @Denis Konovalyenko ,
Here is the new issue as agreed.
Thanks,
Mohamed