OpenXML: RTL properties not correctly handled in PPTX

Issue #927 resolved
Former user created an issue

Hello Okapi team,

I translated a PPTX file from English to Arabic. The target Arabic file is not right-aligned at all; it is still LTR, not RTL. Is this a known issue, or did I miss something? I know that MateCat uses Okapi filters, and the same issue occurs there. Is there any workaround? How can I generate a well-formatted RTL Arabic PPTX file?

That being said, if I convert the file to SDLXLIFF, translate it, and then use Save Target As from within Trados Studio, the layout is right-aligned.

Please see screen shot.

Thanks, Mohamed

Comments (22)

  1. Denis Konovalyenko

    Mohamed, I am afraid this is something that slipped through during the related development. It looks like the DOCX-related BIDI properties are applied to PPTX documents, but that has no effect on the way the document is rendered.

  2. Denis Konovalyenko

    There is a bit more on the matter.

    Columns Right-To-Left attribute

    rtlCol="1|0|true|false|on|off"

    specifies whether columns are used in a right-to-left or left-to-right order. The usage of this attribute only sets the column order that is used to determine which column overflow text should go to next. If this attribute is omitted, then a value of 0, or off, is implied, in which case text will start in the leftmost column and flow to the right.

    It can be present in:

    1. <lnDef> (§5.1.4.1.20); <rich> (§5.7.2.157); <spDef> (§5.1.4.1.27); <t> (§5.9.3.8); <txBody> (§5.8.2.26); <txBody> (§5.1.2.1.40); <txBody> (§5.6.2.33); <txBody> (§4.4.1.47); <txDef> (§5.1.4.1.28); <txPr> (§5.7.2.217)/a:bodyPr

    Right To Left attribute

    rtl="1|0|true|false|on|off"

    specifies whether the table or text is right-to-left or left-to-right in its flow direction. If this attribute is omitted, then a value of 0, or left-to-right, is implied.

    It can be present in:

    1. p:presentation as Right-To-Left Views, which specifies whether the current view of the user interface is oriented right-to-left or left-to-right. The view is right-to-left if this value is set to true, and left-to-right otherwise.

    2. <tbl> (§5.1.6.11)/a:tblPr

    3. <fld> (§5.1.5.2.4); <p> (§5.1.5.2.6)/<defPPr> (§5.1.5.2.2); <lvl1pPr> (§5.1.5.4.13); <lvl2pPr> (§5.1.5.4.14); <lvl3pPr> (§5.1.5.4.15); <lvl4pPr> (§5.1.5.4.16); <lvl5pPr> (§5.1.5.4.17); <lvl6pPr> (§5.1.5.4.18); <lvl7pPr> (§5.1.5.4.19); <lvl8pPr> (§5.1.5.4.20); <lvl9pPr> (§5.1.5.4.21); <pPr> (§5.1.5.2.7)

    Right To Left Text element

    <a:rtl val="1|0|true|false|on|off"/>

    specifies that the alignment and reading order for this run shall be right to left. This setting determines the way in which the run contents are presented in the document when punctuation characters are part of the run's contents. When this property is specified, each part of the run between a punctuation mark shall be laid out right to left on the line.

    It can be present in:

    1. <ctlPr> (§7.1.2.23); <r> (§7.1.2.87); <r> (§2.3.2.23)/a:rPr
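    All three properties quoted above accept the same six lexical values. A minimal Python sketch of how they could be interpreted follows; the helper name parse_on_off is hypothetical and this is not the Okapi implementation, which is written in Java.

```python
# Hypothetical helper: interprets the "1|0|true|false|on|off" values quoted above.
_TRUE_VALUES = {"1", "true", "on"}
_FALSE_VALUES = {"0", "false", "off"}


def parse_on_off(value, default=False):
    """Return the boolean meaning of an rtl / rtlCol / a:rtl@val value.

    A missing attribute (None) yields `default`; the quoted text says that
    an omitted rtl or rtlCol attribute implies 0 (off, left-to-right).
    """
    if value is None:
        return default
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"not a valid on/off value: {value!r}")
```

    Raising on unknown values is a design choice of this sketch; a real filter might prefer to log and fall back to the default.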

    --

    So, the rtl (rtlCol) attribute is going to be “normalised” on extraction (removed) and added on merge if the target language is RTL. The following elements will be affected:

    1. p:presentation in the presentation part
    2. a:tblPr in slides, slide layouts and slide masters
    3. a:bodyPr in slides, slide layouts and slide masters
    4. a:defPPr, a:lvl[1-9]{1}pPr, a:pPr in slides, slide layouts and slide masters: rtl is removed everywhere, but on merge it is added only to p:txStyles/p:titleStyle|p:bodyStyle|p:otherStyle/a:lvl[1-9]{1}pPr in slide masters, because any text should inherit its formatting from one of these styles.

    The a:rtl element is only going to be removed: it makes no difference to the UI whether it is present or not.
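    The extraction-side normalisation described above could look roughly like the following Python sketch. It is only an illustration built on xml.etree.ElementTree, not the actual Okapi code (which is Java); the function name normalise_on_extraction is hypothetical, and the sketch works on one already parsed part at a time.

```python
import re
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
P = "http://schemas.openxmlformats.org/presentationml/2006/main"

# DrawingML elements whose `rtl` attribute is dropped (items 2 and 4 above).
_RTL_ATTR_TAG = re.compile(
    r"\{%s\}(?:tblPr|defPPr|pPr|lvl[1-9]pPr)" % re.escape(A)
)


def normalise_on_extraction(root):
    """Remove rtl / rtlCol attributes and a:rtl elements, in place."""
    # list() takes a snapshot so that removing children is safe.
    for element in list(root.iter()):
        if element.tag == f"{{{P}}}presentation":
            element.attrib.pop("rtl", None)
        elif element.tag == f"{{{A}}}bodyPr":
            element.attrib.pop("rtlCol", None)
        elif _RTL_ATTR_TAG.fullmatch(element.tag):
            element.attrib.pop("rtl", None)
        # The a:rtl element is removed and never added back.
        for child in list(element):
            if child.tag == f"{{{A}}}rtl":
                element.remove(child)
    return root
```

    Other attributes on the same elements are left untouched, so the part can be written back without further changes.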

  3. Denis Konovalyenko

    Regarding “The target Arabic file is not right aligned at all”: it turns out that MS PowerPoint right-aligns paragraphs that have been introduced as RTL by specifying just algn="r". Thus, the pair algn="r" and rtl="1" will be added to the mentioned p:txStyles/p:titleStyle|p:bodyStyle|p:otherStyle/a:lvl[1-9]{1}pPr if the target language is RTL.
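    As a rough illustration of the merge step just described, here is a Python sketch that patches the text styles of a parsed slide master. It is not the Okapi implementation; the function names and the short RTL_LANGUAGES set are assumptions made for the example. Note that DrawingML spells the paragraph alignment attribute algn, which is what the sketch writes.

```python
import re
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
P = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Illustrative only: a real filter would use a complete list of RTL languages.
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}

_LEVEL_TAG = re.compile(r"\{%s\}lvl[1-9]pPr" % re.escape(A))


def is_rtl(language_tag):
    """Guess text direction from the primary subtag, e.g. 'ar-SA' -> True."""
    return language_tag.split("-")[0].lower() in RTL_LANGUAGES


def apply_rtl_to_master(master_root, target_language):
    """Set algn="r" and rtl="1" on every level style of a p:sldMaster.

    Returns the number of a:lvlNpPr elements that were updated.
    """
    if not is_rtl(target_language):
        return 0
    updated = 0
    for style in ("titleStyle", "bodyStyle", "otherStyle"):
        path = f"{{{P}}}txStyles/{{{P}}}{style}/*"
        for level in master_root.findall(path):
            if _LEVEL_TAG.fullmatch(level.tag):
                level.set("algn", "r")
                level.set("rtl", "1")
                updated += 1
    return updated
```

    Because slides and layouts no longer carry their own rtl values after extraction, patching the master styles alone should be enough for the text to inherit the RTL direction.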

  4. Denis Konovalyenko

    A few more details on how the alignment and RTL paragraph property attributes are handled:

    alignment - left, rtl - false
    Ideal solution:

    1. on extraction: remove alignment, remove rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: add alignment - right, add rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: add/change alignment - right, add/change rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing

    alignment - left, rtl - true
    Ideal solution:

    1. on extraction: remove alignment, keep rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: add alignment - right, remove rtl
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: add/change alignment - right, add/change rtl - false
      source lang - RTL, target lang - RTL: do nothing

    alignment - right, rtl - false
    Ideal solution:

    1. on extraction: keep alignment, remove rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: remove alignment, add rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: add/change alignment - left, add/change rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing

    alignment - right, rtl - true
    Ideal solution:

    1. on extraction: keep alignment, keep rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: remove alignment, remove rtl
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: add/change alignment - left, add/change rtl - false
      source lang - RTL, target lang - RTL: do nothing

    alignment - neither left nor right, rtl - false
    Ideal solution:

    1. on extraction: keep alignment, remove rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: keep alignment, add rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: keep alignment, add/change rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing

    alignment - neither left nor right, rtl - true
    Ideal solution:

    1. on extraction: keep alignment, keep rtl
    2. on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: keep alignment, remove rtl
      source lang - RTL, target lang - RTL: do nothing
    Balanced solution:

    on merge:
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: keep alignment, add/change rtl - false
      source lang - RTL, target lang - RTL: do nothing
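    The six "balanced solution" tables above reduce to one rule: nothing changes unless the source and target directions differ and the paragraph's rtl value matches the source direction; in that case left and right alignment are mirrored and rtl is set to the target direction. A small Python sketch of that rule follows. The function name is hypothetical, alignment uses the DrawingML values "l", "r", "ctr" and so on, and treating a missing alignment as left is an assumption of this sketch.

```python
_MIRROR = {"l": "r", "r": "l"}


def balanced_merge(algn, rtl, source_is_rtl, target_is_rtl):
    """Return the (algn, rtl) pair to write on merge.

    algn: "l", "r", another alignment value such as "ctr", or None if absent.
    rtl: the paragraph's current rtl value as a bool.
    """
    # Same direction, or a paragraph that already deviates from the source
    # direction: "do nothing" in every table above.
    if source_is_rtl == target_is_rtl or rtl != source_is_rtl:
        return algn, rtl
    # Assumption: an absent alignment behaves like left alignment.
    mirrored = _MIRROR.get(algn or "l", algn)
    return mirrored, target_is_rtl
```

    For example, a left-aligned LTR paragraph merged into an Arabic target gives balanced_merge("l", False, False, True), which evaluates to ("r", True), matching the first table.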

  5. Denis Konovalyenko

    Additionally, the following behaviour was considered for RTL paragraph and run properties:

    rtl - false
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: add/change rtl - true
      source lang - RTL, target lang - LTR: do nothing
      source lang - RTL, target lang - RTL: do nothing

    rtl - true
      source lang - LTR, target lang - LTR: do nothing
      source lang - LTR, target lang - RTL: do nothing
      source lang - RTL, target lang - LTR: add/change rtl - false
      source lang - RTL, target lang - RTL: do nothing

  6. mzeid

    Hi all,

    @DenisKonovalyenko

    Thank you for working on this. I appreciate it. It seems this issue is resolved. If yes, can you please let me know how I can test it? Which release should I download to test if this is working for Arabic?

    Thanks,

    Mohamed

  7. Denis Konovalyenko

    Mohamed, thanks for getting in touch!

    The solution was merged into the dev branch and can be tested with the latest snapshot (it can be found here). It will also be available in the upcoming release, 1.42.0.

  8. mzeid

    Thanks a million @Denis Konovalyenko for letting me know. I am downloading the latest snapshot of Okapi Rainbow and I will let you know if I have any comments.

  9. mzeid

    Hi @Denis Konovalyenko, I have just tested the latest snapshot and I can confirm that the PPT looks much better now and all text is right-aligned. I haven’t done extensive testing yet, but it looks great already. Thank you so much for fixing this issue.

  10. mzeid

    Hi @Denis Konovalyenko
    Please let me know if I should submit a new issue for this one. Since it is related to the same issue, I thought I should add it here for context.

    It seems there is a regression when it comes to brackets and some punctuation marks. For some reason, the newly updated OpenXML filter is now reversing most brackets. I understand that reversal could happen when Arabic and English are mixed, but right now it can happen even within a completely Arabic string.

    Please see attached screen shot.

    Any idea?

    Thanks

  11. Denis Konovalyenko

    @mzeid , thank you for your additional feedback on this!

    I think the reversal of brackets could be related to the presence of the rtl property, but that can only be confirmed after some additional research. Would it be possible for you to create a new issue for tracking the mentioned mixed (LTR and RTL) content inconsistency?

  12. mzeid

    Hi @Denis Konovalyenko

    Just wanted to share this with you, maybe it could help with your troubleshooting.

    I noticed that adding an RLM marker at the end of English words seems to fix the issue. Right now, the space that should go after the English word gets reversed, so it looks like two spaces before the word and no space after it.

    I hope this screen shot explains things. This issue is currently a show-stopper, and it requires extensive manual fixing.

    Thanks again for your support.
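    For anyone who needs the RLM workaround mentioned above in the meantime, it could be scripted roughly as in the Python sketch below. It appends U+200F RIGHT-TO-LEFT MARK after each run of ASCII letters and digits that starts with a letter. The function name and the definition of an "English word" are assumptions of this sketch, and it is a stop-gap for translated text, not a fix for the filter.

```python
import re

RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK

# An "English word" here is a run of ASCII letters/digits starting with a
# letter; an RLM that already follows it is consumed so it is not doubled.
_LATIN_WORD = re.compile("([A-Za-z][A-Za-z0-9]*)" + RLM + "?")


def add_rlm_after_latin_words(text):
    """Return `text` with an RLM inserted after every Latin-script word."""
    return _LATIN_WORD.sub(lambda match: match.group(1) + RLM, text)
```

    Running the function twice gives the same result as running it once, so it can be applied safely to segments that were already fixed by hand.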

  13. Chase Tingley

    @mzeid Thanks for the additional info. Denis’s work on Okapi has been disrupted by the situation in Ukraine. Hopefully he will have a chance to look into this more in the future.

  14. mzeid

    Hi @Chase Tingley Thanks for your reply. I hope Denis and his loved ones are all safe and well 😞

    Take care guys and keep up the good work!

    Thank you again!
