OpenXML: RTL properties not correctly handled in PPTX
Hello Okapi team,
I translated a PPTX file from English to Arabic. The target Arabic file is not right-aligned at all; it is still LTR, not RTL. Is this a known issue, or did I miss something? I know that MateCat uses Okapi filters, and the same issue occurs there. Is there a workaround? How can I generate a well-formatted RTL Arabic PPTX file?
That being said, if I convert the file to SDLXLIFF, translate it, and then use Save Target As from within Trados Studio, the layout is right-aligned.
Please see screen shot.
Thanks, Mohamed
Comments (22)
-
-
- changed title to OpenXML: RTL properties not correctly handled in PPTX
-
Duplicate: #957, with additional information.
-
There is a bit more on the matter.
Columns Right-To-Left attribute
rtlCol="1|0|true|false|on|off"
specifies whether columns are used in a right-to-left or left-to-right order. The usage of this attribute only sets the column order that is used to determine which column overflow text should go to next. If this attribute is omitted, then a value of <0>, or off is implied in which case text will start in the leftmost column and flow to the right.
It can be present in:
- <lnDef> (§5.1.4.1.20); <rich> (§5.7.2.157); <spDef> (§5.1.4.1.27); <t> (§5.9.3.8); <txBody> (§5.8.2.26); <txBody> (§5.1.2.1.40); <txBody> (§5.6.2.33); <txBody> (§4.4.1.47); <txDef> (§5.1.4.1.28); <txPr> (§5.7.2.217)/a:bodyPr
Right To Left attribute
rtl="1|0|true|false|on|off"
specifies whether the table or text is right-to-left or left-to-right in its flow direction. If this attribute is omitted, then a value of <0>, or left-to-right is implied.
It can be present in:
1. p:presentation as Right-To-Left Views, which specifies whether the current view of the user interface is oriented right-to-left or left-to-right. The view is right-to-left if this value is set to true, and left-to-right otherwise.
2. <tbl> (§5.1.6.11)/a:tblPr
3. <fld> (§5.1.5.2.4); <p> (§5.1.5.2.6)/<defPPr> (§5.1.5.2.2); <lvl1pPr> (§5.1.5.4.13); <lvl2pPr> (§5.1.5.4.14); <lvl3pPr> (§5.1.5.4.15); <lvl4pPr> (§5.1.5.4.16); <lvl5pPr> (§5.1.5.4.17); <lvl6pPr> (§5.1.5.4.18); <lvl7pPr> (§5.1.5.4.19); <lvl8pPr> (§5.1.5.4.20); <lvl9pPr> (§5.1.5.4.21); <pPr> (§5.1.5.2.7)
Right To Left Text element
<a:rtl val="1|0|true|false|on|off"/>
specifies that the alignment and reading order for this run shall be right to left. This setting determines the way in which the run contents are presented in the document when punctuation characters are part of the run's contents. When this property is specified, each part of the run between a punctuation mark shall be laid out right to left on the line.
It can be present in:
- <ctlPr> (§7.1.2.23); <r> (§7.1.2.87); <r> (§2.3.2.23)/a:rPr
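To make the quoted value set concrete, here is a minimal Python sketch of reading these attributes. It assumes only what the excerpts above state (the six accepted literals, and "off" when the attribute is omitted); the helper names `parse_on_off` and `paragraph_is_rtl` are made up for illustration and are not part of Okapi.

```python
import xml.etree.ElementTree as ET

# Literals taken from the quoted excerpts: 1|0|true|false|on|off
_TRUE = {"1", "true", "on"}
_FALSE = {"0", "false", "off"}


def parse_on_off(value):
    """Interpret an rtl / rtlCol attribute value; None (omitted) means off."""
    if value is None:
        return False
    value = value.strip()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"unexpected on/off value: {value!r}")


def paragraph_is_rtl(ppr):
    """Return True if an a:pPr-like element carries a truthy rtl attribute."""
    return parse_on_off(ppr.get("rtl"))
```

For example, `paragraph_is_rtl` applied to `<a:pPr rtl="1"/>` reports right-to-left, while a bare `<a:pPr/>` falls back to left-to-right.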
--
So, the rtl (rtlCol) attribute is going to be “normalised” on extraction (removed) and added on merge if the target language is RTL. The following elements will be affected:
- p:presentation in the presentation part
- a:tblPr in slides, slide layouts and slide masters
- a:bodyPr in slides, slide layouts and slide masters
- a:defPPr, a:lvl[1-9]{1}pPr, a:pPr in slides, slide layouts and slide masters: rtl removed from all places but added to p:txStyles/p:titleStyle|p:bodyStyle|p:otherStyle/a:lvl[1-9]{1}pPr in slide masters only - any text should inherit the formatting of one of these styles.
The a:rtl element is going to be removed only - it does not make any difference to UI even if it is present or not.
-
- attached 927-tblpr-rtl.pptx
- attached 927-presentation-rtl-1.pptx
- attached 927-p-ppr-rtl.pptx
- attached 927-bodypr-rtlcol-1.pptx
These are the documents the screenshots were taken from.
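The normalisation described above can be tried against the XML parts of these documents with a rough Python sketch like the one below. It is not the Okapi implementation (which is Java); the function names `strip_rtl` and `mark_master_styles_rtl` are hypothetical, and the element list simply follows the comment above.

```python
import xml.etree.ElementTree as ET

A = "http://schemas.openxmlformats.org/drawingml/2006/main"
P = "http://schemas.openxmlformats.org/presentationml/2006/main"

# Local names of a: elements whose rtl attribute is dropped on extraction
RTL_HOLDERS = {"tblPr", "defPPr", "pPr"} | {f"lvl{i}pPr" for i in range(1, 10)}


def strip_rtl(root):
    """Extraction-side sketch: drop rtl / rtlCol attributes and a:rtl elements in place."""
    for el in root.iter():
        if not isinstance(el.tag, str) or not el.tag.startswith("{"):
            continue
        ns, _, local = el.tag[1:].partition("}")
        if ns == P and local == "presentation":
            el.attrib.pop("rtl", None)
        elif ns == A and local == "bodyPr":
            el.attrib.pop("rtlCol", None)
        elif ns == A and local in RTL_HOLDERS:
            el.attrib.pop("rtl", None)
    # Remove a:rtl child elements wherever they occur
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == f"{{{A}}}rtl":
                parent.remove(child)


def mark_master_styles_rtl(master_root):
    """Merge-side sketch for an RTL target: set rtl="1" on p:txStyles level properties."""
    for style in ("titleStyle", "bodyStyle", "otherStyle"):
        for lvl in range(1, 10):
            path = f".//{{{P}}}txStyles/{{{P}}}{style}/{{{A}}}lvl{lvl}pPr"
            for el in master_root.findall(path):
                el.set("rtl", "1")
```

The idea is that `strip_rtl` would run over every slide, layout and master part, while `mark_master_styles_rtl` would run over slide masters only, and only when the target language is RTL.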
-
In regard to “The target Arabic file is not right aligned at all”: it turns out that MS PowerPoint achieves right alignment simply by specifying algn="r" for paragraphs that have been introduced as RTL. Thus, a pair of algn="r" and rtl="1" will be added to the mentioned p:txStyles/p:titleStyle|p:bodyStyle|p:otherStyle/a:lvl[1-9]{1}pPr if the target language is RTL.
-
- changed milestone to 1.41.0
-
assigned issue to
-
A few more details on how the alignment and RTL paragraph property attributes are handled:
alignment - left, rtl - false
Ideal solution:
- on extraction: remove alignment, remove rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: add alignment - right, add rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: add/change alignment - right, add/change rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
alignment - left, rtl - true
Ideal solution:
- on extraction: remove alignment, keep rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: add alignment - right, remove rtl
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: add/change alignment - right, add/change rtl - false
source lang - RTL, target lang - RTL: do nothing
alignment - right, rtl - false
Ideal solution:
- on extraction: keep alignment, remove rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: remove alignment, add rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: add/change alignment - left, add/change rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
alignment - right, rtl - true
Ideal solution:
- on extraction: keep alignment, keep rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: remove alignment, remove rtl
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: add/change alignment - left, add/change rtl - false
source lang - RTL, target lang - RTL: do nothing
alignment - neither left nor right, rtl - false
Ideal solution:
- on extraction: keep alignment, remove rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: keep alignment, add rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: keep alignment, add/change rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
alignment - neither left nor right, rtl - true
Ideal solution:
- on extraction: keep alignment, keep rtl
- on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: keep alignment, remove rtl
source lang - RTL, target lang - RTL: do nothing
Balanced solution:
on merge:
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: keep alignment, add/change rtl - false
source lang - RTL, target lang - RTL: do nothing
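The six "balanced solution" cases above collapse into one small rule: act only when the source and target directions differ and the paragraph does not already flow in the target direction, then mirror left/right alignment and set rtl to the target direction. Here is a sketch of that rule in Python. The function name `merge_paragraph_props` is hypothetical, the alignment codes "l"/"r"/"ctr" assume DrawingML's `algn` attribute values, and treating a missing alignment as left is my assumption rather than something stated in the table.

```python
def merge_paragraph_props(algn, rtl, source_is_rtl, target_is_rtl):
    """Return the (algn, rtl) pair to write on merge under the "balanced" rules.

    algn: "l", "r", another alignment code such as "ctr", or None if absent.
    rtl: bool, the paragraph's current rtl flag (False if absent).
    """
    # Same direction on both sides: do nothing
    if source_is_rtl == target_is_rtl:
        return algn, rtl
    # Paragraph already flows in the target direction: do nothing
    if rtl == target_is_rtl:
        return algn, rtl
    # Mirror left/right; keep any other alignment as-is.
    # Assumption: a missing alignment behaves like "l".
    flipped = {"l": "r", "r": "l"}.get(algn or "l", algn)
    return flipped, target_is_rtl
```

For an English-to-Arabic merge, a left-aligned LTR paragraph (`"l"`, `False`) comes back as right-aligned RTL, and a centred one keeps its alignment and only gains the rtl flag.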
-
Additionally, the following behaviour was considered for RTL paragraph and run properties:
rtl - false
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: add/change rtl - true
source lang - RTL, target lang - LTR: do nothing
source lang - RTL, target lang - RTL: do nothing
rtl - true
source lang - LTR, target lang - LTR: do nothing
source lang - LTR, target lang - RTL: do nothing
source lang - RTL, target lang - LTR: add/change rtl - false
source lang - RTL, target lang - RTL: do nothing
-
A related pull request #492 was opened.
-
- changed milestone to 1.42.0
-
- changed status to resolved
The pull request #492 was merged.
-
Hi all,
@DenisKonovalyenko
Thank you for working on this. I appreciate it. It seems this issue is resolved. If yes, can you please let me know how I can test it? Which release should I download to test if this is working for Arabic?
Thanks,
Mohamed
-
Mohamed, thanks for getting in touch!
The solution was merged into the dev branch and can be tested with the latest snapshot (it can be found here). It will also be available in the upcoming release, 1.42.0.
-
Thanks a million @Denis Konovalyenko for letting me know. I am downloading the latest snapshot of Okapi Rainbow and I will let you know if I have any comments.
-
Hi @Denis Konovalyenko, I have just tested the latest snapshot and I confirm that the PPT looks much better now and all text is right-aligned. I haven’t done extensive testing yet, but it looks great already. Thank you so much for fixing this issue.
-
Hi @Denis Konovalyenko
Please let me know if I should submit a new issue for this one. Since it is related to the same issue, I thought I should add it here for context. It seems there is a regression when it comes to brackets and some punctuation marks. For some reason, the updated OpenXML filter is now reversing most brackets. I understand that reversal could happen when there is mixed Arabic and English, but right now this can happen even within a completely Arabic string.
Please see attached screen shot.
Any idea?
Thanks
-
@mzeid , thank you for your additional feedback on this!
I think the reversal of brackets could be related to the presence of the rtl property… but that can be confirmed only after doing some additional research. Would it be possible for you to create a new issue for tracking the mentioned mixed (LTR and RTL) content inconsistency?
-
You are welcome, @Denis Konovalyenko
I have created a new issue.
https://bitbucket.org/okapiframework/okapi/issues/1127/openxml-brackets-are-reversed-in-arabic
I may also submit more issues if any comes up.
I really appreciate your support.
Thanks,
Mohamed
-
Hi @Denis Konovalyenko
Just wanted to share this with you, maybe it could help with your troubleshooting.
I noticed that adding an RLM mark at the end of English words seems to fix the issue. Right now, the space that should follow the English word gets reversed, so it looks like there are two spaces before the word and none after it.
I hope this screenshot explains things. This issue is currently a show-stopper, and it requires extensive manual fixing.
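In case it helps anyone reproduce the workaround in bulk, here is a small Python sketch that appends U+200F (RIGHT-TO-LEFT MARK) after each run of Latin letters and digits in a string. This only mimics the manual fix I described; it is not what the OpenXML filter does, and the function name `add_rlm_after_latin` is made up.

```python
import re

RLM = "\u200f"  # RIGHT-TO-LEFT MARK

# A Latin word (ASCII letters/digits, starting with a letter) that is not
# already followed by an RLM. The lookahead also excludes letters and digits
# so that a partial word is never matched.
_LATIN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9]*(?![A-Za-z0-9\u200f])")


def add_rlm_after_latin(text):
    """Append an RLM after every Latin word that does not already have one."""
    return _LATIN_RUN.sub(lambda m: m.group(0) + RLM, text)
```

Running it twice on the same string leaves the result unchanged, so it should be safe to apply to segments that were already fixed by hand.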
Thanks again for your support.
-
@mzeid Thanks for the additional info. Denis’s work on Okapi has been disrupted by the situation in Ukraine. Hopefully he will have a chance to look into this more in the future.
-
Hi @Chase Tingley, thanks for your reply. I hope Denis and his loved ones are all safe and well.
Take care guys and keep up the good work!
Thank you again!
Mohamed, I am afraid this is something that somehow slipped through the related development. It looks like the DOCX-related BIDI properties are applied to PPTX documents, but that has no effect on the way the document is rendered.