MIF Filter: expose content in the correct document order

Issue #912 new
Denis Konovalyenko created an issue

Currently, the extractable content is exposed without its context. Anchored frames and tables appear as separate blocks before the paragraphs that reference them. In addition, the content of text lines is missing from the extraction.
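The requested ordering can be illustrated with a minimal sketch. It is not the filter's implementation: the `Block` type, its `refs` field, and the sample ids are hypothetical, and it assumes each paragraph knows the ids of the tables and frames anchored in it. A referenced block is held back until the first block that references it has been emitted.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical extracted unit: a paragraph, a table, or an anchored frame."""
    id: str
    text: str
    refs: list = field(default_factory=list)  # ids of tables/frames anchored here


def in_document_order(blocks):
    """Emit each referenced block right after the first block that references it."""
    by_id = {b.id: b for b in blocks}
    referenced = {r for b in blocks for r in b.refs}
    emitted = set()
    ordered = []

    def emit(block):
        if block.id in emitted:
            return
        emitted.add(block.id)
        ordered.append(block)
        for ref in block.refs:
            if ref in by_id:
                emit(by_id[ref])

    for block in blocks:
        # Referenced blocks wait until a referencing block pulls them in.
        if block.id not in referenced:
            emit(block)
    # Anything still left (e.g. blocks in a reference cycle) goes at the end.
    for block in blocks:
        emit(block)
    return ordered


# Hypothetical input in the order the blocks are currently exposed.
blocks = [
    Block("tbl1", "table content"),
    Block("frame1", "frame content"),
    Block("p0", "first paragraph"),
    Block("p1", "paragraph with a table anchor", refs=["tbl1"]),
    Block("p2", "paragraph with a frame anchor", refs=["frame1"]),
]
ordered_ids = [b.id for b in in_document_order(blocks)]
print(ordered_ids)
```

With this input the table and the frame move from the front of the list to the positions directly after `p1` and `p2`.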

Example 1:

Actual extraction order:

<trans-unit id="1" xml:space="preserve">
<source xml:lang="en">Table <x id="2"/>:</source>
<target xml:lang="fr">Table <x id="2"/>:</target>
</trans-unit>
<group id="1" restype="table">
<trans-unit id="2" xml:space="preserve">
<source xml:lang="en">on AFrame 1 > text frame 1.</source>
<target xml:lang="fr">on AFrame 1 > text frame 1.</target>
</trans-unit>
<group id="2" restype="row">
<group id="3" restype="cell">
<trans-unit id="3" xml:space="preserve">
<source xml:lang="en">h01</source>
<target xml:lang="fr">h01</target>
</trans-unit>
</group>
<group id="5" restype="cell">
<trans-unit id="4" xml:space="preserve">
<source xml:lang="en">h02</source>
<target xml:lang="fr">h02</target>
</trans-unit>
</group>
</group>
<group id="8" restype="row">
<group id="9" restype="cell">
<trans-unit id="5" xml:space="preserve">
<source xml:lang="en">c01</source>
<target xml:lang="fr">c01</target>
</trans-unit>
</group>
<group id="11" restype="cell">
<trans-unit id="6" xml:space="preserve">
<source xml:lang="en">c02</source>
<target xml:lang="fr">c02</target>
</trans-unit>
</group>
</group>
</group>
<group id="15" restype="table">
<trans-unit id="7" xml:space="preserve">
<source xml:lang="en">on AFrame 3 > text frame 1.</source>
<target xml:lang="fr">on AFrame 3 > text frame 1.</target>
</trans-unit>
<group id="16" restype="row">
<group id="17" restype="cell">
<trans-unit id="8" xml:space="preserve">
<source xml:lang="en">t01</source>
<target xml:lang="fr">t01</target>
</trans-unit>
</group>
<group id="19" restype="cell">
<trans-unit id="9" xml:space="preserve">
<source xml:lang="en">t02</source>
<target xml:lang="fr">t02</target>
</trans-unit>
</group>
</group>
</group>
<trans-unit id="10" xml:space="preserve">
<source xml:lang="en">Para on anchored frame 1 > text frame 1.</source>
<target xml:lang="fr">Para on anchored frame 1 > text frame 1.</target>
</trans-unit>
<trans-unit id="11" xml:space="preserve">
<source xml:lang="en">Para on anchored frame 1 > text frame 2.</source>
<target xml:lang="fr">Para on anchored frame 1 > text frame 2.</target>
</trans-unit>
<trans-unit id="12" xml:space="preserve">
<source xml:lang="en">Para on anchored frame 2 > text frame 1.</source>
<target xml:lang="fr">Para on anchored frame 2 > text frame 1.</target>
</trans-unit>
<trans-unit id="13" xml:space="preserve">
<source xml:lang="en">Para on anchored frame 3 > text frame 1.</source>
<target xml:lang="fr">Para on anchored frame 3 > text frame 1.</target>
</trans-unit>
<trans-unit id="14" xml:space="preserve">
<source xml:lang="en">Another para.</source>
<target xml:lang="fr">Another para.</target>
</trans-unit>
<trans-unit id="15" xml:space="preserve">
<source xml:lang="en">inside t02</source>
<target xml:lang="fr">inside t02</target>
</trans-unit>
<trans-unit id="16" xml:space="preserve">
<source xml:lang="en">Para 0.</source>
<target xml:lang="fr">Para 0.</target>
</trans-unit>

SDL editor output:

Example 2:

SDL editor output:

Example 3:

SDL editor output:

Example 4:

SDL editor output:

Example 5:

SDL editor output:

All related source documents can be found attached.

Comments (7)

  1. Denis Konovalyenko reporter
    • edited description

    Additional reference material has been added. Please bear in mind that the order of text lines and text flows can be improved if their shape rect values are taken into account. Another possible improvement is an order that depends on the text direction (LTR or RTL).
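
As a rough sketch of that idea, assuming each shape rect is available as left, top, width and height in one common unit (the `ShapeRect` type and `reading_order` function below are hypothetical names, not part of the filter), items could be sorted top to bottom and then along the reading direction. Items on the same row are assumed to share exactly the same top value; a real layout would probably need a tolerance.

```python
from typing import NamedTuple


class ShapeRect(NamedTuple):
    """Hypothetical holder for a shape rect: position and size of an object."""
    left: float
    top: float
    width: float
    height: float


def reading_order(items, rtl=False):
    """Sort (rect, text) pairs top to bottom, then along the reading direction."""
    def key(item):
        rect, _ = item
        # For RTL, the item whose right edge is furthest to the right comes first.
        horizontal = -(rect.left + rect.width) if rtl else rect.left
        return (rect.top, horizontal)
    return sorted(items, key=key)


items = [
    (ShapeRect(4.0, 1.0, 2.0, 0.5), "right"),
    (ShapeRect(1.0, 1.0, 2.0, 0.5), "left"),
    (ShapeRect(1.0, 0.2, 5.0, 0.5), "top"),
]
ltr = [text for _, text in reading_order(items)]
rtl = [text for _, text in reading_order(items, rtl=True)]
print(ltr, rtl)
```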

  2. Chase Tingley

    Anchored frames and tables appear as separate blocks earlier than any related paragraphs with references to them can be found.

    @Denis Konovalyenko A related problem that I think we also have is that if the reference to the frame/table appears inside a hidden paragraph, we will still expose the frame/table content for translation, even though it is not visible in Frame.
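
One way to picture the expected behaviour, as a sketch only: the dictionary keys `id`, `hidden` and `refs` are hypothetical, and how the filter actually detects a hidden paragraph is not covered here. A frame or table is exposed only when at least one visible paragraph references it.

```python
def translatable_ids(paragraphs):
    """Return ids of visible paragraphs and of the frames/tables they reference."""
    exposed = []
    for para in paragraphs:
        if para["hidden"]:
            # Skip the paragraph and anything that is anchored only in it.
            continue
        exposed.append(para["id"])
        for ref in para["refs"]:
            if ref not in exposed:
                exposed.append(ref)
    return exposed


paragraphs = [
    {"id": "p1", "hidden": False, "refs": ["tbl1"]},
    {"id": "p2", "hidden": True, "refs": ["frame1"]},
    {"id": "p3", "hidden": False, "refs": []},
]
exposed_ids = translatable_ids(paragraphs)
print(exposed_ids)
```

Here `frame1` is left out because its only reference sits in the hidden paragraph `p2`.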
