XLIFF Filter: subfiltered trans-unit targets overwrite sources on merge

Issue #1002 resolved
Denis Konovalyenko created an issue

Input:

<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns:of="http://okapiframework.org" xmlns:ofe="okapi-framework:xliff-extensions"
       xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:m="http://www.memsource.com/mxlf/2.0"
       version="1.2" m:version="2.3" m:level="1">
    <file source-language="en" target-language="fr" datatype="x-test" original="file.ext">
        <header>
            <m:in-ctx-preview-skel><![CDATA[<p>CDATA in header</p>]]></m:in-ctx-preview-skel>
        </header>
        <body>
            <trans-unit id="1">
                <source><![CDATA[<p>CDATA in source & &amp;</p>]]></source>
                <target xml:lang="fr"></target>
            </trans-unit>
        </body>
    </file>
</xliff>

Extracted:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its"
       xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
<file original="file.ext" source-language="en" target-language="fr" datatype="x-test">
<body>
<group id="1_ssf1" resname="sub-filter:null">
<trans-unit id="1_sf1_tu1" resname="null_1" restype="x-paragraph">
<source xml:lang="en">CDATA in source &amp; &amp;</source>
<target xml:lang="fr">CDATA in source &amp; &amp;</target>
</trans-unit>
</group>
<trans-unit id="1">
<source xml:lang="en"><x id="1"/></source>
<target xml:lang="fr"><x id="1"/></target>
</trans-unit>
</body>
</file>
</xliff>

Adjusted:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its"
       xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
<file original="file.ext" source-language="en" target-language="fr" datatype="x-test">
<body>
<group id="1_ssf1" resname="sub-filter:null">
<trans-unit id="1_sf1_tu1" resname="null_1" restype="x-paragraph">
<source xml:lang="en">CDATA in source &amp; &amp;</source>
<target xml:lang="fr">Translated CDATA</target>
</trans-unit>
</group>
<trans-unit id="1">
<source xml:lang="en"><x id="1"/></source>
<target xml:lang="fr"><x id="1"/></target>
</trans-unit>
</body>
</file>
</xliff>

Merged:

<?xml version="1.0" encoding="UTF-8"?>
<xliff m:level="1" m:version="2.3" version="1.2"
       xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:m="http://www.memsource.com/mxlf/2.0"
       xmlns:of="http://okapiframework.org" xmlns:ofe="okapi-framework:xliff-extensions">
    <file datatype="x-test" original="file.ext" source-language="en" target-language="fr">
        <header>
            <m:in-ctx-preview-skel><![CDATA[<p>CDATA in header</p>]]></m:in-ctx-preview-skel>
        </header>
        <body>
            <trans-unit id="1">
                <source>&lt;p&gt;Translated CDATA&lt;/p&gt;</source>
                <target xml:lang="fr">&lt;p&gt;Translated CDATA&lt;/p&gt;</target>
            </trans-unit>
        </body>
    </file>
</xliff>

Expected:

<?xml version="1.0" encoding="UTF-8"?>
<xliff m:level="1" m:version="2.3" version="1.2"
       xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:m="http://www.memsource.com/mxlf/2.0"
       xmlns:of="http://okapiframework.org" xmlns:ofe="okapi-framework:xliff-extensions">
    <file datatype="x-test" original="file.ext" source-language="en" target-language="fr">
        <header>
            <m:in-ctx-preview-skel><![CDATA[<p>CDATA in header</p>]]></m:in-ctx-preview-skel>
        </header>
        <body>
            <trans-unit id="1">
                <source>&lt;p&gt;CDATA in source &amp;amp; &amp;amp;&lt;/p&gt;</source>
                <target xml:lang="fr">&lt;p&gt;Translated CDATA&lt;/p&gt;</target>
            </trans-unit>
        </body>
    </file>
</xliff>
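
The difference between the Merged and Expected documents can be checked mechanically. The following is only a rough sketch in Python, not part of Okapi: the helper names `unit_texts` and `sources_overwritten_by_target` are made up for illustration. It flags a trans-unit when the merged source no longer matches the original source and is identical to the merged target. A plain "source unchanged" comparison would be too strict here, because the subfilter round trip normalizes the bare `&` in the original CDATA.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, as declared in the documents above.
XLIFF = "{urn:oasis:names:tc:xliff:document:1.2}"


def _text_of(element):
    """Return the concatenated text of an element, or None if it is absent."""
    if element is None:
        return None
    return "".join(element.itertext())


def unit_texts(xliff_text):
    """Map each trans-unit id to a (source text, target text) pair."""
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(xliff_text.encode("utf-8"))
    units = {}
    for tu in root.iter(XLIFF + "trans-unit"):
        units[tu.get("id")] = (
            _text_of(tu.find(XLIFF + "source")),
            _text_of(tu.find(XLIFF + "target")),
        )
    return units


def sources_overwritten_by_target(original_text, merged_text):
    """List ids whose merged source changed and now equals the merged target."""
    original = unit_texts(original_text)
    merged = unit_texts(merged_text)
    flagged = []
    for tu_id, (original_source, _) in original.items():
        if tu_id not in merged:
            continue
        merged_source, merged_target = merged[tu_id]
        if (
            merged_source is not None
            and merged_source != original_source
            and merged_source == merged_target
        ):
            flagged.append(tu_id)
    return sorted(flagged)
```

Run against the documents in this issue, the check should report trans-unit `1` for the Merged output and nothing for the Expected output.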

Comments (6)

  1. jhargrave-straker
    • changed status to open

    I understand the need for this fix for multiple targets, but it has introduced an instability in the subfilter merger code that the aaron_messageformat_clean branch reveals: we get duplicated TUs. I think it has to do with the multiple layers of references (TU -> Code -> GROUP).

    I’m also very concerned about the way processTextUnit now copies the source into the target TextContainer. We need to be careful to allow empty targets. If this is controlled by a filter option, that’s OK for now.
