XLIFF 1.2 with g codes split by segments fails on merger...

Issue #1303 new
jhargrave-straker created an issue
<xliff xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 http://docs.oasis-open.org/xliff/v1.2/os/xliff-core-1.2-strict.xsd"
    xmlns="urn:oasis:names:tc:xliff:document:1.2"
    xmlns:xhtml="http://www.w3.org/1999/xhtml" version="1.2">
    <file original="course" datatype="plaintext" source-language="en-US" target-language="fr-FR">
        <body>
            <trans-unit id="description">
                <source>
                    <g id="Hk"><g id="mm">Sentence one.</g> Sentence two.</g>
                </source>
            </trans-unit>
        </body>
    </file>
</xliff>

This file will reproduce the issue in our integration tests.

Comments (2)

  1. jhargrave-straker reporter

    I found out why this works in Okapi M38: we used to use the Tikal XliffMergerStep, which had this code:

    // Do we need to preserve the segmentation for merging (e.g. TTX case)
    boolean mergeAsSegments = ((tuFromSkel.getMimeType() != null)
        && (tuFromSkel.getMimeType().equals(MimeTypeMapper.TTX_MIME_TYPE)));

    This looks like a bug: because the check is true only for TTX, XLIFF is always merged back without segmentation. Without the extra mrk elements the g codes stay well formed, which is why the old step worked. Our new merger does the right thing and merges back with segmentation. However, XliffSkeletonWriter has a bug: it delegates content to GenericSkeletonWriter, which produces malformed XML.
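
To illustrate the kind of malformed output described above, here is a minimal sketch using only the JDK XML parser. It does not call any Okapi code. The `SEGMENTED` string is an assumption: a hand-written guess at what a per-segment write-out could look like if the segmenter splits after "Sentence one." and each segment is wrapped in its own `mrk`. The actual XliffSkeletonWriter output may differ. The class and constant names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

    // Source content from the sample file in this issue: nested g codes, no segmentation.
    static final String UNSEGMENTED =
        "<source><g id=\"Hk\"><g id=\"mm\">Sentence one.</g> Sentence two.</g></source>";

    // ASSUMPTION: hand-written illustration, not captured Okapi output.
    // The outer g (id="Hk") opens inside the first mrk and closes inside the second,
    // so the element boundaries cross.
    static final String SEGMENTED =
        "<source>"
        + "<mrk mtype=\"seg\" mid=\"0\"><g id=\"Hk\"><g id=\"mm\">Sentence one.</g></mrk>"
        + "<mrk mtype=\"seg\" mid=\"1\"> Sentence two.</g></mrk>"
        + "</source>";

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unsegmented well-formed: " + isWellFormed(UNSEGMENTED));
        System.out.println("segmented well-formed:   " + isWellFormed(SEGMENTED));
    }
}
```

A check like this could be run against the merged file in an integration test to catch crossing `g`/`mrk` boundaries before the output reaches a downstream tool.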
