Merging SDLXLIFF file results in some codes not being converted back to match original file.

Issue #985 new
Joseph Hovik created an issue

Original SDLXLIFF source:

<g id="31"><mrk mtype="seg" mid="10">turn<g id="32"> </g><g id="33">data</g><g id="34"> </g>into <g id="35">predictions</g></mrk></g>

XLIFF 1.2 sent to Okapi’s OriginalDocumentXliffMergerStep:

<it ctype="x-bold;" id="31" pos="open">&lt;cf size="36" font="Montserrat" bold="true" nfa="true"&gt;</it><mrk mid="10" mtype="seg">turn<bpt ctype="x-empty" id="32">&lt;cf nfa="true"&gt;</bpt> <ept id="32">&lt;/cf&gt;</ept><bpt ctype="x-empty" id="33">&lt;cf nfa="true"&gt;</bpt>data<ept id="33">&lt;/cf&gt;</ept><bpt ctype="x-empty" id="34">&lt;cf nfa="true"&gt;</bpt> <ept id="34">&lt;/cf&gt;</ept>into <bpt ctype="x-empty" id="35">&lt;cf nfa="true"&gt;</bpt>predictions<ept id="35">&lt;/cf&gt;</ept></mrk><it ctype="x-bold;" id="31" pos="close">&lt;/cf&gt;</it>

Expected SDLXLIFF output <target> section:

<g id="31">
    <mrk mid="10" mtype="seg">
        turn
        <g id="32"></g>
        <g id="33">data</g>
        <g id="34"></g>
        into
        <g id="35">predictions</g>
    </mrk>
</g>

Actual output file <target> section:

<g id="31"><mrk mid="10" mtype="seg">turn<bpt ctype="x-empty" id="32">&lt;cf nfa="true"></bpt></g><bpt ctype="x-empty" id="33">&lt;cf nfa="true"></bpt>data<ept id="33">&lt;/cf></ept><bpt ctype="x-empty" id="34">&lt;cf nfa="true"></bpt><ept id="34">&lt;/cf></ept>ito<bpt ctype="x-empty" id="35">&lt;cf nfa="true"></bpt>predictions<ept id="35">&lt;/cf></ept></mrk><it ctype="x-bold;" id="31" pos="close">&lt;/cf></it>

Not all the tags are changed back to <g> tags. Also, an end tag is missing, resulting in invalid XML.
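
The well-formedness problem can be checked independently of Okapi. The following is a minimal sketch that uses only the JDK's SAX parser; the class name WellFormedCheck and the method isWellFormed are hypothetical names chosen for illustration and are not part of Okapi.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not an Okapi class): reports whether a fragment
// with a single root element parses as well-formed XML.
final class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler rethrows fatal parse errors as SAXParseException,
            // which is a subclass of SAXException.
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Mismatched or missing end tags end up here.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            // Not a well-formedness problem; surface it to the caller.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing an abbreviated copy of the actual output, in which </g> closes while <mrk> is still open, should make isWellFormed return false, while the corresponding abbreviated expected output should return true.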

Debugging through the code, I found the following:

  1. In OriginalDocumentXliffMergerStep.java, in the produce() method, the line skelMergerWriter.handleEvent(xlfFilter.next()); handles a "Text Unit" event. At this point, the code objects under targets->parts->text->codes do not carry the same outerData as the corresponding source code objects under parts->text->codes. Target: “<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt>” Source: “<g id="32"></g>”.
  2. When the text unit is written, in the expandCodeContent() method of GenericSkeletonWriter.java, the line String codeTmp = code.getOuterData(); returns “<g id="32">” for the source and “<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt>” for the target. As a result, the invalid XML shown above is written to the output file.
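
To illustrate the mismatch described in these two findings, here is a minimal, self-contained sketch of one possible way to realign target codes with source codes by id and tag type. It is not Okapi's implementation or its fix: InlineCode is a hypothetical stand-in for Okapi's code objects, and CodeAligner.copyOuterData is an assumed helper name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Okapi code: copies the source's outer data
// onto target codes that share the same id and tag type.
final class CodeAligner {

    enum TagType { OPENING, CLOSING, PLACEHOLDER }

    // Simplified stand-in for an inline code carrying its original markup.
    static final class InlineCode {
        final int id;
        final TagType type;
        String outerData;

        InlineCode(int id, TagType type, String outerData) {
            this.id = id;
            this.type = type;
            this.outerData = outerData;
        }
    }

    static void copyOuterData(List<InlineCode> source, List<InlineCode> target) {
        // Index the source markup by "type:id" so opening and closing
        // codes with the same id stay distinct.
        Map<String, String> sourceMarkup = new HashMap<>();
        for (InlineCode code : source) {
            sourceMarkup.put(code.type + ":" + code.id, code.outerData);
        }
        for (InlineCode code : target) {
            String original = sourceMarkup.get(code.type + ":" + code.id);
            if (original != null) {
                // Replace the XLIFF 1.2 bpt/ept markup with the source's markup.
                code.outerData = original;
            }
            // Codes without a source counterpart are left unchanged.
        }
    }
}
```

With this approach, a target opening code with id 32 would be written as <g id="32"> rather than as the <bpt> element, assuming the source segment contains a code with the same id and tag type.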

Comments (8)

  1. Jim Hargrave (OLD)

    Here is a small file to reproduce the problem. Note that the whitespace of the source and seg-source is different; normally this triggers an error in the XLIFF filter, but many parameters can change this behavior.
