- changed title to #985 Merging SDLXLIFF file results in some tags being converted into BPT with no EPT.
Merging SDLXLIFF file results in some codes not being converted back to match original file.
Issue #985
new
Original SDLXLIFF source:
<g id="31"><mrk mtype="seg" mid="10">turn<g id="32"> </g><g id="33">data</g><g id="34"> </g>into <g id="35">predictions</g></mrk></g>
XLIFF 1.2 sent to Okapi’s OriginalDocumentXliffMergerStep:
<it ctype="x-bold;" id="31" pos="open"><cf size="36" font="Montserrat" bold="true" nfa="true"></it><mrk mid="10" mtype="seg">turn<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt> <ept id="32"></cf></ept><bpt ctype="x-empty" id="33"><cf nfa="true"></bpt>data<ept id="33"></cf></ept><bpt ctype="x-empty" id="34"><cf nfa="true"></bpt> <ept id="34"></cf></ept>into <bpt ctype="x-empty" id="35"><cf nfa="true"></bpt>predictions<ept id="35"></cf></ept></mrk><it ctype="x-bold;" id="31" pos="close"></cf></it>
Expected SDLXLIFF output (<target> section):
<g id="31">
<mrk mid="10" mtype="seg">
turn
<g id="32"></g>
<g id="33">data</g>
<g id="34"></g>
into
<g id="35">predictions</g>
</mrk>
</g>
Actual output (<target> section):
<g id="31"><mrk mid="10" mtype="seg">turn<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt></g><bpt ctype="x-empty" id="33"><cf nfa="true"></bpt>data<ept id="33"></cf></ept><bpt ctype="x-empty" id="34"><cf nfa="true"></bpt><ept id="34"></cf></ept>ito<bpt ctype="x-empty" id="35"><cf nfa="true"></bpt>predictions<ept id="35"></cf></ept></mrk><it ctype="x-bold;" id="31" pos="close"></cf></it>
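Not all codes are converted back to <g> elements: the target still contains <bpt>, <ept>, and <it> elements. In addition, <bpt id="32"> has no matching <ept id="32">, and a stray </g> appears before </mrk>. As a result, the output is not well-formed XML.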
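The unpaired code can be detected mechanically by listing every <bpt> id that has no <ept> with the same id. The class below is a minimal sketch of that check. BptEptCheck and unmatchedBpt are hypothetical names and are not part of Okapi. The check uses a regex rather than an XML parser, because the snippets above show the <cf> markup unescaped inside <bpt>/<ept>, which a real parser would reject.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports <bpt> ids that have no matching <ept>.
public class BptEptCheck {
    // Capture the id attribute of a <bpt ...> or <ept ...> start tag
    // (closing tags such as </bpt> do not match because they start with "</").
    private static final Pattern BPT_ID = Pattern.compile("<bpt\\b[^>]*?\\bid=\"([^\"]*)\"");
    private static final Pattern EPT_ID = Pattern.compile("<ept\\b[^>]*?\\bid=\"([^\"]*)\"");

    private static Set<String> ids(Pattern pattern, String xml) {
        Set<String> result = new TreeSet<>();
        Matcher m = pattern.matcher(xml);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    /** Returns the ids of <bpt> elements that have no <ept> with the same id. */
    public static Set<String> unmatchedBpt(String xml) {
        Set<String> missing = ids(BPT_ID, xml);
        missing.removeAll(ids(EPT_ID, xml));
        return missing;
    }

    public static void main(String[] args) {
        // Excerpt of the actual output from this issue.
        String actual = "turn<bpt ctype=\"x-empty\" id=\"32\"><cf nfa=\"true\"></bpt></g>"
                + "<bpt ctype=\"x-empty\" id=\"33\"><cf nfa=\"true\"></bpt>data<ept id=\"33\"></cf></ept>";
        System.out.println(unmatchedBpt(actual)); // prints [32]
    }
}
```

Running the same check on the XLIFF 1.2 input sent to the merger step reports no unmatched ids. That suggests the pairing is lost during the merge, not before it.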
Debugging through the code, I found the following:
- In OriginalDocumentXliffMergerStep.java, in the produce() method, the line skelMergerWriter.handleEvent(xlfFilter.next()); handles a "Text Unit" event. At this point, the codes in the target (targets->parts->text->codes) do not have the same outerData as the corresponding codes in the source (parts->text->codes). For code id 32, the target outerData is “<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt>”, while the source outerData is “<g id="32"></g>”.
- When the text unit is written, the expandCodeContent() method in GenericSkeletonWriter.java reads each code with String codeTmp = code.getOuterData();. This returns “<g id="32">” for the source but “<bpt ctype="x-empty" id="32"><cf nfa="true"></bpt>” for the target. The target's XLIFF 1.2 markup is therefore written to the file unchanged, which produces the invalid XML shown above.
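The analysis above points to one possible direction: before the text unit is written, copy each target code's outer data from the source code that has the same id and tag type. The sketch below models that idea. The types (InlineCode, TagType) and the method restoreFromSource are hypothetical simplifications. They are not Okapi's Code class or API, and this is not necessarily how the project would fix the issue. The sketch also assumes that source and target codes share ids, which holds in this example. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestoreOuterData {
    // Hypothetical, simplified model of an inline code (not the Okapi Code class).
    public enum TagType { OPENING, CLOSING, PLACEHOLDER }

    public record InlineCode(int id, TagType tagType, String outerData) {}

    /**
     * Returns a copy of the target codes where each code's outer data is replaced
     * by the outer data of the source code with the same id and tag type.
     * Target codes with no source counterpart are kept as they are.
     */
    public static List<InlineCode> restoreFromSource(List<InlineCode> source, List<InlineCode> target) {
        // Key on id + tag type: an opening and a closing code share the same id.
        Map<String, String> byKey = new HashMap<>();
        for (InlineCode c : source) {
            byKey.put(c.id() + ":" + c.tagType(), c.outerData());
        }
        List<InlineCode> result = new ArrayList<>();
        for (InlineCode c : target) {
            String original = byKey.get(c.id() + ":" + c.tagType());
            result.add(original == null ? c : new InlineCode(c.id(), c.tagType(), original));
        }
        return result;
    }

    public static void main(String[] args) {
        List<InlineCode> source = List.of(
                new InlineCode(32, TagType.OPENING, "<g id=\"32\">"),
                new InlineCode(32, TagType.CLOSING, "</g>"));
        List<InlineCode> target = List.of(
                new InlineCode(32, TagType.OPENING, "<bpt ctype=\"x-empty\" id=\"32\"><cf nfa=\"true\"></bpt>"),
                new InlineCode(32, TagType.CLOSING, "<ept id=\"32\"></cf></ept>"));
        // prints <g id="32"> and then </g>
        for (InlineCode c : restoreFromSource(source, target)) {
            System.out.println(c.outerData());
        }
    }
}
```

The lookup is keyed on both id and tag type because a paired code such as <g id="32"> contributes an opening code and a closing code with the same id. Keying on id alone would give both codes the same outer data.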
Comments (8)
reporter - changed title to Merging SDLXLIFF file results in some codes not being converted back to match original file.
- attached tag_merge_error.sdlxliff
Here is a small file to reproduce the problem. Note that the whitespace in the source and seg-source differs. Normally this triggers an error in the XLIFF filter, but many parameters can change this behavior.
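To confirm whether a trans-unit's <source> and <seg-source> differ only in whitespace, you can compare their text content with and without whitespace. The class below is a minimal sketch of that comparison. WhitespaceDiff and differsOnlyInWhitespace are hypothetical names, and this is not the check the Okapi XLIFF filter performs. For simplicity, the sketch strips every tag (including <mrk>), assumes no '>' appears inside attribute values, and does not decode entities.

```java
import java.util.regex.Pattern;

// Hypothetical helper: do two fragments differ only in whitespace?
public class WhitespaceDiff {
    // Any XML tag; removing tags leaves only the text content.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    // Any run of whitespace characters.
    private static final Pattern WS = Pattern.compile("\\s+");

    static String text(String xml) {
        return TAG.matcher(xml).replaceAll("");
    }

    /** True when the texts differ as-is but are identical once all whitespace is removed. */
    public static boolean differsOnlyInWhitespace(String source, String segSource) {
        String a = text(source);
        String b = text(segSource);
        if (a.equals(b)) {
            return false; // identical, so there is no difference at all
        }
        return WS.matcher(a).replaceAll("").equals(WS.matcher(b).replaceAll(""));
    }

    public static void main(String[] args) {
        String source = "turn<g id=\"32\"> </g>data";
        String segSource = "<mrk mtype=\"seg\" mid=\"10\">turn<g id=\"32\"></g>data</mrk>";
        System.out.println(differsOnlyInWhitespace(source, segSource)); // prints true
    }
}
```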