Merging XLIFF2 file results in some target segments being left out
File sent to Okapi to merge:
<?xml version="1.0" encoding="UTF-8"?><xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" xmlns:okp="okapi-framework:xliff-extensions" its:version="2.0" version="1.2">
<file datatype="x-undefined" okp:configId="/filterconfiguration.fprm" okp:inputEncoding="UTF-8" original="unknown" source-language="en-US" target-language="de-DE">
<body>
<trans-unit id="3">
<source>Want a quiet mind? Move your body.</source>
<seg-source><mrk mid="0" mtype="seg">Want a quiet mind?</mrk> <mrk mid="2" mtype="seg">Move your body.</mrk></seg-source>
<target><mrk mid="0" mtype="seg">Möchten Sie einen ruhigen Geist?</mrk> <mrk mid="2" mtype="seg">Bewegen Sie Ihren Körper.</mrk></target>
<note annotates="general" priority="1">3.0</note>
<note annotates="general" priority="1"/>
</trans-unit>
</body>
</file>
</xliff>
Expected file from merging:
<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-US" trgLang="de-DE">
<file id="f1">
<unit id="3">
<notes>
<note category="key">3.0</note>
<note category="description"></note>
</notes>
<segment>
<source xml:space="preserve">Want a quiet mind? Move your body.</source>
<target xml:space="preserve">Möchten Sie einen ruhigen Geist? Bewegen Sie Ihren Körper.</target>
</segment>
</unit>
</file>
</xliff>
Actual file from merging:
<?xml version="1.0"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en-US" trgLang="de-DE">
<file id="f1">
<unit id="3">
<notes>
<note category="key">3.0</note>
<note category="description"></note>
</notes>
<segment>
<source xml:space="preserve">Want a quiet mind? Move your body.</source>
<target xml:space="preserve">Möchten Sie einen ruhigen Geist?</target>
</segment>
</unit>
</file>
</xliff>
Notice that “Bewegen Sie Ihren Körper.” is missing from the actual file.
I think I may have found the problem. It's in the XLIFF2OkpToX2Converter.java class.
In the “private List<Event> textUnit(ITextUnit okapiTextUnit, LocaleId targetLocale)” method, the “okapiTextUnit” variable has the correct targets:
{LocaleId@5944} "de-DE" -> {TextContainer@5965} "Möchten Sie einen ruhigen Geist? Bewegen Sie Ihren Körper."
But after “textUnit()” returns to XLIFF2FilterWriter.java, the target inside the “parts” array of the “xliff2Event” object is missing the second segment:
“Möchten Sie einen ruhigen Geist?”
Then in the XLIFFWriter.java class, in the “writeUnit()” method, when the unit is written, only the first target segment is written.
I’ve attached the following files: original-file.xlf, to-merge, from-merge, and okf_xliff2@resegment_xliff2.fprm.
Comments (7)
The issue here is that the XLIFF 2 filter applied “segmentation deepening”. During the merge, that segmentation is not applied because it was part of the pipeline, so we end up with a different number of segments. We do log an error but continue processing, so the output is basically truncated.
The XLIFF 2 fprm looks like this:
#v1
maxValidation.b=true
mergeAsParagraph.b=false
needsSegmentation.b=true
Any XLIFF 2 unit with canResegment="yes" is sent to the segmenter, and the TextUnit is adjusted with the new segments.
Bilingual XLIFF 2 is not going to work in all cases, because we have no way to align the source and target segments after segmentation. We should add checks for this case and not proceed if the segment counts differ.
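A segment-count check of the kind proposed here might look roughly like the sketch below. It is only an illustration in plain Java: `SegmentCountCheck` and `requireSameSegmentCount` are hypothetical names, segments are modelled as plain strings, and none of this is the actual Okapi API or the fix that was eventually made. The sample data assumes the merged unit ends up with one source segment and two target segments, which is consistent with the truncated output in the report.

```java
import java.util.List;

/** Hypothetical guard for illustration; not part of the Okapi API. */
final class SegmentCountCheck {

    private SegmentCountCheck() {}

    /**
     * Throws if source and target hold a different number of segments,
     * rather than pairing them by index and silently dropping the remainder.
     */
    static void requireSameSegmentCount(String unitId,
                                        List<String> sourceSegments,
                                        List<String> targetSegments) {
        if (sourceSegments.size() != targetSegments.size()) {
            throw new IllegalStateException(String.format(
                "Unit %s: %d source segment(s) but %d target segment(s); refusing to merge",
                unitId, sourceSegments.size(), targetSegments.size()));
        }
    }

    public static void main(String[] args) {
        // Assumed shape of unit 3 at merge time: source not resegmented, target in two segments.
        List<String> source = List.of("Want a quiet mind? Move your body.");
        List<String> target = List.of("Möchten Sie einen ruhigen Geist?",
                                      "Bewegen Sie Ihren Körper.");
        try {
            requireSameSegmentCount("3", source, target);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing fast like this trades a partial merge for an explicit error, which seems preferable to output that looks valid but is missing translated text.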
-
- changed status to resolved
Fixed with recent xliff2 filter refactor