XLIFF filter shows the pre-translated message under a different language when the target language is overridden
This bug can be seen in the testOutputOverrideTargetlanguage test in XLIFFFilterTest.java.
To reproduce the issue:
Step 1: Use this content:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2">
<file source-language="en" target-language="fr" datatype="x-test" original="file.ext">
<body>
<trans-unit id="1">
<source xml:lang="en">en message</source>
<target xml:lang="fr">fr message</target>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en">en message2</source>
<target>fr message2</target>
</trans-unit>
</body>
</file>
</xliff>
Step 2: Create an XLIFF filter and call filter.getParameters().setOverrideTargetLanguage(true).
Step 3: Use the filter to generate output for another language, such as "de".
This is what we get:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2">
<file source-language="en" target-language="de" datatype="x-test" original="file.ext">
<body>
<trans-unit id="1">
<source xml:lang="en">en message</source>
<target xml:lang="de">fr message</target>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en">en message2</source>
<target>fr message2</target>
</trans-unit>
</body>
</file>
</xliff>
Note the line <target xml:lang="de">fr message</target>.
The "de" target shouldn't contain the "fr" message.
Comments (9)
-
What is the operation used to generate that output?
Extract? Merge? Something done in tikal? Just creating a filter with an input will not create an output.
Can you provide some steps that we can follow to reproduce this?
Thank you,
Mihai
-
Thanks for refactoring my poor input!
Sorry for using “generate”; I just copied the term from the test. To be more precise, “Extract” reproduces the issue.
I couldn’t find a way in tikal to set the XLIFF filter’s “overrideTargetLanguage” parameter. Its default value is false, so the filter always takes the target-language in <file> as the document target language. To work around this, I just removed the target-language attribute from <file>.
input.xlf:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2">
<file source-language="en" datatype="x-test" original="file.ext">
<body>
<trans-unit id="1">
<source xml:lang="en">en message</source>
<target xml:lang="fr">fr message</target>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en">en message2</source>
<target>fr message2</target>
</trans-unit>
</body>
</file>
</xliff>
Then use tikal to extract:
$ ./tikal.sh -x -tl de -od . input.xlf
And this is the output:
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its" xmlns:itsxlf="http://www.w3.org/ns/its-xliff/" its:version="2.0">
<file original="file.ext" source-language="en" target-language="de" datatype="x-test" okp:inputEncoding="UTF-8">
<body>
<trans-unit id="1">
<source xml:lang="en">en message</source>
<target xml:lang="de">fr message</target>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en">en message2</source>
<target xml:lang="de">fr message2</target>
</trans-unit>
</body>
</file>
</xliff>
So my concern is that “fr message” should not appear under a <target> whose target language is “de”.
I hope this helps describe the issue.
Thanks!
-
It looks like the root cause is that the XLIFFFilter “does not understand” TextUnit(s) with multiple locales.
It loads the text in the first <target> tag, ignores the xml:lang even if present, and declares the target locale to be the file level one.
The file level target locale is either the one declared in the <file> target-language attribute (without setOverrideTargetLanguage) or the one declared in the filter.
See attached code that reproduces the problem.
The skeleton is also messed up if we call setOverrideTargetLanguage(true). I don’t know what that would do to a merge operation:
===== *, setOverrideTargetLanguage(false) ===== setOverrideTargetLanguage(false) =====
skeleton :
<trans-unit id="tu2" restype="x-paragraph"[#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef][#$$self$@%approved]>
<source xml:lang="en"[#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>[#$$self$]</source>
[@#$SEGSRC$#@]<target xml:lang="de"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>[#$$self$]</target>
<target xml:lang="es"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second Spanish text (2).</target>
<target xml:lang="fr"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second French text (3).</target>
<target xml:lang="ja"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second Japanese text (4).</target>
[@#$ALTTRANS$#@][@#$NOTE$#@]
</trans-unit>
===== es, setOverrideTargetLanguage(true) =====
skeleton :
<trans-unit id="tu2" restype="x-paragraph"[#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef][#$$self$@%approved]>
<source xml:lang="en"[#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>[#$$self$]</source>
[@#$SEGSRC$#@]<target xml:lang="es"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>[#$$self$]</target>
<target xml:lang="es"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second Spanish text (2).</target>
<target xml:lang="es"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second French text (3).</target>
<target xml:lang="es"[#$$self$@%mtConfidence][#$$self$@%locQualityIssuesRef][#$$self$@%provenanceRecordsRef]>A second Japanese text (4).</target>
[@#$ALTTRANS$#@][@#$NOTE$#@]
</trans-unit>
The XLIFFWriter is also unable to write multilingual TextUnit(s). Also see attached code.
I did not check to see what XLIFFSkeletonWriter does.
-
It does not look like a quick fix (something that can be done a week or two before a release :-)
But we can try to define what the desired behavior would be.
We have several “knobs”:
- file level target locale (attribute target-language in <file>). Optional.
- the RawDocument target locale (propagated to XLIFFFilter). Optional(?)
- setOverrideTargetLanguage (if called / true / false).
- the xml:lang attributes on <target>. Can also be missing, so: Optional.
What do we expect to see in the TextUnit, and what do we expect to see in the skeleton?
Step 2 would be to decide what the merge behavior would be.
-
I agree that in general working with multilingual xliff files is messy (file management becomes a pain), and I am not aware of any company doing it.
I've seen cases where a client sends a file (partially) translated into one language (let’s say French), and wants X more languages (in separate files) (let's say Spanish + German)
A workaround for such cases would be to create separate projects: a French one with the original xliff, and a Spanish + German project with the original xliff and the target removed.
So we can say: this is not supported, and leave it at that.
Although it is a bit disappointing if we don’t properly support the standard, at least at read / write level (even if we don’t promise that all steps are multilingual-aware)
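For the workaround above, a target-less copy of the file can be produced outside Okapi. The following is a minimal sketch using only the JDK's DOM and transformer APIs; the class TargetStripper and the method stripTargets are hypothetical names, and the sketch assumes that only <target> elements directly under <trans-unit> should go (targets inside <alt-trans> are kept).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper, not part of Okapi: removes pre-translated targets from an XLIFF 1.2 string. */
final class TargetStripper {

    static String stripTargets(String xliff) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xliff)));

            // The NodeList is live, so copy it before removing anything.
            NodeList live = doc.getElementsByTagNameNS("*", "target");
            List<Node> targets = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                targets.add(live.item(i));
            }
            for (Node target : targets) {
                Node parent = target.getParentNode();
                // Only drop targets that are direct children of <trans-unit>.
                if (parent != null && "trans-unit".equals(parent.getLocalName())) {
                    parent.removeChild(target);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XLIFF input", e);
        }
    }
}
```

The sketch leaves the target-language attribute on <file> untouched; whether that attribute should also be changed or removed for the second project is a separate decision.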
-
- attached multilingual-xliff.zip
Small maven project showing that the XLIFFFilter does not support multiple targets.
-
I wanted to fix this, wrote some unit tests, and wanted to make sure that what I do conforms to the spec.
So:
- the <trans-unit> can only contain one target
“Zero or one <target> element, followed by”
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#trans-unit
- The xml:lang in target MUST match the target-language in <file>
“The optional xml:lang attribute is used to specify the content language of the <target>; this should always match target-language as a child of <trans-unit> but can vary as a child of <alt-trans>”
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#target
So my initial analysis of the problem was wrong. This is correct behavior:
“XLIFFFilter “does not understand” TextUnit(s) with multiple locales”
I’ll look again and see if there is a problem with the merge.
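The second rule can be checked mechanically before a file is handed to the filter. Below is a minimal sketch using only the JDK's DOM parser; TargetLangCheck and mismatchedUnitIds are hypothetical names, not Okapi API. It assumes a case-insensitive comparison of language tags and skips any <file> that declares no target-language.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical checker, not part of Okapi. */
final class TargetLangCheck {

    /** Returns the ids of trans-units whose <target xml:lang> differs from the file's target-language. */
    static List<String> mismatchedUnitIds(String xliff) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xliff)));

            List<String> ids = new ArrayList<>();
            NodeList files = doc.getElementsByTagNameNS("*", "file");
            for (int i = 0; i < files.getLength(); i++) {
                Element file = (Element) files.item(i);
                if (!file.hasAttribute("target-language")) {
                    continue; // assumption: nothing to compare against, so skip this file
                }
                String fileLang = file.getAttribute("target-language");
                NodeList targets = file.getElementsByTagNameNS("*", "target");
                for (int j = 0; j < targets.getLength(); j++) {
                    Element target = (Element) targets.item(j);
                    Node parent = target.getParentNode();
                    // The spec lets xml:lang vary under <alt-trans>, so only check <trans-unit> children.
                    if (!"trans-unit".equals(parent.getLocalName())) {
                        continue;
                    }
                    // xml:lang is optional on <target>; a missing attribute is not a mismatch.
                    if (!target.hasAttributeNS(XMLConstants.XML_NS_URI, "lang")) {
                        continue;
                    }
                    String lang = target.getAttributeNS(XMLConstants.XML_NS_URI, "lang");
                    if (!lang.equalsIgnoreCase(fileLang)) {
                        ids.add(((Element) parent).getAttribute("id"));
                    }
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process XLIFF input", e);
        }
    }
}
```

Run against the original input with target-language changed to "de", this kind of check would flag trans-unit 1 (xml:lang="fr") and leave trans-unit 2 alone, since its <target> has no xml:lang.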
-
I couldn’t find the way in tikal to overwrite XLIFF filter’s parameter to set “overrideTargetLanguage”, the default value is false so filter will always take the target-language in <file> as document target language.
To avoid this, I just remove the target-language in <file> .
In light of my fresh reading of the spec, this sounds suspicious.
By removing the target-language in <file>, the target language of the file becomes “Undefined”:
A language code as described in the [RFC 4646], the successor to [RFC 3066]
…
Default value: Undefined.
http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html#target-language
And “Undefined” does not really mean “it can be anything”; in RFC 4646 that is a real locale, with the language code und.
So the rule saying that the xml:lang in <target> must be the same as target-language means that xml:lang="fr" is invalid.
The only valid value would be xml:lang="und".
And I think that by specifying setOverrideTargetLanguage(true) we are basically saying “ignore all the target locales specified in the file and override them with what I’m telling you”.
So “junk” (French text in a German target) is not that surprising.
I still have to think about what a decent “error recovery” behavior should be.
Removed \ in front of " and some \r