XLIFF filter - add option to treat the content of the <target> tag as the source of the segment

Issue #563 wontfix
Nikolai Vladimirov created an issue

Some XLIFF files in the wild use their <source> as a translation key for internal use rather than as the actual source text.

Example:

<?xml version="1.0"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
    <file source-language="en" datatype="plaintext" original="file.ext">
        <body>
            <trans-unit id="1">
                <source>homepage.title</source>
                <target>Title - the best place for page titles on the internet.</target>
            </trans-unit>
        </body>
    </file>
</xliff>

This can be worked around by processing the file with a custom configuration of the XML or XMLStream filters, or with some external preprocessing.
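The following is a minimal Python sketch of the external preprocessing route. It uses only the standard library's xml.etree.ElementTree and rewrites a file like the one above before it reaches the filter. The key moves into the XLIFF 1.2 resname attribute of the <trans-unit>, the <target> text becomes the <source>, and the <target> is emptied. The function name keys_to_resname is hypothetical. The sketch assumes plain-text content, so inline elements inside <target> (such as <g> or <x/>) are not handled.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Serialize the XLIFF namespace as the default (unprefixed) namespace.
ET.register_namespace("", XLIFF_NS)


def keys_to_resname(xliff_text: str) -> str:
    """Hypothetical preprocessing: move key-style <source> values into resname.

    Assumes XLIFF 1.2 where <source> holds a key and <target> holds the real
    text as plain text (no inline elements). Units whose <target> is missing
    or blank are left untouched.
    """
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        if source is None or target is None or not (target.text or "").strip():
            continue
        # Keep an existing resname; otherwise store the key taken from <source>.
        unit.set("resname", unit.get("resname") or (source.text or ""))
        source.text = target.text  # the real text becomes the translatable source
        target.text = ""           # empty target, ready to receive the translation
    return ET.tostring(root, encoding="unicode")
```

Running it on the example file yields a <trans-unit id="1" resname="homepage.title"> whose <source> holds the English title, which a standard XLIFF filter can process without any special option.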

It would be great to have an option specifying that <target> holds the translatable text when both <source> and <target> are present in a translation unit.

It's completely OK if this case is considered out of scope for the XLIFF plugin, but the option would be quite nice to have.

Thank you for considering this, whatever the decision.

Comments (13)

  1. Nikolai Vladimirov reporter

    @tingley the translation would overwrite the text in the <target> tag.

    The behavior I'm looking for is exactly what OmegaT (3.6.3) does when processing XLIFF files.

    When <target> is blank, it treats the <source> text as the translatable text for segmentation.

    When both <source> and <target> are present, the text in <target> is treated as translatable: <source> remains unchanged and <target> is replaced with the translated text.
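The two rules above reduce to a simple per-unit selection step. Below is a minimal Python sketch of that step, assuming plain-string content. The function names pick_translatable and translate_unit are hypothetical, and this is an illustration of the described behavior, not OmegaT's actual code.

```python
from typing import Callable, Optional, Tuple


def pick_translatable(source: str, target: Optional[str]) -> Tuple[str, str]:
    """Return (element name, text) for the text to segment and translate.

    A missing or blank <target> means the <source> text is translatable;
    otherwise the existing <target> text is.
    """
    if target is None or not target.strip():
        return "source", source
    return "target", target


def translate_unit(
    source: str, target: Optional[str], translate: Callable[[str], str]
) -> Tuple[str, str]:
    """Return the new (source, target) pair for one translation unit."""
    _, text = pick_translatable(source, target)
    # <source> is never modified; the translation always replaces <target>.
    return source, translate(text)
```

For example, translate_unit("homepage.title", "Title", str.upper) returns ("homepage.title", "TITLE"), so the key stays in <source> while the translated title lands in <target>.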

    In this respect the Okapi XLIFF filter is more correct and more compliant with the spec: it always treats <source> as the translatable entity and simply pre-fills the translation with the <target> text. However, for this particular use case, where clients misuse XLIFF, the OmegaT behavior works better.

  2. Chase Tingley

    Thanks for the clarification.

    "When both <source> and <target> are present the text in the <target> is considered translatable. <source> remains unchanged and <target> is replaced with the translated text."

    The problem with this is that it breaks a number of very common use cases, such as post-editing, review, and error correction, all of which assume that <target> already holds a translation of the real <source> text.

    As an option that has to be enabled manually it is slightly more acceptable, but even then I'm not sure I'm convinced. This is a pretty tortured use of XLIFF. A much better way to handle this would be to put the translation key in the resname attribute of the <trans-unit>:

    <trans-unit id="1" resname="homepage.title">
        <source>Title - the best place for page titles on the internet.</source>
        <target></target>
    </trans-unit>
    

    Do you know whether a specific tool is generating XLIFF the way you've shown, with the resname in the <source>?

  3. Nikolai Vladimirov reporter

    OK, after some further investigation into which tool generated those files, it turned out that this type of usage is exceptionally rare and rather old.

    So I'm closing this. @tingley, I completely agree with all your points. I imagined this as an option, but realistically it's extremely rare and not worth the effort or the possible breakage.

  4. Chase Tingley

    Thank you, Nikolai. If you find tools that are still doing this, let me know so I can go yell at them :)

  5. Nikolai Vladimirov reporter

    OK, so Symfony actually uses XLIFF as its translation storage format.

    So clients would usually take their English locale file, send it for translation, and simply rename it to the language it was translated into, expecting that <source> remains unchanged and only the <target> text is translated.
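If a file for this workflow were first rewritten so that each key sits in a resname attribute and <source> holds the English text, it would need converting back to the layout Symfony expects once translation is done. Below is a minimal Python sketch of that post-processing step (standard library only). It assumes that each unit's resname holds the key that originally sat in <source> and that resname was not present in the original file. The function name resname_to_keys is hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Serialize the XLIFF namespace as the default (unprefixed) namespace.
ET.register_namespace("", XLIFF_NS)


def resname_to_keys(xliff_text: str) -> str:
    """Hypothetical post-processing: put the resname key back into <source>.

    The translated <target> is kept as is. Assumes resname was added by an
    earlier preprocessing step, so it is removed to restore the original shape.
    """
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        key = unit.get("resname")
        source = unit.find("x:source", NS)
        if key is None or source is None:
            continue
        source.text = key           # Symfony-style files keep the key in <source>
        del unit.attrib["resname"]  # drop the attribute added during preprocessing
    return ET.tostring(root, encoding="unicode")
```

The result keeps <source> unchanged from the client's point of view and carries the translation only in <target>, which matches the expectation described above.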

    This now seems like a rather important use case. Should I create a new issue specifically for PHP Symfony XLIFF file support?

    A whole bunch of XLIFF files can be found here: https://github.com/symfony/symfony-demo/tree/master/app/Resources/translations

    Unfortunately, Symfony does not add vendor-specific information to those files, so there is no good way to tell whether special handling should be applied.

  6. ysavourel

    I think the solution is to ask Symfony's developers to fix their XLIFF output. XLIFF is a standard format, and the specification is very clear about what must go inside <source>.

    If we adapt Okapi's filter, that will allow only a few tools to work with such files. If they fix their incorrect XLIFF, it will allow all tools to work with their corrected files.

    I think making Okapi work with such documents would be doing a disservice to the localization community in general: it would encourage other tools generating XLIFF to do whatever they want and expect CAT tools to work around their specific "flavor".

  7. Nikolai Vladimirov reporter

    Yves, this sounds like the best approach. We also don't want to encourage usage of that XLIFF variant.

    Thank you all for your time :)

  8. ysavourel

    Looking at one of the examples at http://symfony.com/blog/new-in-symfony-2-8-translator-improvements, it seems they still use <source> incorrectly in their XLIFF 2.0 implementation:

    <file id="f1" original="Graphic Example.psd">
        <skeleton href="Graphic Example.psd.skl"/>
        <group id="1">
            <unit id="1">
                <segment>
                    <source>foo</source>
                    <target>XLIFF 文書を編集、または処理 するアプリケーションです。</target>
                </segment>
            </unit>
        </group>
    </file>
    

    The sad part is that the example is derived from one on Wikipedia (https://en.wikipedia.org/wiki/XLIFF#XLIFF_2.0), which clearly shows that <source> is for the source text.
