Non-numeric TextUnit code ids are converted to integers during parsing

Issue #546 new
Former user created an issue

When a translated XLIFF inline file is merged back into an sdlxliff file, tag ids are not handled correctly in the <alt-trans> element.

Steps to reproduce: 1) Here is a simple test sdlxliff file. Notice that the source text contains a <g> tag with an id attribute that is not an integer.

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" sdl:version="1.0">
    <file original="fake.itd" datatype="x-sdlfilterframework2" source-language="en" target-language="lv">
            <body>
            <group>
                <trans-unit id="de6ce0cb-df15-4930-a4dc-70ab4528e44c">
                    <source><g id="1">Yes</g> or <g id="one">no</g>?</source>
                </trans-unit>
            </group>
        </body>
    </file>
</xliff>

2) The XLIFF inline text is extracted from the sdlxliff file with a command like this:

tikal.sh -xm source.sdlxliff -sl en -to source.inline -fc okf_xliff-sdl

3) The extracted XLIFF inline file looks like this. Notice that the first <g> tag, which has an integer id, is kept intact, but the second <g> tag, which has a free-text id, was assigned a new integer id.

<g id="1">Yes</g> or <g id="110182">no</g>?

4) The XLIFF inline file is translated and now looks like this:

<g id="1"></g> vai <g id="110182"></g>?

5) The translated inline file is merged back into the sdlxliff file with a command like this:

tikal.sh -lm source.sdlxliff -sl en -tl lv -from target.inline -to target.sdlxliff -overtrg -fc okf_xliff-sdl

6) After the inline file is merged back into the sdlxliff file, the result looks like this (see the bottom of the file). Notice that in the <target> element, <g id="110182"> was transformed back to the original id, <g id="one">. But in the <alt-trans> element, <g id="110182"> was not transformed to <g id="one">, and THAT is the bug I am complaining about.

<?xml version="1.0" encoding="UTF-8"?>
<xliff xmlns:sdl="http://sdl.com/FileTypes/SdlXliff/1.0" xmlns="urn:oasis:names:tc:xliff:document:1.2" version="1.2" sdl:version="1.0">
    <file original="fake.itd" datatype="x-sdlfilterframework2" source-language="en" target-language="lv">
            <body>
            <group>
                <trans-unit id="de6ce0cb-df15-4930-a4dc-70ab4528e44c">
                    <source><g id="1">Yes</g> or <g id="one">no</g>?</source>
                <target xml:lang="lv"><g id="1"></g> vai <g id="one"></g>?</target>
<alt-trans match-quality="10" origin="Moses-MT" xmlns:okp="okapi-framework:xliff-extensions" okp:matchType="MT">
<target xml:lang="lv"><g id="1"></g> vai <g id="110182"></g>?</target>
</alt-trans>
</trans-unit>
            </group>
        </body>
    </file>
</xliff>

Comments (2)

  1. Chase Tingley

    Well...

            catch ( NumberFormatException e ) {
                // Falls back to the hash-code
                //TODO: At some point code id needs to support a string
                return id.hashCode();
            }
    

    (From XLIFFFilter#retrieveId)
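
    The 110182 seen in step 3 is consistent with this fallback: it is exactly what String.hashCode() returns for "one". Below is a minimal stand-alone sketch of that behaviour. The class name CodeIdFallbackSketch and the surrounding method body are hypothetical and written for illustration; only the catch block is taken from the quoted Okapi code.

```java
// Hypothetical stand-alone reproduction; not the actual Okapi source.
class CodeIdFallbackSketch {

    // Numeric ids are parsed as integers; anything else falls back
    // to String.hashCode(), as in the catch block quoted above.
    static int retrieveId(String id) {
        try {
            return Integer.parseInt(id);
        } catch (NumberFormatException e) {
            return id.hashCode();
        }
    }

    public static void main(String[] args) {
        System.out.println(retrieveId("1"));   // prints 1
        System.out.println(retrieveId("one")); // prints 110182
    }
}
```

    Because the hash is deterministic, the merge step can map 110182 back to "one" wherever it compares against the original source codes, which is what happens for <target> but apparently not for <alt-trans>.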

    The problem is that the core Code class, to which these tags are mapped, only supports integer IDs. Making it support string IDs is probably a big change.
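
    Short of changing Code, a possible stopgap is to post-process the merged file and restore the original ids inside <alt-trans>. The sketch below is only an illustration under stated assumptions: the class AltTransIdRestoreSketch and its methods are hypothetical and not part of Okapi, it assumes ids are hashed with String.hashCode() as in the snippet above, and it uses a regular expression on <g id="..."> rather than a real XML parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing helper; not part of Okapi.
class AltTransIdRestoreSketch {

    // Matches the opening of a <g> tag and captures its id value.
    private static final Pattern G_ID = Pattern.compile("<g id=\"([^\"]*)\"");

    // Builds a map from the integer form of each source id to the original id.
    static Map<String, String> buildIdMap(String source) {
        Map<String, String> idMap = new HashMap<>();
        Matcher m = G_ID.matcher(source);
        while (m.find()) {
            String original = m.group(1);
            int numeric;
            try {
                numeric = Integer.parseInt(original);
            } catch (NumberFormatException e) {
                numeric = original.hashCode(); // assumed fallback, as quoted above
            }
            idMap.put(Integer.toString(numeric), original);
        }
        return idMap;
    }

    // Replaces each id found in the map with the original id; other ids are kept.
    static String restoreIds(String text, Map<String, String> idMap) {
        Matcher m = G_ID.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String restored = idMap.getOrDefault(m.group(1), m.group(1));
            m.appendReplacement(out,
                    Matcher.quoteReplacement("<g id=\"" + restored + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "<g id=\"1\">Yes</g> or <g id=\"one\">no</g>?";
        String altTrans = "<g id=\"1\"></g> vai <g id=\"110182\"></g>?";
        // prints: <g id="1"></g> vai <g id="one"></g>?
        System.out.println(restoreIds(altTrans, buildIdMap(source)));
    }
}
```

    A proper fix would presumably belong in the filter or merge code itself, so that <alt-trans> content goes through the same id mapping as <target>.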
