XLIFF: Set xml:space="preserve" for entries created using ITS rule with itsx:whiteSpaces="preserve"

Issue #311 resolved
Former user created an issue

Original issue 311 created by khagar... on 2013-02-03T08:42:36.000Z:

When configuring custom ITS rules with entries that use itsx:whiteSpaces="preserve", the extracted entries in the XLIFF do have their spaces preserved, but xml:space="preserve" is not set on them. Processing them in tools that do not treat XLIFF as xml:space="preserve" by default (e.g. OmegaT) therefore results in wrong translations. If an entry is created by an ITS rule with itsx:whiteSpaces="preserve", it should have xml:space="preserve" set.
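
To make the symptom concrete, here is a minimal Python sketch of how a reader that defaults to whitespace normalization might treat two otherwise identical trans-units. The function name `read_sources`, the sample XLIFF, and the normalization step (collapsing whitespace runs to a single space) are assumptions for illustration; this is not OmegaT's or Okapi's actual code.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
XLIFF_NS = "{urn:oasis:names:tc:xliff:document:1.2}"

# Hypothetical XLIFF: the same text, with and without xml:space="preserve".
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="example.xml" source-language="en-us" datatype="xml">
<body>
<trans-unit id="1" xml:space="preserve">
<source xml:lang="en-us">Country: Italy
Population: 2,777,979 (2011)</source>
</trans-unit>
<trans-unit id="2">
<source xml:lang="en-us">Country: Italy
Population: 2,777,979 (2011)</source>
</trans-unit>
</body>
</file>
</xliff>"""


def read_sources(xliff_text):
    """Return {trans-unit id: source text} as a normalizing reader would see it."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iter(XLIFF_NS + "trans-unit"):
        source = unit.find(XLIFF_NS + "source")
        if source is None:
            continue
        text = "".join(source.itertext())
        # Assumption: without an explicit "preserve", the reader collapses
        # every run of whitespace (including line breaks) to one space.
        if unit.get(XML_SPACE) != "preserve":
            text = " ".join(text.split())
        result[unit.get("id")] = text
    return result


if __name__ == "__main__":
    for unit_id, text in read_sources(SAMPLE).items():
        print(unit_id, repr(text))
```

Under these assumptions, unit 1 keeps its line break while unit 2 comes back as a single line, which is the kind of loss described above.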

Comments (9)

  1. Former user Account Deleted
    • changed status to open

    Comment 1. originally posted by @ysavourel on 2013-02-03T12:07:00.000Z:

    It seems to be working for me.

    For example, if I process:

    <?xml version="1.0" ?>
    <doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
    <prolog>
    <date>2013-02-13</date>
    <its:rules version="2.0" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
    <its:translateRule selector="/doc/prolog" translate="no"/>
    <its:idValueRule selector="//para" idValue="@id"/>
    <its:withinTextRule selector="//b" withinText="yes"/>
    <its:translateRule selector="//literal" translate="yes" itsx:whiteSpaces="preserve"/>
    </its:rules>
    </prolog>
    <body>
    <para id="p1">Rome is the capital city of Italy.</para>
    <para id="p2">It is also the country's largest and most populated comune and fourth-most populous city in the European Union by population within city limits.</para>
    <literal xml:space='preserve'>Country: Italy
    Population: 2,777,979 (2011)
    Time zone: CET</literal>
    </body>
    </doc>

    I get:

    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its">
    <file original="/Example_XML.xml" source-language="en-us" target-language="fr-fr" datatype="xml">
    <body>
    <trans-unit id="1" resname="p1">
    <source xml:lang="en-us">Rome is the capital city of <g id="1">Italy</g>.</source>
    </trans-unit>
    <trans-unit id="2" resname="p2">
    <source xml:lang="en-us">It is also the country's largest and most populated comune and fourth-most populous city in the European Union by population within city limits.</source>
    </trans-unit>
    <trans-unit id="3" xml:space="preserve">
    <source xml:lang="en-us">Country: Italy
    Population: 2,777,979 (2011)
    Time zone: CET</source>
    </trans-unit>
    </body>
    </file>
    </xliff>

    As you can see, xml:space is set on the trans-unit, and xml:space is inherited by all child elements (http://www.w3.org/TR/xml/#sec-white-space).

    Do you have an example where it's not working?
    Thanks.
    -yves
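
The inheritance rule mentioned in this comment can be sketched in Python as follows. The function name `effective_space` and the sample fragment are hypothetical, and treating an element with no declared ancestor value as "default" is a simplifying assumption of this sketch (the XML specification leaves that case to the application).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

# Hypothetical fragment modeled on the XLIFF output shown in this thread.
DOC = """<body xmlns="urn:oasis:names:tc:xliff:document:1.2">
<trans-unit id="3" xml:space="preserve">
<source xml:lang="en-us">Country: Italy</source>
</trans-unit>
<trans-unit id="4">
<source xml:lang="en-us">Rome</source>
</trans-unit>
</body>"""


def effective_space(xml_text):
    """Return (local tag name, effective xml:space) pairs in document order."""
    root = ET.fromstring(xml_text)
    found = []

    def walk(elem, inherited):
        # An explicit xml:space on the element overrides the inherited value.
        value = elem.get(XML_SPACE, inherited)
        local_name = elem.tag.rsplit("}", 1)[-1]
        found.append((local_name, value))
        for child in elem:
            walk(child, value)

    # Assumption: nothing declared above the root is treated as "default".
    walk(root, "default")
    return found


if __name__ == "__main__":
    for name, value in effective_space(DOC):
        print(name, value)
```

In this sketch the source inside trans-unit 3 resolves to "preserve" without carrying the attribute itself, while the source inside trans-unit 4 resolves to "default".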

  2. Former user Account Deleted

    Comment 2. originally posted by khagar... on 2013-02-03T15:02:23.000Z:

    Try source XML without xml:space="preserve".

  3. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2013-02-03T15:12:43.000Z:

    Sorry: I've tried several cases and mis-copied the example in my previous post.

    I did try without xml:space='preserve' in <literal>:

    <?xml version="1.0" ?>
    <doc xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0">
    <prolog>
    <date>2013-02-13</date>
    <its:rules version="2.0" xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
    <its:translateRule selector="/doc/prolog" translate="no"/>
    <its:idValueRule selector="//para" idValue="@id"/>
    <its:withinTextRule selector="//b" withinText="yes"/>
    <its:translateRule selector="//literal" translate="yes" itsx:whiteSpaces="preserve"/>
    </its:rules>
    </prolog>
    <body>
    <para id="p1">Rome is the capital city of Italy.</para>
    <para id="p2">It is also the country's largest and most populated comune and fourth-most populous city in the European Union by population within city limits.</para>
    <literal>Country: Italy
    Population: 2,777,979 (2011)
    Time zone: CET</literal>
    </body>
    </doc>

    and got the exact same result:
    <trans-unit id="3" xml:space="preserve">

  4. Former user Account Deleted

    Comment 4. originally posted by khagar... on 2013-02-03T16:39:42.000Z:

    Then I wonder why it doesn't work for me. Perhaps it's because the files I'm translating store the translatable text in attributes.

  5. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2013-02-03T17:38:25.000Z:

    The property should affect the translated attributes as well.
    If you send me an example that reproduces the problem for you, I can try to debug and fix it.
    -ys

  6. Former user Account Deleted

    Comment 6. originally posted by khagar... on 2013-02-03T22:45:03.000Z:

    Here is an example file and the ITS rule. I'm creating an OmegaT project using the translation kit creation.

  7. Former user Account Deleted

    Comment 7. originally posted by @ysavourel on 2013-02-04T12:52:46.000Z:

    Thanks, I can reproduce the issue.
    As you suggested, it looks like a difference in the way we process the extracted text when it comes from an attribute.
    I'll work on it.
