XLIFF Writer doesn't escape XML entities in context metadata

Issue #1013 invalid
Chase Tingley created an issue

This looks like an issue with the XLIFF writer rather than the JSON filter. With source like this:

{
    "value": "Hello world",
    "meta": "Cats & Dogs"
}

and a filter configuration that extracts the meta field as metadata, we get XLIFF (via either Tikal or a Rainbow kit) that has unescaped entities in the context-group:

<trans-unit id="tu1" resname="value" xml:space="preserve">
<source xml:lang="en">Hello world</source>
<target xml:lang="fr">Hello world</target>
<context-group><context context-type="meta">Cats & Dogs</context></context-group>
</trans-unit>
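
To show why the bare ampersand is a problem for anything consuming this XLIFF, here is a minimal Java sketch that uses only the JDK's built-in DOM parser. The class and method names are hypothetical and not part of Okapi. It reports whether a fragment is well-formed XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Hypothetical helper: returns true if the string parses as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. because of a bare '&'.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not well-formedness results.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unescaped = "<context context-type=\"meta\">Cats & Dogs</context>";
        String escaped = "<context context-type=\"meta\">Cats &amp; Dogs</context>";
        System.out.println("unescaped well-formed: " + isWellFormed(unescaped));
        System.out.println("escaped well-formed: " + isWellFormed(escaped));
    }
}
```

A conforming parser should reject the first fragment and accept the second, so a tool reading the generated XLIFF would fail on the context element.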

Reproduce using the attached files and:

$ tikal.sh -fc okf_json@meta.fprm -x test.jso

Comments (5)

  1. Denis Konovalyenko

    @Chase Tingley I took a quick look at this, and with the latest dev build I get a slightly different extraction (XML entities are escaped):

    <body>
    <group id="sg1">
    <trans-unit id="tu1" resname="value" xml:space="preserve">
    <source xml:lang="en">Hello world</source>
    <target xml:lang="fr">Hello world</target>
    <context-group><context context-type="x-meta">Cats &amp; Dogs</context></context-group>
    </trans-unit>
    </group>
    </body>
    

    Also, please note the difference in context-type: meta vs. x-meta. Maybe some related work was done. I will take a closer look at this.

  2. Denis Konovalyenko

    @Chase Tingley the escaping of entities has been in place since commit cf5c4c92, whose main purpose was to “add default EncoderManager to XliffWriter” (please refer to net.sf.okapi.common.filterwriter.XLIFFWriter#getEncoderManager for more details). As a result, the initialised encoder manager is now used when writing in net.sf.okapi.common.annotation.XLIFFContextGroup.Context#toString.
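
As a rough illustration of what an encoder does to context text before it is written, here is a minimal Java sketch. It is a hypothetical stand-alone helper, not Okapi's EncoderManager or the actual Context#toString code, and it only handles the characters that are unsafe in XML text content.

```java
public class XmlTextEscaper {
    // Hypothetical helper: escapes characters that must not appear raw in XML text content.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Builds a context element similar to the one in the extraction shown earlier.
        System.out.println("<context context-type=\"x-meta\">"
                + escapeText("Cats & Dogs") + "</context>");
    }
}
```

Without a step like this in the writing path, the raw metadata value goes into the file unchanged, which matches the output in the original report.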

  3. Chase Tingley reporter

    @Denis Konovalyenko You’re right. I retested with a build that contained Jim’s commit and it worked. I will close this.
