XMLStreamFilter merges back CDATA section incorrectly with Okapi M20

Issue #320 resolved
Former user created an issue

Original issue 320 created by 143.ravik... on 2013-03-28T07:24:17.000Z:

I have an XML file that has HTML content inside a CDATA section.

<SOLUTIONS><TITLE><![CDATA[<p>The Test Search Alliance</p>]]></TITLE></SOLUTIONS>

The XLIFF generated with the XMLStream filter in Okapi M20 contains an extra text unit with a new placeholder:

<body>
<trans-unit id="55" resname="{group:sg1,tu:tu1}" xml:space="preserve">
<source xml:lang="en"><ph id="1">[#$tu1_ssf1]</ph></source>
<seg-source><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></seg-source>
<target xml:lang="es-ES" state="new"><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></target>
</trans-unit>
<trans-unit id="54" resname="sd1_1" xml:space="preserve">
<source xml:lang="en">The Test Search Alliance</source>
<seg-source><mrk mid="0" mtype="seg">The Test Search Alliance</mrk></seg-source>
<target xml:lang="es-ES" state="new"><mrk mid="0" mtype="seg">The Test Search Alliance</mrk></target>
</trans-unit>
</body>

Once the translation is done, the generated localized file looks like this:

<SOLUTIONS><TITLE><p>The Test Search Alliance TRANSLATED</p><![CDATA[[#$tu1_ssf1]]]></TITLE></SOLUTIONS>

The CDATA section is not merged back correctly: it is pushed to the end of the translated string, and its content is the placeholder that was originally generated in the XLIFF.
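
The [#$tu1_ssf1] text left in the merged output appears to be a skeleton reference that was never replaced. The following is a minimal, self-contained Java sketch of that reference-resolution idea, not Okapi's actual skeleton writer: the class SkeletonMergeSketch, its merge method, and the map of resolved ids are hypothetical. Each [#$id] marker is replaced by the content registered for that id, and an unresolved marker is copied through literally, which is one way a placeholder can end up inside the CDATA section.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of skeleton merging; this is NOT Okapi's implementation.
final class SkeletonMergeSketch {
    // Matches reference markers such as [#$tu1_ssf1] in skeleton text.
    private static final Pattern REF = Pattern.compile("\\[#\\$([A-Za-z0-9_]+)\\]");

    // Replaces each [#$id] marker with the merged content registered for that id.
    // Markers with no registered content are copied through unchanged.
    static String merge(String skeleton, Map<String, String> resolved) {
        Matcher m = REF.matcher(skeleton);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String content = resolved.getOrDefault(m.group(1), m.group(0));
            // quoteReplacement keeps '$' and '\' in the content literal.
            m.appendReplacement(out, Matcher.quoteReplacement(content));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String skeleton = "<SOLUTIONS><TITLE><![CDATA[[#$tu1_ssf1]]]></TITLE></SOLUTIONS>";
        // Reference resolved: the translated HTML lands back inside the CDATA section.
        System.out.println(merge(skeleton,
                Collections.singletonMap("tu1_ssf1", "<p>The Test Search Alliance TRANSLATED</p>")));
        // Reference unresolved: the literal placeholder survives, as in the report.
        System.out.println(merge(skeleton, Collections.<String, String>emptyMap()));
    }
}
```

Running main prints the skeleton with the translated paragraph inside the CDATA section, then the same skeleton with [#$tu1_ssf1] still in place.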

I am using Okapi M20, and all of the above used to work fine with Okapi M14.

Is there anything missing here? I am also attaching my YAML file for reference.

Comments (8)

  1. Former user Account Deleted

    Comment 1. originally posted by @fliden on 2013-03-28T07:53:12.000Z:

    Using your YAML configuration, I tried creating a generic XLIFF package for the string:

    <SOLUTIONS><TITLE><![CDATA[<p>The Test Search Alliance</p>]]></TITLE></SOLUTIONS>

    The XLIFF I get:
    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" xmlns:its="http://www.w3.org/2005/11/its">
    <file original="/test.xml" source-language="en-us" target-language="fr-fr" datatype="xml">
    <body>
    <group id="sg1">
    <group id="tu1_ssf1" resname="sub-filter:sd1">
    <trans-unit id="tu1_tu1" resname="sd1_1" restype="x-paragraph">
    <source xml:lang="en-us">The Test Search Alliance</source>
    <target xml:lang="fr-fr">The Test Search Alliance</target>
    </trans-unit>
    </group>
    <trans-unit id="tu1" restype="x-cdata">
    <source xml:lang="en-us"><x id="1"/></source>
    <target xml:lang="fr-fr"><x id="1"/></target>
    </trans-unit>
    </group>
    </body>
    </file>
    </xliff>

    That seems to merge back fine. Can you give some more details? I'm not sure why your XLIFF looks so different from mine. Are you running a custom pipeline?

  2. Former user Account Deleted

    Comment 2. originally posted by 143.ravik... on 2013-03-28T09:23:41.000Z:

    I am also using my own filter, which extends XmlStreamFilter; I am attaching it for reference. It is configured in the FilterConfigurationMapper, and I get the right filter instance based on my custom MIME type.

    The pipeline is as follows:

    IPipelineDriver driver = new PipelineDriver();
    driver.setFilterConfigurationMapper(iFilterConfigurationMapper);
    driver.addStep(new RawDocumentToFilterEventsStep());
    driver.addBatchItem(rawDocument);
    driver.processBatch();

  3. Former user Account Deleted

    Comment 3. originally posted by @fliden on 2013-03-28T18:41:03.000Z:

    I'm not sure there's anything in the filter that would cause it. Can you confirm that it works if you use the plain XMLStream filter with your YAML configuration?
    What are you doing in terms of extraction and merge? The pipeline doesn't show that. The XLIFF also seems to be segmented.
    If I extract it with default segmentation and the <ph> format, I get this output:

    <group id="sg1">
    <group id="tu1_ssf1" resname="sub-filter:sd1">
    <trans-unit id="tu1_tu1" resname="sd1_1" restype="x-paragraph">
    <source xml:lang="en-us">The Test Search Alliance</source>
    <seg-source><mrk mid="0" mtype="seg">The Test Search Alliance</mrk></seg-source>
    <target xml:lang="fr-fr"><mrk mid="0" mtype="seg">The Test Search Alliance</mrk></target>
    </trans-unit>
    </group>
    <trans-unit id="tu1" restype="x-cdata">
    <source xml:lang="en-us"><ph id="1">[#$tu1_ssf1]</ph></source>
    <seg-source><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></seg-source>
    <target xml:lang="fr-fr"><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></target>
    </trans-unit>
    </group>

    It seems your XLIFF is missing the tu1_ssf1 group that the [#$tu1_ssf1] reference points to.
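
A quick way to spot this kind of gap is to check that every [#$id] reference inside a <ph> element has a <group> with the same id. The Java sketch below is a hypothetical diagnostic (DanglingReferenceCheck is not an Okapi tool) that uses only the JDK's DOM parser. It assumes the <ph> output format used in this comment, and it assumes, as in the output here, that a reference id matches the id of the sub-filter group (tu1_ssf1).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic (not part of Okapi): reports [#$id] references in
// <ph> elements that have no <group id="id"> anywhere in the document.
final class DanglingReferenceCheck {
    private static final Pattern REF = Pattern.compile("\\[#\\$([A-Za-z0-9_]+)\\]");

    static Set<String> danglingReferences(String xliff) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // full XLIFF 1.2 files use a namespace
            doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xliff.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XLIFF input", e);
        }

        // Ids of all <group> elements, matched by local name in any namespace.
        Set<String> groupIds = new TreeSet<>();
        NodeList groups = doc.getElementsByTagNameNS("*", "group");
        for (int i = 0; i < groups.getLength(); i++) {
            groupIds.add(((Element) groups.item(i)).getAttribute("id"));
        }

        // Reference ids inside <ph> elements that match no group id.
        Set<String> dangling = new TreeSet<>();
        NodeList phs = doc.getElementsByTagNameNS("*", "ph");
        for (int i = 0; i < phs.getLength(); i++) {
            Matcher m = REF.matcher(phs.item(i).getTextContent());
            while (m.find()) {
                if (!groupIds.contains(m.group(1))) {
                    dangling.add(m.group(1));
                }
            }
        }
        return dangling;
    }
}
```

Applied to the <body> from the original report, this should return [tu1_ssf1]; applied to the output in this comment, it should return an empty set.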

  4. Former user Account Deleted

    Comment 4. originally posted by 143.ravik... on 2013-04-02T11:57:10.000Z:

    Hi,
    I am finally able to get the same output as yours. The XLIFF was not generated as expected because my step was not handling the sub-filter events, which did not exist in Okapi M14.

    After handling them as follows, the output is as expected:

            // Forward sub-document and group events to the writer, not only text units.
            case END_SUBDOCUMENT:
            case START_GROUP:
            case END_GROUP:
            case TEXT_UNIT:
            case DOCUMENT_PART:
                return iFilterWriter.handleEvent(event);
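
Why dropping those events flattens the output can be illustrated with a toy model. In this hypothetical Java sketch, EventKind, ToyEvent, ToyXliffWriter and EventForwardingSketch are invented for illustration and are not Okapi types; a step forwards only the event kinds it has a case for. The event sequence is an assumption reconstructed from the group structure of the expected XLIFF in comment 3.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for pipeline events; NOT the Okapi Event/EventType classes.
enum EventKind { START_GROUP, END_GROUP, TEXT_UNIT }

final class ToyEvent {
    final EventKind kind;
    final String id;

    ToyEvent(EventKind kind, String id) {
        this.kind = kind;
        this.id = id;
    }
}

// Toy writer: emits <group> for group events and <trans-unit/> for text units.
final class ToyXliffWriter {
    private final StringBuilder out = new StringBuilder();

    void handleEvent(ToyEvent e) {
        switch (e.kind) {
            case START_GROUP:
                out.append("<group id=\"").append(e.id).append("\">");
                break;
            case END_GROUP:
                out.append("</group>");
                break;
            case TEXT_UNIT:
                out.append("<trans-unit id=\"").append(e.id).append("\"/>");
                break;
        }
    }

    String result() {
        return out.toString();
    }
}

final class EventForwardingSketch {
    // Passes only the event kinds in 'handled' to the writer, like a step whose
    // switch statement lacks some cases; all other events are silently dropped.
    static String run(List<ToyEvent> events, Set<EventKind> handled) {
        ToyXliffWriter writer = new ToyXliffWriter();
        for (ToyEvent e : events) {
            if (handled.contains(e.kind)) {
                writer.handleEvent(e);
            }
        }
        return writer.result();
    }

    public static void main(String[] args) {
        // Assumed event order: the sub-filter group tu1_ssf1 nested inside sg1.
        List<ToyEvent> events = Arrays.asList(
                new ToyEvent(EventKind.START_GROUP, "sg1"),
                new ToyEvent(EventKind.START_GROUP, "tu1_ssf1"),
                new ToyEvent(EventKind.TEXT_UNIT, "tu1_tu1"),
                new ToyEvent(EventKind.END_GROUP, "tu1_ssf1"),
                new ToyEvent(EventKind.TEXT_UNIT, "tu1"),
                new ToyEvent(EventKind.END_GROUP, "sg1"));
        // Text units only: flat trans-units, no groups.
        System.out.println(run(events, EnumSet.of(EventKind.TEXT_UNIT)));
        // All kinds forwarded: the nested group structure is preserved.
        System.out.println(run(events, EnumSet.allOf(EventKind.class)));
    }
}
```

With only TEXT_UNIT forwarded, the toy writer produces a flat list of trans-units, similar in shape to the XLIFF in the original report; with every kind forwarded, the nested groups come back.
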
    

    Is it normal to have a text unit generated only to hold a reference to the CDATA section, like this one?

    <trans-unit id="tu1" restype="x-cdata">
    <source xml:lang="en-us"><ph id="1">[#$tu1_ssf1]</ph></source>
    <seg-source><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></seg-source>
    <target xml:lang="fr-fr"><mrk mid="0" mtype="seg"><ph id="1">[#$tu1_ssf1]</ph></mrk></target>
    </trans-unit>

    With Okapi M14, the CDATA section used to be part of the skeleton itself, without a text unit being generated only to hold the reference.

    Is there a way to avoid this? This text unit doesn't contain any text to translate; it is just a reference.
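
The criterion in this question can be stated precisely: a text unit whose source consists only of reference placeholders has nothing for a translator to do. The following hypothetical Java helper (ReferenceOnlyCheck is not an Okapi API) makes that test explicit on the placeholder text as it appears inside <ph> in the XLIFF. How such a unit would then be kept out of the XLIFF, for example by leaving it in the skeleton as M14 did, is left to the framework or to a custom step.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Okapi: true when a source string contains at
// least one [#$id] reference placeholder and nothing else except whitespace.
final class ReferenceOnlyCheck {
    private static final Pattern REF = Pattern.compile("\\[#\\$[A-Za-z0-9_]+\\]");

    static boolean isReferenceOnly(String source) {
        boolean hasReference = REF.matcher(source).find();
        String remainder = REF.matcher(source).replaceAll("");
        return hasReference && remainder.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isReferenceOnly("[#$tu1_ssf1]"));             // true
        System.out.println(isReferenceOnly("The Test Search Alliance")); // false
    }
}
```

Requiring at least one reference keeps an empty source from being classified as reference-only.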

  5. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2013-04-15T12:12:15.000Z:

    > Is it actually normal to have a text unit generated for just having a reference to the CData

    This should be resolved now, as issue comment 30.3 has been resolved for M21.

  6. Former user Account Deleted

    Comment 6. originally posted by 143.ravik... on 2013-04-22T15:28:37.000Z:

    I have recently migrated the whole project to M20. For the above fix, is it enough to update only the artifact that contains the XMLStreamFilter class, or do I need to pull in other dependencies as well? Is that the only change in the fix?

    I don't want to migrate the Okapi core version again, to M21.

  7. Former user Account Deleted

    Comment 7. originally posted by @ysavourel on 2013-04-23T17:28:02.000Z:

    I would actually double-check that the fix for issue comment 30.3 applies here as well. Issue comment 30.3 covered the PCDATA case, which I think may go through a different code path.

  8. Former user Account Deleted

    Comment 8. originally posted by 143.ravik... on 2013-04-24T14:44:35.000Z:

    The issue still seems to be there with CDATA. I used the latest M21 versions of the following artifacts, and I still see the extra text unit (the CDATA placeholder):

    1. okapi-filter-xmlstream
    2. okapi-filter-abstractmarkup
    3. okapi-filter-html