XMLStreamFilter with HTMLSubfilter doesn't group back the XML tags correctly with M22

Issue #339 resolved
Former user created an issue

Original issue 339 created by 143.ravik... on 2013-05-15T15:22:36.000Z:

My Source File is as follows -
<Solution>
<RESOLUTION>
<![CDATA[<p><li>Test</p></li>]]>
</RESOLUTION>

<DESCRIPTION>
<![CDATA[<p> Testing </p>]]>
</DESCRIPTION>
</Solution>

When the file is merged back, the CDATA sections end up outside their parent tags, as follows:

<Solution>
<RESOLUTION></RESOLUTION>

<![CDATA[<p><li>Test</p></li>]]>

<DESCRIPTION></DESCRIPTION>
<![CDATA[<p> Testing </p>]]>
</Solution>
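The misplacement shown in the two listings above can be checked mechanically. The following is a minimal sketch, not part of Okapi: it parses the source and the merged file with Python's standard library and reports elements whose text content was present in the source but is empty after merging. The names `element_texts` and `misplaced_elements` are hypothetical helpers introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Source file from the report.
SOURCE = """<Solution>
<RESOLUTION>
<![CDATA[<p><li>Test</p></li>]]>
</RESOLUTION>

<DESCRIPTION>
<![CDATA[<p> Testing </p>]]>
</DESCRIPTION>
</Solution>"""

# Merged output from the report, with CDATA outside its parent element.
MERGED = """<Solution>
<RESOLUTION></RESOLUTION>

<![CDATA[<p><li>Test</p></li>]]>

<DESCRIPTION></DESCRIPTION>
<![CDATA[<p> Testing </p>]]>
</Solution>"""


def element_texts(xml_string):
    """Map each child tag of the root to its stripped text content."""
    root = ET.fromstring(xml_string)
    return {child.tag: (child.text or "").strip() for child in root}


def misplaced_elements(source_xml, merged_xml):
    """Return tags that had text in the source but are empty after merging."""
    src = element_texts(source_xml)
    out = element_texts(merged_xml)
    return sorted(tag for tag, text in src.items() if text and not out.get(tag))


if __name__ == "__main__":
    print(misplaced_elements(SOURCE, MERGED))
```

The check only looks at whether the text is present, not at what it says, so it still works once the content has been translated.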

This worked fine in the earlier version (M20). The problem started after I updated "okapi-filter-abstractmarkup" to "0.22-SNAPSHOT" to pull in the fix for the following ticket: http://code.google.com/p/okapi/issues/detail?id=332

Interestingly it happens only for tags defined as ruleTypes: [GROUP] in my yml file.

The yml definitions for the above tags are as follows:

resolution:
  ruleTypes: [GROUP]
description:
  ruleTypes: [GROUP]

Comments (13)

  1. Former user Account Deleted

    Comment 3. originally posted by @ysavourel on 2013-05-15T21:15:34.000Z:

    This is almost certainly something I've done wrong.

  2. Former user Account Deleted

    Comment 4. originally posted by 143.ravik... on 2013-05-28T16:28:49.000Z:

    Is the fix available in the dev build now? Can I test it?

  3. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2013-05-28T18:55:31.000Z:

    Sorry, no, it's not fixed yet.

  4. Former user Account Deleted

    Comment 6. originally posted by @ysavourel on 2013-06-05T17:20:34.000Z:

    Hi ravikant,

    Thanks for your patience. I finally had a chance to look at this and... I'm afraid I may need more information from you. I'm not able to reproduce this problem using a basic roundtrip test. Here's what I did:
    * Copied your source file to a file called cdataWithGroup.xml (attached)
    * Created a filter config with your rules, okf_xmlstream@ cdata.fprm (attached)

    Then I ran two tikal commands to convert the source XML to XLIFF, and then back to XML:
    tikal.sh -fc okf_xmlstream\@ cdata.fprm -x cdataWithGroup.xml
    tikal.sh -fc okf_xmlstream\@ cdata.fprm -m cdataWithGroup.xml.xlf

    This produces an output file (cdataWithGroup.out.xml) which I would expect to demonstrate the problem, if it were just a matter of the filter misbehaving. However, the output file looks fine to me.

    So it seems there's another factor involved that I need to take into account in order to reproduce this. Can you provide any more details about what you did to the source file after it had been segmented (i.e., how was it translated)?

    Thanks

  5. Former user Account Deleted

    Comment 7. originally posted by 143.ravik... on 2013-06-06T15:16:20.000Z:

    Hi Tingley,

    On my end, with the M22 snapshot version of the "okapi-filter-abstractmarkup" jar, the original issue of a spurious segment being generated is fixed. It used to generate one for each CDATA tag.

    Source file b.xml attached.

    I no longer see that segment being generated, and the XLIFF output is also as expected (de-DE.xlf).

    To generate the XML file back, a pipeline is used which takes the original .xml file as the RawDocument and adds the following steps:
    1. RawDocumentToFilterEventsStep()
    2. driver.setFilterConfigurationMapper();
    3. TranslateStep()
    4. FilterEventsStreamWriterStep()

    The translate step just updates each text unit's target with the appropriate localized string.

    The output of this pipeline is the XML file, and that is where I see the tags getting misplaced.

    I also see one more difference in the rules you have set in the attached okf_xmlstream@ cdata.fprm.

    I have used "elements":

    global_cdata_subfilter: okf_html
    preserve_whitespace: false

    elements:
      solutions:
        ruleTypes: [INCLUDE]
      resolution:
        ruleTypes: [GROUP]
      description:
        ruleTypes: [GROUP]

    but you seem to have used "attributes":

    global_cdata_subfilter: okf_html
    preserve_whitespace: false
    attributes:
      resolution:
        ruleTypes: [GROUP]
      description:
        ruleTypes: [GROUP]

    I'm not sure whether this could also explain the difference in the output we are each seeing.

  6. Former user Account Deleted

    Comment 8. originally posted by 143.ravik... on 2013-06-10T14:14:21.000Z:

    Hi Tingley,

    Did my comments help? Were you able to reproduce it on your side?

    Thanks

  7. Former user Account Deleted

    Comment 9. originally posted by KFLi... on 2013-06-10T17:38:51.000Z:

    Hi both,

    Just to confirm, I'm getting the same output Ravi is getting with his configuration.
    I have to admit I'm not sure when to use GROUP and when to use TEXTUNIT, though.
    With TEXTUNIT it merges back OK, but it creates the extraneous empty XLIFF TextUnits.

    Fredrik

  8. Former user Account Deleted

    Comment 10. originally posted by @ysavourel on 2013-06-14T19:09:25.000Z:

    Hi ravikant,

    Yes, you're right, I had a mistake in my YML configuration. Thanks for pointing that out. I'm able to reproduce the problem now.

    Fredrik: I agree, the semantics of several of the tag rules (including TEXTUNIT) are not very clear.

    I assume that GROUP is intended to produce START_GROUP/END_GROUP events, which are used for example to produce <group> elements in XLIFF. Looking at the XLIFF output from tikal, it looks like this issue may be related to the fact that subfiltering also always produces a group. For example:

    <group id="sg1">
      <group id="sg1_ssf1" resname="sub-filter:sd1">
        <trans-unit id="sg1_tu1" resname="sd1_1" restype="x-paragraph">
          <source xml:lang="en"></source>
          <target xml:lang="fr"></target>
        </trans-unit>
        <trans-unit id="sg1_tu2" resname="sd1_2" restype="x-li">
          <source xml:lang="en">Test</source>
          <target xml:lang="fr">Test</target>
        </trans-unit>
      </group>
    </group>

    Note the nested <group> elements. XLIFF allows nested <group>, although it's not commonly used in my experience. I wonder if this is confusing our merger.
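To spot this kind of nesting in a larger XLIFF file without reading it by eye, a small script can report the deepest `<group>` nesting level. This is only a sketch using Python's standard library; `local_name` and `max_group_depth` are hypothetical helpers, and the embedded snippet is the fragment quoted above rather than a complete XLIFF document.

```python
import xml.etree.ElementTree as ET

# Fragment quoted in the comment above (not a full XLIFF document).
SNIPPET = """<group id="sg1">
  <group id="sg1_ssf1" resname="sub-filter:sd1">
    <trans-unit id="sg1_tu1" resname="sd1_1" restype="x-paragraph">
      <source xml:lang="en"></source>
      <target xml:lang="fr"></target>
    </trans-unit>
    <trans-unit id="sg1_tu2" resname="sd1_2" restype="x-li">
      <source xml:lang="en">Test</source>
      <target xml:lang="fr">Test</target>
    </trans-unit>
  </group>
</group>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def max_group_depth(element, depth=0):
    """Return the deepest <group> nesting level at or below this element."""
    if local_name(element.tag) == "group":
        depth += 1
    return max([depth] + [max_group_depth(child, depth) for child in element])


if __name__ == "__main__":
    print(max_group_depth(ET.fromstring(SNIPPET)))
```

A result greater than 1 means at least one group is nested inside another, as with the GROUP rule wrapping the sub-filter group here.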

    I'll step through this.

    Sorry for the slow progress, I've had almost no free time in the past few weeks.

  9. Former user Account Deleted

    Comment 11. originally posted by @ysavourel on 2013-06-14T19:32:10.000Z:

    This is just state confusion during the event generation. The reference subfilter content isn't being correctly included in the skeleton for either of the group events. Instead it gets left for the DOCUMENT_PART event that follows. This moves the CDATA section outside of its parent element on reassembly.
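The effect described above can be illustrated with a toy model. This is not Okapi code: the event tuples, the `REF` placeholder string, and the `reassemble` function are simplified assumptions made up for this sketch. It only shows why attaching the subfilter reference to the wrong event's skeleton moves the CDATA section on reassembly.

```python
# Hypothetical placeholder standing in for a reference to subfilter output.
REF = "[#$sd1]"
CDATA = "<![CDATA[<p><li>Test</p></li>]]>"


def reassemble(events, subfilter_output):
    """Concatenate event skeletons in order, expanding the reference."""
    return "".join(skel.replace(REF, subfilter_output) for _kind, skel in events)


# Expected: the reference is part of the group's skeleton.
expected_events = [
    ("START_GROUP", "<RESOLUTION>" + REF),
    ("END_GROUP", "</RESOLUTION>"),
    ("DOCUMENT_PART", "\n"),
]

# Buggy: the reference is left for the DOCUMENT_PART that follows the group.
buggy_events = [
    ("START_GROUP", "<RESOLUTION>"),
    ("END_GROUP", "</RESOLUTION>"),
    ("DOCUMENT_PART", "\n" + REF),
]

if __name__ == "__main__":
    print(reassemble(expected_events, CDATA))
    print(reassemble(buggy_events, CDATA))
```

In the buggy ordering the output has an empty `<RESOLUTION></RESOLUTION>` followed by the CDATA section, which matches the merged file in the original report.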

  10. Former user Account Deleted

    Comment 13. originally posted by 143.ravik... on 2013-06-19T07:18:29.000Z:

    Thanks a lot Tingley for looking into this.
