XmlStreamFilter: TEXTUNIT + EXCLUDE rules interact strangely

Issue #282 resolved
Former user created an issue

Original issue 282 created by @ysavourel on 2012-10-03T18:11:34.000Z:

What steps will reproduce the problem?
1. Create an XmlStreamFilter configuration with these settings:
    assumeWellformed: true
    global_pcdata_subfilter: okf_html
    preserve_whitespace: false
    elements:
      foo:
        conditions: [translate, EQUALS, y]
        ruleTypes: [TEXTUNIT]
      .*:
        ruleTypes: [EXCLUDE]
2. Run this filter configuration on the attached test2.xml file.
3. After this fails (see below), try again with this variant configuration. The results will be the same:
    assumeWellformed: true
    global_pcdata_subfilter: okf_html
    preserve_whitespace: false
    exclude_by_default: true
    elements:
      foo:
        conditions: [translate, EQUALS, y]
        ruleTypes: [TEXTUNIT]

What is the expected output? What do you see instead?

In both cases, the goal is to send the content matching foo[@translate='y'] to the HTML subfilter, and then expose the results for translation. All other content should go to the skeleton. So the sentences "Translate me." and "Translate me 4." should be exposed, but the other sentences should not.

Instead, I am seeing a long series of segments that contain only angle brackets. In XLIFF, it looks like this:
    <trans-unit id="tu1">
      <source xml:lang="en-us">&lt;</source>
      <target xml:lang="fr-fr">&lt;</target>
    </trans-unit>
    ...
    <trans-unit id="tu4">
      <source xml:lang="en-us">&gt;</source>
      <target xml:lang="fr-fr">&gt;</target>
    </trans-unit>

No meaningful text is being exposed for translation.

What version of the product are you using? On what operating system?

This was observed on the dev branch at commit 9e64b0a50a91146e206f78c2f8a636a98695655e. I was running Rainbow on Mac OS X.

Please provide any additional information below.

Comments (5)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2012-11-10T05:59:39.000Z:

    Using the latest nightly (08-Nov-2012), the behavior for both these configurations is now to produce no TUs at all.

  2. Former user Account Deleted

    Comment 2. originally posted by @ysavourel on 2012-11-23T06:59:23.000Z:

    The '<' and '>' segments were fixed by Aaron's fix to AbstractMarkupFilter.handleCharacterEntity(). However, the real bug remains: the legitimate TUs are not being produced. This is because the TEXTUNIT ruleType does not affect the include/exclude rule state stack. As a result, the filter can enter a text unit while still treating the content as excluded.

    A more complete writeup can be found here:
    https://groups.google.com/d/topic/okapi-devel/HiQjl9qr-mw/discussion

  3. Former user Account Deleted

    Comment 4. originally posted by @ysavourel on 2012-11-23T22:31:13.000Z:

    Note that if the response to this more recent test case is that the config is incorrect, because it should specify INCLUDE rather than TEXTUNIT for the attribute-matching rule, then this is a subfilter issue after all: subfiltering is applied only to TEXTUNIT rules, not to regular INCLUDE ones.

  4. Former user Account Deleted

    Comment 5. originally posted by @ysavourel on 2012-11-26T22:36:49.000Z:

    I added code to handleStartTag that pushes an include rule if the global exclude option is set and the rule matches (only for TEXTUNIT rules for now).

    handleEndTag has new code that pops this include rule off under the same conditions.

    *This should be considered a hack and we may need to add more rule types to this logic so that they work correctly with global exclude and include/exclude rules.*

    Test case ISSUE_282 was added.

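
The interaction described in comments 2 and 5 can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not the actual Okapi code: the class RuleStateSketch, its methods, and the textUnitPushesInclude flag are all hypothetical names, and the sample sentence "Do not translate." is invented for the example. It models only the exclude_by_default variant. With the flag off, a TEXTUNIT rule leaves the state stack untouched, so text inside the matched element is still treated as excluded. With the flag on, the start tag pushes an include state and the end tag pops it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of an include/exclude rule state stack (not Okapi code). */
class RuleStateSketch {
    enum RuleType { TEXTUNIT, INCLUDE, EXCLUDE, NONE }
    enum State { INCLUDE, EXCLUDE }

    private final boolean excludeByDefault;
    // Assumption: toggles the workaround described in comment 5.
    private final boolean textUnitPushesInclude;
    private final Deque<State> states = new ArrayDeque<>();
    // For each open element, remembers whether its start tag pushed a state.
    private final Deque<Boolean> pushed = new ArrayDeque<>();
    private final List<String> extracted = new ArrayList<>();

    RuleStateSketch(boolean excludeByDefault, boolean textUnitPushesInclude) {
        this.excludeByDefault = excludeByDefault;
        this.textUnitPushesInclude = textUnitPushesInclude;
    }

    private boolean isExcluded() {
        State top = states.peek();
        // An empty stack falls back to the global default.
        return top == null ? excludeByDefault : top == State.EXCLUDE;
    }

    void startTag(RuleType rule) {
        boolean didPush = false;
        switch (rule) {
            case INCLUDE:
                states.push(State.INCLUDE);
                didPush = true;
                break;
            case EXCLUDE:
                states.push(State.EXCLUDE);
                didPush = true;
                break;
            case TEXTUNIT:
                // Workaround: a matched TEXTUNIT rule also pushes an include state.
                if (textUnitPushesInclude && excludeByDefault) {
                    states.push(State.INCLUDE);
                    didPush = true;
                }
                break;
            default:
                break;
        }
        pushed.push(didPush);
    }

    void text(String s) {
        if (!isExcluded()) {
            extracted.add(s);
        }
    }

    void endTag() {
        // Pop only if the matching start tag pushed a state.
        if (pushed.pop()) {
            states.pop();
        }
    }

    /** Replays a tiny document with exclude_by_default set to true. */
    static List<String> run(boolean workaround) {
        RuleStateSketch f = new RuleStateSketch(true, workaround);
        f.startTag(RuleType.NONE);      // root element, no rule
        f.startTag(RuleType.NONE);      // <foo translate="n">, rule condition not met
        f.text("Do not translate.");
        f.endTag();
        f.startTag(RuleType.TEXTUNIT);  // <foo translate="y">, rule matches
        f.text("Translate me.");
        f.endTag();
        f.endTag();
        return f.extracted;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints []
        System.out.println(run(true));  // prints [Translate me.]
    }
}
```

In this model the first run reproduces the "no TUs at all" behavior from comment 1, and the second run exposes only the sentence inside the element matched by the TEXTUNIT rule. The real filter has more rule types and state, so the sketch says nothing about how the other rule types should behave.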