AbstractMarkupFilter subfiltering produces spurious segments

Issue #303 resolved

Original issue 303 created by @ysavourel on 2013-01-02T06:22:06.000Z:

See https://groups.google.com/d/topic/okapi-devel/SdXigj5Uiu4/discussion

Currently subfiltering in the AbstractMarkupFilter produces additional textunits which consist only of a single placeholder. These TUs seem to correspond to the original, pre-subfiltered content, which is then replaced by some inline resource/tag.

This behavior is sub-optimal. In the subfiltering case, we should be producing a START_GROUP event, then the event stream from the subfilter, then an END_GROUP event. No textunit should correspond to the pre-subfiltered content.
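
The intended ordering can be sketched as below. This is a minimal, self-contained Java illustration and not Okapi code: the `Kind` enum, the `Event` class, and the `wrapSubfilterEvents` method are hypothetical stand-ins, assumed here only to show the shape of the event stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a filter event model; these are NOT the Okapi classes.
class SubfilterGroupSketch {
    enum Kind { START_GROUP, TEXT_UNIT, DOCUMENT_PART, END_GROUP }

    static final class Event {
        final Kind kind;
        final String payload;

        Event(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    // Wraps the subfilter's own events in a START_GROUP / END_GROUP pair.
    // No additional TEXT_UNIT is emitted for the pre-subfiltered content.
    static List<Event> wrapSubfilterEvents(String groupName, List<Event> subfilterEvents) {
        List<Event> out = new ArrayList<>();
        out.add(new Event(Kind.START_GROUP, groupName));
        out.addAll(subfilterEvents);
        out.add(new Event(Kind.END_GROUP, groupName));
        return out;
    }

    public static void main(String[] args) {
        List<Event> sub = new ArrayList<>();
        sub.add(new Event(Kind.TEXT_UNIT, "This is the title"));
        sub.add(new Event(Kind.TEXT_UNIT, "This is the body."));
        for (Event e : wrapSubfilterEvents("sub-filter:foo", sub)) {
            System.out.println(e.kind + " " + e.payload);
        }
    }
}
```

With this shape, the only text units in the stream are the ones the subfilter itself produced.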

Cutting and pasting the example from the above thread:

This XML:
<xml>
<foo><html><head><title>This is the title</title></head><body><p>This is the body.</p></body></html></foo>
</xml>

Produces this XLIFF:
<body>
<group id="tu2_ssf1" resname="sub-filter:foo">
<trans-unit id="tu2_tu1" resname="foo_3" restype="x-title">
<source xml:lang="en">This is the title</source>
</trans-unit>
<trans-unit id="tu2_tu2" resname="foo_6" restype="x-paragraph">
<source xml:lang="en">This is the body.</source>
</trans-unit>
</group>
<trans-unit id="tu2" restype="x-foo">
<source xml:lang="en"><x id="1"/></source>
</trans-unit>
<group id="tu1_ssf2" resname="sub-filter:xml">
</group>
<trans-unit id="tu1" restype="x-xml">
<source xml:lang="en"><x id="1"/></source>
</trans-unit>
</body>

So, there are a couple of things going on here. The subfiltered TUs appear in the tu2_ssf1 group. This is followed by
the tu2 TU, which consists only of a placeholder -- presumably representing the subfiltered content.

There is then another group+TU pair, except in this case the group is also empty. This corresponds to
subfiltering the whitespace between the <xml> and <foo> elements.

Comments (3)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-04-05T19:04:22.000Z:

    I am actively working on this. It looks like it's possible to move the reference into skeleton and have everything still work. The nastier part is that in order to avoid producing an empty TU, the skeleton from the partially-built TU needs to be shifted into a document part. Unfortunately, that skeleton includes a [$$self$] reference, which needs to be stripped (as it's no longer true). I'm still looking for the cleanest way to do this.
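
The stripping step described in the comment above could look roughly like the following. This is only a sketch under stated assumptions: skeleton content is modeled as a plain string, the `SelfReferenceSketch` class and `stripSelfReference` method are hypothetical, and the marker text is copied as it is written in the comment, so the actual constant used by the library may differ.

```java
// Hypothetical helper; it does not use or reproduce any Okapi API.
class SelfReferenceSketch {
    // Marker text as written in the comment above (assumption: the real constant may differ).
    static final String SELF_MARKER = "[$$self$]";

    // Removes every occurrence of the self-reference marker from skeleton text.
    // String.replace treats both arguments literally, so no regex escaping is needed.
    static String stripSelfReference(String skeleton) {
        return skeleton.replace(SELF_MARKER, "");
    }

    public static void main(String[] args) {
        String skeleton = "<foo>" + SELF_MARKER + "</foo>";
        // The stripped text could then be carried by a document part instead of an empty TU.
        System.out.println(stripSelfReference(skeleton));
    }
}
```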
