OpenXML Filter: segmentation quality reduced for some PPTX documents

Issue #977 resolved
Denis Konovalyenko created an issue

Please consider the following extraction:

<source xml:lang="en">The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. <x id="1" ctype="x-x" equiv-text="&lt;tags1/>"/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. <x id="2" ctype="x-x" equiv-text="&lt;tags2/>"/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. <x id="3" ctype="x-x" equiv-text="&lt;tags3/>"/></source>

There is an extra <x id="3"/> code at the end of the segment.

The expected output should not contain it:

<source xml:lang="en">The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. <x id="1" ctype="x-x" equiv-text="&lt;tags1/>"/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. <x id="2" ctype="x-x" equiv-text="&lt;tags2/>"/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. </source>
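To make the expected behaviour concrete, here is a minimal Python sketch. It is not Okapi code and does not reflect how the OpenXML filter or the segmenter is actually implemented. It assumes the segment source is available as an XLIFF 1.2 string, and the helper name `split_trailing_codes` is hypothetical. It splits self-closing `<x .../>` placeholders off the end of the content (keeping any whitespace before them, as in the expected output above) and leaves codes that sit between sentences untouched.

```python
import re

# Hypothetical pattern: one self-closing XLIFF 1.2 <x .../> placeholder at the very
# end of the string. Quoted attribute values may contain '>' (e.g. equiv-text="&lt;tags3/>"),
# so they are matched as whole "..." tokens instead of with [^>]*.
TRAILING_X = re.compile(r'<x\b(?:[^>"]|"[^"]*")*/>\Z')


def split_trailing_codes(source: str) -> tuple[str, list[str]]:
    """Hypothetical helper: return (content, trailing_codes).

    trailing_codes holds the <x/> placeholders that directly follow the last
    piece of text, in document order. Whitespace before them stays in content.
    """
    codes: list[str] = []
    while True:
        match = TRAILING_X.search(source)
        if match is None:
            break
        codes.insert(0, match.group(0))  # prepend to keep document order
        source = source[:match.start()]
    return source, codes


if __name__ == "__main__":
    tail = 'over the lazy dog. <x id="3" ctype="x-x" equiv-text="&lt;tags3/>"/>'
    content, codes = split_trailing_codes(tail)
    print(repr(content))
    print(codes)
```

The sketch only strips placeholders that are directly adjacent to each other at the end; codes separated by whitespace, or paired `<bpt>`/`<ept>` codes, would need additional handling that is out of scope here.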

For more details, please refer to the attached document.
