OpenXML: Consecutive runs containing tabs can be incorrectly merged

Issue #467 resolved
Chase Tingley created an issue

The attached file contains some text with two embedded tabs, divided across multiple runs. The pattern looks like this:

<w:r>
  <w:tab/>
  <w:t>-</w:t>
</w:r>
<w:r>
  <w:tab/>
  <w:t>R</w:t>
</w:r>

Because the runs have the same properties, they are merged. However, the merge ignores the tabs, so the second <w:tab/> is dropped and you end up with a single run like this:

<w:r>
  <w:tab/>
  <w:t>-R</w:t>
</w:r>

As a result, one of the tabs disappears during translation.
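For illustration, here is a minimal Java sketch of a merge that keeps every tab. It is not the Okapi implementation. The Run, Item, Text and Tab types are hypothetical, and run properties are reduced to a plain string standing in for <w:rPr>. The idea is that the children of both runs are appended in document order and only text items that end up directly adjacent are coalesced into one <w:t>, so no <w:tab/> can be lost. It requires Java 16+ (records and pattern matching for instanceof).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a WordprocessingML run: a properties key
// (standing in for <w:rPr>) plus an ordered list of children (<w:t> and <w:tab/>).
public class RunMergeSketch {

    interface Item {}
    record Text(String value) implements Item {}
    record Tab() implements Item {}
    record Run(String props, List<Item> items) {}

    // Merge two adjacent runs with identical properties. Children are appended
    // in document order, so every Tab survives; only Text items that end up
    // next to each other are coalesced into a single Text.
    static Run merge(Run a, Run b) {
        if (!a.props().equals(b.props())) {
            throw new IllegalArgumentException("runs have different properties");
        }
        List<Item> all = new ArrayList<>(a.items());
        all.addAll(b.items());
        List<Item> out = new ArrayList<>();
        for (Item item : all) {
            int last = out.size() - 1;
            if (item instanceof Text t && last >= 0 && out.get(last) instanceof Text prev) {
                out.set(last, new Text(prev.value() + t.value()));
            } else {
                out.add(item);
            }
        }
        return new Run(a.props(), out);
    }

    // Serialize a run's children to simple, unescaped markup for inspection.
    static String render(Run run) {
        StringBuilder sb = new StringBuilder("<w:r>");
        for (Item item : run.items()) {
            if (item instanceof Tab) {
                sb.append("<w:tab/>");
            } else if (item instanceof Text t) {
                sb.append("<w:t>").append(t.value()).append("</w:t>");
            }
        }
        return sb.append("</w:r>").toString();
    }

    public static void main(String[] args) {
        // The two runs from the report.
        Run first = new Run("same", List.of(new Tab(), new Text("-")));
        Run second = new Run("same", List.of(new Tab(), new Text("R")));
        System.out.println(render(merge(first, second)));
    }
}
```

With the two runs from the report, main prints <w:r><w:tab/><w:t>-</w:t><w:tab/><w:t>R</w:t></w:r>, so both tabs are kept. A run is allowed to hold several <w:tab/> and <w:t> children in sequence, which is why the merged run does not need to collapse everything into one <w:t>.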

Comments

  1. Chase Tingley (reporter)

    Fix Issue 458, Fix Issue 467, and Fix Issue 473 in the openxml filter

    This rewrites OpenXMLContentFilter.combineRepeatedFormat() and
    splits the markup-simplification code out into a new class called
    ParagraphSimplifier.  This fixes several problems the old code had
    with runs containing multiple <t> elements, as well as tabs and
    line breaks that were being lost when interspersed with text.
    This covers Issue 458 and re-fixes Issue 467 in a better way.
    
    Additional fixes address Issue 473 and an unfiled problem where
    entities in deleted text were not being re-escaped in the target
    output.
    
    This also results in some changes to how placeholders are created
    in segments.
    

    → changeset 2445b887857a
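The commit message does not show how the entity fix works. As a hedged illustration of what "re-escaping" means: with a typical XML parser, an entity such as &amp; inside <w:delText> is handed back as a bare &, so writing that decoded text to the target unchanged produces malformed XML. The minimal Java sketch below escapes decoded text exactly once on output. The escapeXmlText and delText helpers are hypothetical and are not Okapi APIs. It requires Java 14+ (switch with arrow labels).

```java
// Hypothetical helpers illustrating re-escaping of decoded text before it is
// written back into a <w:delText> element.
public class DelTextEscapeSketch {

    // Escape XML special characters in character data. '&' and '<' must be
    // escaped; '>' is escaped too for symmetry. The input is assumed to be
    // already-decoded text, so each character is escaped exactly once.
    static String escapeXmlText(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&' -> sb.append("&amp;");
                case '<' -> sb.append("&lt;");
                case '>' -> sb.append("&gt;");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap escaped text in a deleted-text element, preserving whitespace.
    static String delText(String s) {
        return "<w:delText xml:space=\"preserve\">" + escapeXmlText(s) + "</w:delText>";
    }

    public static void main(String[] args) {
        System.out.println(delText("Tom & Jerry <draft>"));
    }
}
```

Here main prints <w:delText xml:space="preserve">Tom &amp; Jerry &lt;draft&gt;</w:delText>. Escaping only at write time, rather than also at read time, avoids double-escaping such as &amp;amp;.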
