OpenXML: merge runs where the fonts are the same, but on different scripts

Issue #487 resolved
Chase Tingley created an issue

Attached is a slightly contrived example, based on a real-world test file that I think may have been produced by a converter.

This file contains one sentence in the Arial Unicode MS font. However, it is split into three runs because it contains a single non-ASCII character in the middle. The ASCII sections are formatted with the property

<w:rFonts w:ascii="Arial Unicode MS"/>

while the non-ASCII character is formatted with the property

<w:rFonts w:hAnsi="Arial Unicode MS"/>

(In OpenXML, "hAnsi" is used for any character that is not ASCII, not in the East Asian range, and not a complex-script character.)

These runs could be merged into a single run with the union of their font properties, like this:

<w:rFonts w:ascii="Arial Unicode MS" w:hAnsi="Arial Unicode MS"/>

This would improve segment quality by removing the two unneeded run boundaries from the middle of the sentence.
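A minimal Python sketch of the proposed merge, using only the standard library's `xml.etree.ElementTree`. It is an illustration under stated assumptions, not the filter's actual implementation, and every function name in it is hypothetical. It only handles runs whose `w:rPr` contains nothing but a `w:rFonts` limited to `ascii`/`hAnsi`/`eastAsia`/`cs`, followed by a single `w:t`. It refuses to merge when the same attribute has different values. It also refuses when a run would gain a font for a slot its own text uses. For that check it relies on a deliberately conservative simplification of the character-to-slot rules: U+0000 to U+007F use `ascii`, and any other character is assumed to possibly use `hAnsi`, `eastAsia` or `cs`.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace, plus the built-in xml: namespace for xml:space.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
NS = {"w": W}
SLOTS = ("ascii", "hAnsi", "eastAsia", "cs")


def q(local):
    """Clark-notation name in the w: namespace, e.g. q("r") -> "{...}r"."""
    return f"{{{W}}}{local}"


ALLOWED_ATTRS = {q(s) for s in SLOTS}


def is_simple(run):
    """Hypothetical restriction: an optional rPr holding only an rFonts with
    ascii/hAnsi/eastAsia/cs attributes, followed by exactly one w:t."""
    kids = list(run)
    if kids and kids[0].tag == q("rPr"):
        for prop in kids[0]:
            if prop.tag != q("rFonts") or any(a not in ALLOWED_ATTRS for a in prop.attrib):
                return False
        kids = kids[1:]
    return len(kids) == 1 and kids[0].tag == q("t")


def fonts(run):
    """The run's rFonts attributes keyed by local name, e.g. {"ascii": "..."}."""
    rf = run.find("w:rPr/w:rFonts", NS)
    return {} if rf is None else {k.split("}")[-1]: v for k, v in rf.attrib.items()}


def slots_used(text):
    """Conservative simplification (not the full ECMA-376 rules)."""
    used = set()
    for ch in text:
        if ord(ch) < 0x80:
            used.add("ascii")
        else:
            used |= {"hAnsi", "eastAsia", "cs"}
    return used


def can_merge(a, b):
    """True if the union of both runs' fonts would not change any character's
    font, judged with the simplified slots_used() classification."""
    fa, fb = fonts(a), fonts(b)
    if any(fa[k] != fb[k] for k in fa.keys() & fb.keys()):
        return False  # same slot, different fonts: a real formatting change
    text_a = a.find("w:t", NS).text or ""
    text_b = b.find("w:t", NS).text or ""
    # A run may only gain fonts for slots that its own text never uses.
    return ((fb.keys() - fa.keys()).isdisjoint(slots_used(text_a))
            and (fa.keys() - fb.keys()).isdisjoint(slots_used(text_b)))


def set_fonts(run, attrs):
    """Write attrs onto the run's rPr/rFonts, creating them if needed."""
    if not attrs:
        return
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        rpr = ET.Element(q("rPr"))
        run.insert(0, rpr)  # rPr must be the first child of w:r
    rf = rpr.find("w:rFonts", NS)
    if rf is None:
        rf = ET.SubElement(rpr, q("rFonts"))
    for name, value in attrs.items():
        rf.set(q(name), value)


def merge_runs(paragraph):
    """Merge adjacent simple runs of a w:p in place and return it."""
    prev = None
    for child in list(paragraph):  # iterate a snapshot; we remove children
        if child.tag != q("r") or not is_simple(child):
            prev = None  # any other element breaks adjacency
            continue
        if prev is not None and can_merge(prev, child):
            set_fonts(prev, {**fonts(prev), **fonts(child)})  # union of fonts
            t_prev, t_next = prev.find("w:t", NS), child.find("w:t", NS)
            t_prev.text = (t_prev.text or "") + (t_next.text or "")
            t_prev.set(XML_SPACE, "preserve")  # keep leading/trailing spaces
            paragraph.remove(child)
        else:
            prev = child
    return paragraph


# The three-run sentence from the report, reduced to its essentials.
p = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:rPr><w:rFonts w:ascii="Arial Unicode MS"/></w:rPr><w:t>The caf</w:t></w:r>'
    '<w:r><w:rPr><w:rFonts w:hAnsi="Arial Unicode MS"/></w:rPr><w:t>\u00e9</w:t></w:r>'
    '<w:r><w:rPr><w:rFonts w:ascii="Arial Unicode MS"/></w:rPr>'
    '<w:t xml:space="preserve"> is open.</w:t></w:r>'
    "</w:p>"
)
merge_runs(p)
runs = p.findall("w:r", NS)
print(len(runs), fonts(runs[0]))
# 1 {'ascii': 'Arial Unicode MS', 'hAnsi': 'Arial Unicode MS'}
```

The conflict and slot checks are what make the union safe: the ASCII runs never use the `hAnsi` slot and the "é" run never uses the `ascii` slot, so adding the other run's attribute changes nothing visible. A real filter would also have to compare the remaining run properties, which this sketch sidesteps by leaving any run with other `rPr` content untouched.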
