OpenXML: merge runs where the fonts are the same, but on different scripts
Issue #487
resolved
Attached is a slightly contrived example, based on a real-world test file that I think may have been produced by a converter.
This file contains one sentence in the Arial Unicode MS font. However, it is split into three runs because it contains one non-ASCII character in the middle. The ASCII sections are formatted with the property
<w:rFonts w:ascii="Arial Unicode MS"/>
while the non-ASCII character is formatted with the property
<w:rFonts w:hAnsi="Arial Unicode MS"/>
(In OpenXML, "hAnsi" is used for any character that is not ASCII, not in the East Asian range, and not part of a complex script.)
These runs could be merged into a single run with the union of their font properties, like this:
<w:rFonts w:ascii="Arial Unicode MS" w:hAnsi="Arial Unicode MS"/>
This would improve segment quality.
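The merge described above could be sketched roughly as follows. This is only an illustration in Python, not the code from the actual fix. The helper names (q, run_fonts, other_props, can_merge, merge_runs) are made up for this sketch. It assumes each candidate run holds exactly one w:rPr followed by one w:t, that both runs carry a w:rFonts element, that all other run properties match, and that every w:rFonts attribute on both runs names the same single font.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def q(tag):
    """Qualify a local name with the WordprocessingML namespace."""
    return f"{{{W}}}{tag}"


def run_fonts(run):
    """Return the run's w:rFonts element, or None if it has none."""
    return run.find(f"{q('rPr')}/{q('rFonts')}")


def other_props(run):
    """Non-font run properties as comparable (tag, attributes) pairs."""
    return [(c.tag, sorted(c.attrib.items()))
            for c in run.find(q("rPr")) if c.tag != q("rFonts")]


def can_merge(a, b):
    """True when two simple runs differ only in which script slot names the font."""
    # Assumption of this sketch: only runs shaped <w:rPr/><w:t/> are considered.
    if any([c.tag for c in r] != [q("rPr"), q("t")] for r in (a, b)):
        return False
    fa, fb = run_fonts(a), run_fonts(b)
    if fa is None or fb is None:
        return False
    # Every font attribute on both runs must name the same single font.
    names = set(fa.attrib.values()) | set(fb.attrib.values())
    return len(names) == 1 and other_props(a) == other_props(b)


def merge_runs(paragraph):
    """Merge adjacent mergeable w:r children of a w:p element, in place."""
    prev = None
    for node in list(paragraph):
        if node.tag == q("r") and prev is not None and can_merge(prev, node):
            # Union of the font attributes, e.g. w:ascii plus w:hAnsi.
            run_fonts(prev).attrib.update(run_fonts(node).attrib)
            text = prev.find(q("t"))
            text.text = (text.text or "") + (node.find(q("t")).text or "")
            # Keep leading and trailing spaces of the joined text.
            text.set(XML_SPACE, "preserve")
            paragraph.remove(node)
        else:
            prev = node if node.tag == q("r") else None
```

Calling merge_runs on a parsed w:p element that has three runs like the ones in this issue should leave a single run whose w:rFonts carries both w:ascii and w:hAnsi. Runs containing tabs, breaks, or differing bold/italic settings are left alone under these assumptions.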
Comments (2)
reporter -
reporter - changed status to resolved
Fixed in M29
This is fixed on the tag-rework branch.