OpenXML Filter: runs get merged with different fonts

Issue #883 new
Denis Konovalyenko created an issue

Please consider the following set of characters and their original font values:

And the corresponding paragraph formatting:

            <w:r w:rsidR="005C7308">
                <w:rPr>
                    <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsia="SimSun" w:hAnsiTheme="minorHAnsi"/>
                    <w:lang w:val="en-US"/>
                </w:rPr>
                <w:t xml:space="preserve"> </w:t>
            </w:r>
            <w:r w:rsidR="0020382C" w:rsidRPr="0020382C">
                <w:rPr>
                    <w:rFonts w:eastAsia="SimSun"/>
                </w:rPr>
                <w:t>§¶</w:t>
            </w:r>

After the OpenXML filter is applied, the symbols are merged into a single run and their fonts look like this:

And below is the related paragraph part:

            <w:r>
                <w:rPr>
                    <w:rFonts w:asciiTheme="minorHAnsi" w:hAnsiTheme="minorHAnsi" w:eastAsia="SimSun"/>
                </w:rPr>
                <w:t xml:space="preserve"> §¶</w:t>
            </w:r>
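
To make the difference between the two source runs concrete, here is a minimal Python sketch of a merge guard that compares the direct `w:rFonts` attributes of adjacent runs. It is not the filter's code (which is not shown in this issue); `rfonts_of`, `can_merge` and `PARAGRAPH` are hypothetical names, and the `w:rsid*` attributes and `w:lang` are left out for brevity.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def rfonts_of(run):
    """Return the direct w:rFonts attributes of a run, keyed by local name."""
    rfonts = run.find(f"{{{W}}}rPr/{{{W}}}rFonts")
    if rfonts is None:
        return {}
    return {name.split("}")[-1]: value for name, value in rfonts.attrib.items()}


def can_merge(run_a, run_b):
    """Hypothetical guard: merge only when the direct fonts match exactly."""
    return rfonts_of(run_a) == rfonts_of(run_b)


# The two runs from the original paragraph, reduced to their fonts and text.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r>
    <w:rPr>
      <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsia="SimSun" w:hAnsiTheme="minorHAnsi"/>
    </w:rPr>
    <w:t xml:space="preserve"> </w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:rFonts w:eastAsia="SimSun"/>
    </w:rPr>
    <w:t>§¶</w:t>
  </w:r>
</w:p>
"""

runs = ET.fromstring(PARAGRAPH).findall(f"{{{W}}}r")
print(can_merge(runs[0], runs[1]))  # prints False
```

Under this assumption the two runs stay separate, because the first one carries `asciiTheme` and `hAnsiTheme` while the second one does not. Comparing only direct attributes is a simplification: a complete check would have to compare the fonts after style inheritance is resolved.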

For more information please refer to the attached documents.

Comments (3)

  1. Denis Konovalyenko reporter

    What is more, according to the specification, a run property defined at one level of the style hierarchy is overridden by the same property defined at a later level (document defaults, paragraph style, run style, then direct/inline formatting). It looks like the run fonts property may be an exception to this algorithm: instead of the whole rFonts element being replaced, its content categories (ascii, hAnsi, eastAsia, cs) seem to be overridden one by one throughout the style hierarchy, as if each of them were a style property in its own right.

    So, if we have the document defaults with:

    <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi"
              w:cstheme="minorBidi"/>
    

    Then the default paragraph style with:

    <w:rFonts w:ascii="Courier New" w:hAnsi="Courier New" w:cs="Courier New"/>
    

    and finally, the direct formatting:

    <w:rFonts w:eastAsia="SimSun"/>
    

    Then the result of such substitution can be:

    <w:rFonts w:asciiTheme="minorHAnsi" w:eastAsiaTheme="minorEastAsia" w:hAnsiTheme="minorHAnsi"
              w:cstheme="minorBidi"
              w:ascii="Courier New" w:hAnsi="Courier New" w:cs="Courier New"
              w:eastAsia="SimSun"/>
    

    However, since every *Theme attribute, when present, is used instead of its plain counterpart (ascii, hAnsi, cs, eastAsia), the effective result should look more like this (assuming the document defaults are not taken into account for some reason):

    <w:rFonts w:ascii="Courier New" w:hAnsi="Courier New" w:cs="Courier New"
              w:eastAsia="SimSun"/>
    

    This is just speculation, and additional research is required.
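
    The speculated substitution can be sketched in Python as follows. This only illustrates the reasoning above and is not confirmed behaviour of Word or of the filter; `resolve_rfonts`, `effective_fonts` and `THEME_FOR` are hypothetical names, and each hierarchy level is modelled as a plain dict of rFonts attributes.

```python
# Plain attribute -> theme attribute that is used instead of it when both are present.
THEME_FOR = {
    "ascii": "asciiTheme",
    "hAnsi": "hAnsiTheme",
    "eastAsia": "eastAsiaTheme",
    "cs": "cstheme",
}


def resolve_rfonts(*levels):
    """Merge rFonts attribute dicts, lowest-priority level first (hypothetical helper)."""
    merged = {}
    for level in levels:
        merged.update(level)  # later levels override attribute by attribute
    return merged


def effective_fonts(merged):
    """Pick one font source per content category; a theme attribute wins over the plain one."""
    result = {}
    for plain, theme in THEME_FOR.items():
        if theme in merged:
            result[plain] = ("theme", merged[theme])
        elif plain in merged:
            result[plain] = ("font", merged[plain])
    return result


doc_defaults = {"asciiTheme": "minorHAnsi", "eastAsiaTheme": "minorEastAsia",
                "hAnsiTheme": "minorHAnsi", "cstheme": "minorBidi"}
paragraph_style = {"ascii": "Courier New", "hAnsi": "Courier New", "cs": "Courier New"}
direct = {"eastAsia": "SimSun"}

# All three levels: every category falls back to a theme font.
with_defaults = resolve_rfonts(doc_defaults, paragraph_style, direct)
print(effective_fonts(with_defaults))

# Document defaults ignored: the plain fonts become effective.
without_defaults = resolve_rfonts(paragraph_style, direct)
print(effective_fonts(without_defaults))
```

    With the document defaults included, the merged element carries all eight attributes and every category resolves to a theme font; without them, ascii, hAnsi and cs resolve to Courier New and eastAsia to SimSun, which matches the last snippet above.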

  2. Denis Konovalyenko reporter

    It would be great to check how Google Docs and LibreOffice behave in this case as well.
