OpenXML Filter: improve Word styles optimisation for the reiterated properties on the directly applied formatting level

Issue #879 closed
Denis Konovalyenko created an issue

Please consider the following case. A paragraph contains 2 runs, and the second one reiterates the rFonts property, which the default paragraph style already provides for that run at the paragraph level.

<w:p w:rsidR="00D577F2" w:rsidRPr="0076395F" w:rsidRDefault="0076395F">
    <w:pPr>
        <w:rPr>
            <w:lang w:val="en-US"/>
        </w:rPr>
    </w:pPr>
    <w:r w:rsidRPr="00D00CAC">
        <w:rPr>
            <w:sz w:val="24"/>
            <w:szCs w:val="24"/>
            <w:lang w:val="en-US"/>
        </w:rPr>
        <w:t>Run 1.</w:t>
    </w:r>
    <w:r w:rsidR="004629C4" w:rsidRPr="00D00CAC">
        <w:rPr>
            <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
            <w:lang w:val="en-US"/>
        </w:rPr>
        <w:t>Run 2.</w:t>
    </w:r>
</w:p>

<w:style w:type="paragraph" w:default="1" w:styleId="Normal">
    <w:name w:val="Normal"/>
    <w:qFormat/>
    <w:rPr>
        <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
    </w:rPr>
</w:style>

Currently, this is extracted as:

<trans-unit id="NFDBB2FA9-tu1" xml:space="preserve">
    <source xml:lang="en"><g id="1">Run 1.</g><g id="2">Run 2.</g></source>
    <target xml:lang="fr"><g id="1">Run 1.</g><g id="2">Run 2.</g></target>
</trans-unit>

It would be nice to have the following extraction:

<trans-unit id="NFDBB2FA9-tu1" xml:space="preserve">
    <source xml:lang="en"><g id="1">Run 1.</g>Run 2.</source>
    <target xml:lang="fr"><g id="1">Run 1.</g>Run 2.</target>
</trans-unit>

For more information please refer to the attached document.
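The requested optimisation can be illustrated with a minimal sketch. It is not the Okapi implementation: the helper name `strip_reiterated` is hypothetical, and it assumes that a direct run property counts as "reiterated" only when an element with the same name and the same attributes appears in the run properties of the default paragraph style.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def strip_reiterated(run, style):
    """Drop direct run properties that merely repeat the style's run properties.

    Hypothetical helper: "repeat" here means same element name and same attributes.
    """
    style_rpr = style.find("w:rPr", NS)
    run_rpr = run.find("w:rPr", NS)
    if style_rpr is None or run_rpr is None:
        return
    inherited = {(p.tag, tuple(sorted(p.attrib.items()))) for p in style_rpr}
    for prop in list(run_rpr):
        if (prop.tag, tuple(sorted(prop.attrib.items()))) in inherited:
            run_rpr.remove(prop)


# The default paragraph style and the second run from the example above.
style = ET.fromstring(
    f'<w:style xmlns:w="{W}" w:type="paragraph" w:default="1" w:styleId="Normal">'
    '<w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/></w:rPr>'
    '</w:style>'
)
run = ET.fromstring(
    f'<w:r xmlns:w="{W}">'
    '<w:rPr><w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>'
    '<w:lang w:val="en-US"/></w:rPr>'
    '<w:t>Run 2.</w:t></w:r>'
)

strip_reiterated(run, style)
remaining = [p.tag.split("}")[1] for p in run.find("w:rPr", NS)]
print(remaining)  # ['lang']
```

After the call, only `w:lang` is left among the direct properties of the second run, so rFonts no longer distinguishes it from text that simply inherits the default style.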

Comments (6)

  1. Chase Tingley

    With the fix to #887, this example now extracts as

    <trans-unit id="NFDBB2FA9-tu1" xml:space="preserve">
    <source xml:lang="en"><g id="1">Run 1.</g>Run 2.</source>
    <target xml:lang="fr"><g id="1">Run 1.</g>Run 2.</target>
    </trans-unit>
    

    Does the fix for #887 serve as a more general form of this improvement, or are there still cases of this sort that would require their own handling?

  2. Denis Konovalyenko reporter

    @Chase Tingley, you're right, the solution for issue #887 provides a more general improvement, and this is how the runs are extracted with it applied:

    <trans-unit id="NFDBB2FA9-tu1" xml:space="preserve">
    <source xml:lang="en">Run 1.<g id="1">Run 2.</g></source>
    <target xml:lang="fr">Run 1.<g id="1">Run 2.</g></target>
    </trans-unit>
    

    There are 2 equal sets of run properties, so the properties of the first run are taken as the base for the others.

    With the improvement proposed in this issue, we would get:

    <trans-unit id="NFDBB2FA9-tu1" xml:space="preserve">
    <source xml:lang="en"><g id="1">Run 1.</g>Run 2.</source>
    <target xml:lang="fr"><g id="1">Run 1.</g>Run 2.</target>
    </trans-unit>
    

    Here the second run ends up with empty properties, so those are chosen as the base for the others.

  3. Denis Konovalyenko reporter

    @Chase Tingley, another round of styles optimisations (I do not remember exactly which one) has fixed this particular issue:

    <source xml:lang="en"><g id="1" ctype="x-empty" equiv-text="&lt;run1>">Run 1.</g>Run 2.</source>
    

    Thanks for keeping an eye on this!

  4. Denis Konovalyenko reporter

    The issue has been resolved in the scope of other ones related to the styles optimisation process.
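The base-selection behaviour described in comment 2 can be sketched as follows. The rule is inferred only from that comment, not from the Okapi source: it assumes that an empty property set is always preferred as the base, and that otherwise the most frequent set wins, with ties going to the earliest run. The names `choose_base` and `render` are hypothetical, and `w:lang` is left out because both runs share it.

```python
from collections import Counter


def choose_base(property_sets):
    """Pick the property set whose runs are emitted without a <g> wrapper."""
    # Assumption: empty properties are always preferred as the base.
    if frozenset() in property_sets:
        return frozenset()
    # Assumption: otherwise the most frequent set wins; ties go to the first run.
    counts = Counter(property_sets)
    top = max(counts.values())
    return next(p for p in property_sets if counts[p] == top)


def render(runs):
    """Wrap every run whose properties differ from the base in <g id="n">."""
    base = choose_base([props for props, _ in runs])
    parts, next_id = [], 1
    for props, text in runs:
        if props == base:
            parts.append(text)
        else:
            parts.append(f'<g id="{next_id}">{text}</g>')
            next_id += 1
    return "".join(parts)


# Run properties with only the fix for #887 applied: rFonts is still present on run 2.
after_887 = [(frozenset({"sz", "szCs"}), "Run 1."), (frozenset({"rFonts"}), "Run 2.")]
# Run properties once the reiterated rFonts has been removed from run 2.
with_879 = [(frozenset({"sz", "szCs"}), "Run 1."), (frozenset(), "Run 2.")]

print(render(after_887))  # Run 1.<g id="1">Run 2.</g>
print(render(with_879))   # <g id="1">Run 1.</g>Run 2.
```

Under these assumptions the two calls reproduce the two `<source>` contents quoted in comment 2.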
