OpenXML: corrupt formatting of consecutive runs where the second uses default formatting
We discovered a bug that corrupts formatting in a merged document when adjacent text passages in the original document are formatted differently. More precisely, it occurs when the first of two consecutive runs has explicit formatting and the second run uses the default formatting of the document. An example can be found in the attachments.
Further investigation led to the following observation, which can be seen in the document.xml part of the document:
In the original document, the following XML snippet can be found:
<w:p w:rsidR="00623D7E" w:rsidRPr="007E1190" w:rsidRDefault="007E1190">
  <w:pPr>
    <w:rPr>
      <w:lang w:val="en-US"/>
    </w:rPr>
  </w:pPr>
  <w:r w:rsidRPr="007E1190">
    <w:rPr>
      <w:rFonts w:ascii="MS Gothic" w:eastAsia="MS Gothic" w:hAnsi="MS Gothic"/>
      <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>Hello, I’m formatted.</w:t>
  </w:r>
  <w:r w:rsidRPr="007E1190">
    <w:rPr>
      <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t xml:space="preserve"> Hello, I’m not.</w:t>
  </w:r>
  <w:bookmarkStart w:id="0" w:name="_GoBack"/>
  <w:bookmarkEnd w:id="0"/>
</w:p>
As shown, the first run has properties that specify the font (w:rFonts), while the second run does not. In the merged document, the above snippet is converted to the following:
<w:p>
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="MS Gothic" w:eastAsia="MS Gothic" w:hAnsi="MS Gothic"/>
    </w:rPr>
    <w:t xml:space="preserve">Hello, I’m formatted. Hello, I’m not.</w:t>
  </w:r>
</w:p>
As noted before, the formatting of the first run is also applied to the second run, which is clearly not intended.
Since we started investigating this with a very specific case and then discovered the general case, we wanted to inform you about this bug. We are currently working on a fix, which will be submitted as a PR as soon as we have finished fixing and testing it. It will involve changes to the provided test data, since the test data suffers from the same issue.
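To illustrate the rule that the merged output violates, here is a minimal Python sketch. It is not the project's code and not the fix from the PR. It merges adjacent runs only when their run properties are identical. The names `rpr_key` and `merge_runs` are hypothetical, and comparing only the direct children of w:rPr and their attributes is a simplifying assumption that is sufficient for the snippet above.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
NS = {"w": W}


def rpr_key(run):
    """Return an order-independent key for a run's <w:rPr> (hypothetical helper).

    Assumption: only direct children of <w:rPr> and their attributes matter.
    """
    rpr = run.find("w:rPr", NS)
    if rpr is None:
        return frozenset()
    return frozenset((child.tag, tuple(sorted(child.attrib.items()))) for child in rpr)


def merge_runs(paragraph):
    """Merge adjacent <w:r> children in place, but only if their properties match."""
    previous = None
    for child in list(paragraph):
        if child.tag != f"{{{W}}}r":
            previous = None  # any other element (pPr, bookmarks) breaks adjacency
            continue
        if previous is not None and rpr_key(previous) == rpr_key(child):
            prev_t = previous.find("w:t", NS)
            cur_t = child.find("w:t", NS)
            # Simplification: only the first <w:t> of each run is considered.
            if prev_t is not None and cur_t is not None:
                prev_t.text = (prev_t.text or "") + (cur_t.text or "")
                prev_t.set(XML_SPACE, "preserve")  # keep the joining whitespace
                paragraph.remove(child)
                continue
        previous = child


# The two runs from the original document, reduced to the relevant parts.
SAMPLE = f"""<w:p xmlns:w="{W}">
  <w:r>
    <w:rPr>
      <w:rFonts w:ascii="MS Gothic" w:eastAsia="MS Gothic" w:hAnsi="MS Gothic"/>
      <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t>Hello, I'm formatted.</w:t>
  </w:r>
  <w:r>
    <w:rPr>
      <w:lang w:val="en-US"/>
    </w:rPr>
    <w:t xml:space="preserve"> Hello, I'm not.</w:t>
  </w:r>
</w:p>"""

p = ET.fromstring(SAMPLE)
merge_runs(p)
print(len(p.findall("w:r", NS)))  # 2: the runs differ in w:rFonts, so they stay separate
```

With this check in place, the two runs from the original document remain separate, because only the first one carries w:rFonts.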
Comments (4)
-
Account Deactivated reporter -
Thanks! When you submit the PR, please include @DenisKonovalyenko as a reviewer; he knows the code best. (Also include me.)
-
Account Deactivated reporter - attached Trandocxsegtest.docx
- attached OpenXML_text_reference_v1_2.docx
- attached Hello World.trans.docx
- attached TranOpenXML_text_reference_v1_2.docx
- attached Hello World.docx
- attached docxsegtest.docx
In reference to Denis's comment on pull request #167, I've found (in the current test files) and created some reference files to show the issue.
-
- changed status to resolved
Fixed. See pull request #167 for extensive discussion.