DOCX/OpenXML - tag corruption in document.xml

Issue #379 resolved
Former user created an issue

Original issue 379 created by bailo... on 2013-11-22T10:12:26.000Z:

I'm a user of Okapi and I ran into trouble opening a document, file.docx, in Microsoft Office 2007. The error message is: /word/document.xml line 6294 column 6293. The problem doesn't occur in OpenOffice: I can open file.docx there without any error message.

tikal.sh -lm file.docx -totrg -from aftertest

Comments (4)

  1. Former user Account Deleted

    Comment 2. originally posted by @ysavourel on 2014-01-29T18:44:04.000Z:

    The document.xml file in the attached file.out.docx has been corrupted.

    At the offset (line 2, column 6294) there is some very weird tag structure in which a run (<w:r>) has been embedded directly within the <w:t> of another run. It looks like this:

    <w:p>
      <w:pPr>
        <!-- snipped for space -->
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:rStyle w:val="CharAttribute0"/>
          <w:rFonts w:eastAsia="Batang"/>
          <w:sz w:val="24"/>
          <w:szCs w:val="24"/>
        </w:rPr>
        <w:t xml:space="preserve">
          <w:r>   <!-- what? -->
            <w:rPr>
              <w:rStyle w:val="CharAttribute0"/>
              <w:rFonts w:eastAsia="Batang"/>
              <w:sz w:val="24"/>
              <w:szCs w:val="24"/>
              <w:u w:val="single"/>
            </w:rPr>
            <w:t> légende</w:t>
          </w:r>
        </w:t>   <!-- what? -->
      </w:r>
    </w:p>

    This is well-formed XML, but I'm pretty sure it's illegal in OpenXML. (I'd need to check.)
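
    A rough way to spot this pattern is to look for any <w:t> that contains child elements, since a text element is expected to hold only character data. The sketch below uses Python's standard library; the function name find_text_with_children and the SAMPLE string are made up for illustration and are not part of Okapi.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, normally bound to the "w" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_text_with_children(document_xml):
    """Return the local names of elements found directly inside any <w:t>."""
    root = ET.fromstring(document_xml)
    offenders = []
    for text_el in root.iter(f"{{{W_NS}}}t"):
        for child in text_el:
            # Strip the "{namespace}" part, keeping only the local name.
            offenders.append(child.tag.split("}")[-1])
    return offenders


# Hypothetical, cut-down version of the structure quoted above.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
    '<w:t xml:space="preserve"><w:r><w:t> légende</w:t></w:r></w:t>'
    '</w:r></w:p></w:body></w:document>'
)

if __name__ == "__main__":
    print(find_text_with_children(SAMPLE))
```

    This only works while the file is still well-formed; it would not get as far as the tag mismatches later in the file.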

    Also, it looks like document.xml has some tag mismatches further on (line 2, column 42916), based on trying to open it in an XML editor.
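
    For anyone without an XML editor at hand, the same check can be approximated with Python's standard library: pull word/document.xml out of the .docx (which is a zip archive) and let the parser report where it first fails. This is only a sketch; read_document_xml and first_parse_error are hypothetical helper names, and the position reported by the parser may not match the one Word or an XML editor shows.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_document_xml(docx_path):
    """Return the raw bytes of word/document.xml from a .docx archive."""
    with zipfile.ZipFile(docx_path) as archive:
        return archive.read("word/document.xml")


def first_parse_error(xml_data):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        return err.position
    return None


if __name__ == "__main__":
    # Tiny inline inputs instead of a real file, so the sketch runs as-is.
    print(first_parse_error("<a><b></a>"))   # mismatched tag: a position
    print(first_parse_error("<a><b/></a>"))  # well-formed: None
```

    With a real file the call would look like first_parse_error(read_document_xml("file.out.docx")).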
