Tag mismatch in DOCX file leveraged from Moses translation

Issue #735
Xavier Richez created an issue

Hi,

This document (data.docx) seems to be troublesome.

When I extract the Moses text from this document and then leverage the same, unmodified text back into it, the result is a broken DOCX file that Word cannot open.

$> tikal -fc okf_openxml -xm data.docx -seg -sl en -oe utf8
$> tikal -lm data.docx -fc okf_openxml -sl en -ie utf8 -oe utf8 -overtrg -from data.docx.en -seg

Here's the stacktrace I get when trying to load the document with the OpenXML SDK:

Unhandled Exception: System.Xml.XmlException: **The 'w:sectPrChange' start tag on line 1 position 1353466 does not match the end tag of 'w:pPr'. Line 1, position 1353748.**
   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.ThrowTagMismatch(NodeData startTag)
   at System.Xml.XmlTextReaderImpl.ParseEndElement()
   at System.Xml.XmlTextReaderImpl.ParseElementContent()
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.LoadFromPart(OpenXmlPart openXmlPart, Stream partStream)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPart.LoadDomTree[T]()
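The exception is a plain XML well-formedness failure (a start tag closed by the wrong end tag), so the same class of error can be reproduced without Word or the OpenXML SDK. The following is a minimal Python sketch, assuming only that a DOCX is a ZIP package whose `.xml` and `.rels` parts must each parse as XML. `find_malformed_parts` is a hypothetical helper, not part of Okapi or Tikal.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_path):
    """Return (part_name, line, column, message) for every XML part in the
    DOCX package that is not well-formed XML."""
    problems = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Only XML parts need to be well-formed; skip images, fonts, etc.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                # expat reports (line, column); its column count starts at 0,
                # so it may differ by one from the 1-based .NET "position".
                line, column = err.position
                problems.append((name, line, column, str(err)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    for part, line, column, message in find_malformed_parts(sys.argv[1]):
        print(f"{part}: line {line}, column {column}: {message}")
```

Run it as `python check_docx.py <merged.docx>` (script name hypothetical). On a package like the one above it should flag `word/document.xml` with a "mismatched tag" error, which separates a corrupted merge from an SDK-specific problem.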

Comments (3)

  1. Chase Tingley

    It looks like this is corrupting word/document.xml (and possibly other parts) during the merge. I haven't worked with the Moses text steps much, but looking at the content of the data.docx.en file, I wonder if there are problems merging those tags back into the DOCX.

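To follow up on the suspicion that word/document.xml is corrupted during the merge, it helps to look at the raw text around the positions in the error message (start tag at line 1, position 1353466; mismatched end tag at position 1353748), since document.xml is typically a single very long line. The Python sketch below assumes the part is UTF-8 and that the .NET position counts characters from 1; `context_at` and the path `merged.docx` are hypothetical.

```python
import zipfile


def context_at(docx_path, line, column, radius=200, part="word/document.xml"):
    """Return the text of `part` around a 1-based (line, column) position,
    such as the line/position pair reported by System.Xml.XmlException."""
    with zipfile.ZipFile(docx_path) as package:
        # "utf-8-sig" also strips a leading byte-order mark if one is present.
        text = package.read(part).decode("utf-8-sig")
    target = text.split("\n")[line - 1]
    index = column - 1  # convert the 1-based column to a 0-based index
    start = max(0, index - radius)
    end = min(len(target), index + radius)
    return target[start:end]
```

For example, `print(context_at("merged.docx", 1, 1353466))` would show which `w:pPr` / `w:sectPrChange` markup surrounds the break, and running the same call on the original data.docx at the corresponding place allows a side-by-side comparison of what the merge changed.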