Tag mismatch in DOCX file leveraged from Moses translation
Issue #735
new
Hi,
This document seems to be troublesome.
When I extract just the Moses text from this document and then leverage the same text back into the document, the result is a broken DOCX file. Word is not able to open the resulting document.
$> tikal -fc okf_openxml -xm data.docx -seg -sl en -oe utf8
$> tikal -lm data.docx -fc okf_openxml -sl en -ie utf8 -oe utf8 -overtrg -from data.docx.en -seg
Here's the stacktrace I get when trying to load the document with the OpenXML SDK:
Unhandled Exception: System.Xml.XmlException: **The 'w:sectPrChange' start tag on line 1 position 1353466 does not match the end tag of 'w:pPr'. Line 1, position 1353748.**
at System.Xml.XmlTextReaderImpl.Throw(Exception e)
at System.Xml.XmlTextReaderImpl.ThrowTagMismatch(NodeData startTag)
at System.Xml.XmlTextReaderImpl.ParseEndElement()
at System.Xml.XmlTextReaderImpl.ParseElementContent()
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlCompositeElement.Populate(XmlReader xmlReader, OpenXmlLoadMode loadMode)
at DocumentFormat.OpenXml.OpenXmlPartRootElement.LoadFromPart(OpenXmlPart openXmlPart, Stream partStream)
at DocumentFormat.OpenXml.Packaging.OpenXmlPart.LoadDomTree[T]()
Comments (3)
-
reporter - removed milestone
Seems similar to issue #723.
-
It looks like this is corrupting word/document.xml (and possibly other things) during the merge. I haven't worked with the Moses text steps much; looking at the content of the data.docx.en file, I wonder if there are problems trying to merge those tags back into the docx.
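To check which parts of the leveraged package are actually malformed, something like the following might help. It is only a rough sketch using the Python standard library, not part of Okapi or Tikal; the function name `find_malformed_parts` is made up for illustration. It opens the DOCX as a zip archive and tries to parse every XML part, reporting the ones that fail.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_path):
    """Return (part_name, error_message) pairs for XML parts that fail to parse.

    Hypothetical helper: docx_path may be a file path or a file-like object.
    """
    problems = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Only XML-based parts are checked; media files are skipped.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as err:
                problems.append((name, str(err)))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the leveraged DOCX file as the first argument.
    for part, message in find_malformed_parts(sys.argv[1]):
        print(f"{part}: {message}")
```

Running it against both the original and the leveraged file should show whether word/document.xml is the only part affected or whether other parts are broken too.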