Invalid DOCX files created with Moses InlineText tag rearranging and round-trip

Issue #176 wontfix
Former user created an issue

Original issue 176 created by Achi... on 2011-07-04T20:35:41.000Z:

Steps to reproduce:
1. Create a new Word document in Microsoft Word 2007 with the text
"This is page ."
2. Position the cursor before the period and choose Insert/Page Number/Current Position/Plain Number: the number 1 is inserted
3. Save the document as test.docx
4. tikal.bat -xm test.docx -sl en

The first line of the resulting test.docx.en is:
This is page <x id="1"/><g id="2">1</g><x id="3"/>.

5. Edit the first line to read:
This is page <x id="3"/><x id="1"/><g id="2">1</g>.
6. Save as
7. tikal.bat -lm test.docx -totrg -from
8. Open the resulting test.out.docx in Microsoft Word

Word cannot open the file: "The file test.out.docx cannot be opened because there are problems with the contents."
"The name in the end tag of the element must match the element type in the start tag." Location: Part: /word/document.xml [...]

This kind of tag rearranging, while somewhat nonsensical in this example, happens often in longer segments during translation or machine translation.
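To spot affected segments before the round-trip, one option is to compare the order of inline tag ids in the extracted line and the edited line. The following is only a sketch: the function names are hypothetical, it is not part of Okapi or Tikal, and it assumes that only `<x id="..."/>` and `<g id="...">` tags appear, as in the example above. A changed order is not always an error, so this only flags segments for review.

```python
import re
from typing import List

# Matches the id of <x id="N"/> and <g id="N"> tags, the two forms
# seen in the extracted line of this report (assumption: no other tag forms).
TAG_ID = re.compile(r'<(?:x|g)\s+id="(\d+)"')


def placeholder_order(segment: str) -> List[int]:
    """Return the inline tag ids in the order they appear in the segment."""
    return [int(match.group(1)) for match in TAG_ID.finditer(segment)]


def is_reordered(source: str, target: str) -> bool:
    """True if the target lists the tag ids in a different order than the source."""
    return placeholder_order(source) != placeholder_order(target)


source_line = 'This is page <x id="1"/><g id="2">1</g><x id="3"/>.'
edited_line = 'This is page <x id="3"/><x id="1"/><g id="2">1</g>.'
print(placeholder_order(source_line), placeholder_order(edited_line))
print(is_reordered(source_line, edited_line))
```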

Analysis of DOCX XML:
9. Extract contents of test.out.docx with extraction program (e.g. 7zip)
10. View file test.out.docx/word/document.xml

Invalid XML: Closing tag </w:fldSimple> appears before opening tag <w:fldSimple ...>
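The manual check in steps 9 and 10 can also be scripted. The sketch below uses only the Python standard library, reads word/document.xml straight out of the DOCX archive, and reports whether it parses as well-formed XML. The function name `document_xml_error` is made up for illustration, and the check covers well-formedness only, not validity against the WordprocessingML schema.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Optional


def document_xml_error(docx_path) -> Optional[str]:
    """Return None if word/document.xml is well-formed, else the parser error text.

    docx_path may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_path) as archive:
        data = archive.read("word/document.xml")
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # The message includes the line and column of the first problem found.
        return str(exc)
    return None
```

Calling `document_xml_error("test.out.docx")` on the file produced in step 7 should return a parser message rather than None if the closing tag precedes the opening tag as described.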

Comments (5)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2011-07-05T03:16:57.000Z:

    I can reproduce the issue. Extracting to XLIFF with <bpt>/<ept> shows the DOCX codes:

    <ph id="1"><w:fldSimple w:instr=" PAGE \* MERGEFORMAT "></ph> ... <ph id="3"></w:fldSimple></ph>

    Ideally those two placeholders would be paired tags. But that is difficult to achieve with Word.

  2. Former user Account Deleted

    Comment 4. originally posted by bailo... on 2013-11-22T10:06:32.000Z:

    I'm a user of Okapi and I had trouble opening a document, File.docx, in Microsoft Office 2007. The error message is: /word/document.xml line 6294 column 6293. The problem doesn't occur in OpenOffice, where the file opens without any error message. The command used was: -lm file.docx -totrg -from aftertest
