Invalid DOCX files created with Moses InlineText tag rearranging and round-trip

Issue #176 wontfix
Former user created an issue

Original issue 176 created by Achi... on 2011-07-04T20:35:41.000Z:

Steps to reproduce:
1. Create a new Word document in Microsoft Word 2007 with the text
"This is page ."
2. Position the cursor before the period and choose Insert/Page Number/Current Position/Plain Number: the number 1 is inserted
3. Save the document as test.docx
4. tikal.bat -xm test.docx -sl en

The first line of the resulting test.docx.en is:
This is page <x id="1"/><g id="2">1</g><x id="3"/>.

5. Edit the first line to read:
This is page <x id="3"/><x id="1"/><g id="2">1</g>.
6. Save as test.docx.fr
7. tikal.bat -lm test.docx -totrg -from test.docx.fr
8. Open the resulting test.out.docx in Microsoft Word

Result:
Word cannot open the file: "The file test.out.docx cannot be opened because there are problems with the contents."
Details:
"The name in the end tag of the element must match the element type in the start tag." Location: Part: /word/document.xml [...]

Remark:
This kind of tag rearranging, while a bit nonsensical in this example, happens often in longer segments during translation or machine translation.
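
A rough way to flag such segments before merging is to compare the order of the standalone `<x id="N"/>` placeholders in the source and target lines. The sketch below is only a heuristic, not part of Okapi or Tikal: the function names are hypothetical, and it assumes that `<x/>` placeholders must keep their source order, which this report suggests for the page-number field but which may be too strict for other content.

```python
import re

# Matches standalone Moses InlineText placeholders such as <x id="3"/>.
X_TAG = re.compile(r'<x id="(\d+)"/>')


def x_placeholder_order(line):
    """Return the ids of the <x/> placeholders in the order they appear."""
    return [int(tag_id) for tag_id in X_TAG.findall(line)]


def x_order_changed(source_line, target_line):
    """True if the <x/> ids differ in order (or presence) between the lines.

    Hypothetical heuristic: it treats any difference as suspicious.
    """
    return x_placeholder_order(source_line) != x_placeholder_order(target_line)


source = 'This is page <x id="1"/><g id="2">1</g><x id="3"/>.'
target = 'This is page <x id="3"/><x id="1"/><g id="2">1</g>.'
if x_order_changed(source, target):
    print("placeholder order differs; the merged DOCX may be invalid")
```

A line flagged this way is not necessarily wrong, so the result is best treated as a prompt for manual review.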

Analysis of DOCX XML:
9. Extract contents of test.out.docx with extraction program (e.g. 7zip)
10. View file test.out.docx/word/document.xml

Invalid XML: Closing tag </w:fldSimple> appears before opening tag <w:fldSimple ...>
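
The manual check in steps 9 and 10 can also be scripted. The following is a minimal sketch using only the Python standard library; the function name `check_docx_part` is hypothetical. It only tests whether the part is well-formed XML, which is enough to catch the mismatched `</w:fldSimple>` described above, and it does not validate the document against the WordprocessingML schema.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_docx_part(docx, part="word/document.xml"):
    """Return None if the part is well-formed XML, else the parser error text.

    `docx` may be a file path or a binary file-like object.
    """
    # A DOCX file is a ZIP archive; read the requested part as bytes.
    with zipfile.ZipFile(docx) as archive:
        data = archive.read(part)
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # The message includes the line and column of the first error.
        return str(err)
    return None
```

For example, `check_docx_part("test.out.docx")` would be expected to return an error message for the file produced in step 7 and `None` for the original test.docx.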

Comments (5)

  1. Former user Account Deleted

    Comment [1.](https://code.google.com/p/okapi/issues/detail?id=176#c1) originally posted by @ysavourel on 2011-07-05T03:16:57.000Z:

    I can reproduce the issue. Extracting to XLIFF with <bpt>/<ept> shows the DOCX codes:

    <ph id="1"><w:fldSimple w:instr=" PAGE \* MERGEFORMAT "></ph> ... <ph id="3"></w:fldSimple></ph>

    Ideally those two placeholders would be paired tags. But that is difficult to achieve with Word.

  2. Former user Account Deleted

    Comment 4. originally posted by bailo... on 2013-11-22T10:06:32.000Z:

    I'm a user of Okapi and I had trouble opening a document (File.docx) in Microsoft Office 2007. The error message is: /Word / document.xml line 6294 column 6293. The problem doesn't occur in OpenOffice, which opens file.docx without any error message.

    tikal.sh -lm file.docx -totrg -from aftertest
