DOCX Sub-document Order

Issue #291 resolved
Former user created an issue

Original issue 291 created by @ysavourel on 2012-11-14T17:14:52.000Z:

See http://tech.groups.yahoo.com/group/okapitools/message/3403

Initial Comment:
I have experienced an issue in net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile().

I have noticed that the entries it iterates over come straight from the docx zip file, in the order in which they are stored in that zip file.

For one example document the entries are processed in the following order:

[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
word/theme/theme1.xml
word/settings.xml
word/fontTable.xml
word/webSettings.xml
docProps/app.xml
docProps/core.xml
word/styles.xml

I also have other documents, which Word opens correctly, whose entries are processed in the following order:

docProps/app.xml
docProps/core.xml
word/document.xml
word/fontTable.xml
word/settings.xml
word/styles.xml
word/webSettings.xml
word/_rels/document.xml.rels
[Content_Types].xml
_rels/.rels

In the first case, Okapi handles the document properly. In the second it does not, because net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile() expects [Content_Types].xml to precede word/styles.xml.
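
To check whether a given package has the problematic order, something like the following could be used. It is only a sketch: the class and method names (EntryOrderCheck, contentTypesPrecedes) are made up for illustration and are not part of Okapi.

```java
import java.util.List;

// Hypothetical helper, not part of Okapi: checks the relative position
// of [Content_Types].xml in a list of zip entry names.
final class EntryOrderCheck {
    static final String CONTENT_TYPES = "[Content_Types].xml";

    // Returns true if [Content_Types].xml is present and appears before
    // the given part (or the part is absent from the list).
    static boolean contentTypesPrecedes(List<String> entryNames, String part) {
        int contentTypesIndex = entryNames.indexOf(CONTENT_TYPES);
        int partIndex = entryNames.indexOf(part);
        return contentTypesIndex >= 0
                && (partIndex < 0 || contentTypesIndex < partIndex);
    }
}
```

Called with the two entry lists above and "word/styles.xml" as the part, this returns true for the Word-generated document and false for the Google Translate one.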

How were these documents created?

The first is generated using Word.

The second is the product of running the first document through Google Translate.

net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile() needs to be rewritten so that it no longer depends on the entry order, for example by hunting down [Content_Types].xml first...
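
A minimal sketch of that idea, using only java.util.zip: collect the entry names, then move [Content_Types].xml to the front while keeping the stored order of everything else. The class and method names here (ContentTypesFirst, entryNames, contentTypesFirst) are hypothetical and do not reflect how OpenXMLFilter is actually structured or how the issue was eventually fixed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Okapi.
final class ContentTypesFirst {
    static final String CONTENT_TYPES = "[Content_Types].xml";

    // Reads the entry names of a zip package in their stored order.
    static List<String> entryNames(String path) throws IOException {
        try (ZipFile zip = new ZipFile(path)) {
            List<String> names = new ArrayList<>();
            for (ZipEntry entry : Collections.list(zip.entries())) {
                names.add(entry.getName());
            }
            return names;
        }
    }

    // Returns a new list with [Content_Types].xml first (if present);
    // all other names keep their original relative order.
    static List<String> contentTypesFirst(List<String> names) {
        List<String> ordered = new ArrayList<>(names.size());
        if (names.contains(CONTENT_TYPES)) {
            ordered.add(CONTENT_TYPES);
        }
        for (String name : names) {
            if (!name.equals(CONTENT_TYPES)) {
                ordered.add(name);
            }
        }
        return ordered;
    }
}
```

A caller would then iterate over contentTypesFirst(entryNames(path)) and fetch each entry by name with ZipFile.getEntry, rather than walking the entries in stored order.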

Comments (2)

  1. Former user Account Deleted

    Comment 1. originally posted by @ysavourel on 2013-07-24T19:11:42.000Z:

    [Content_Types].xml should always be processed first in a Word file.
