- changed status to resolved
DOCX Sub-document Order
Original issue 291 created by @ysavourel on 2012-11-14T17:14:52.000Z:
See http://tech.groups.yahoo.com/group/okapitools/message/3403
Initial Comment:
I have experienced an issue in net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile().
I have noticed that the entries that are iterated over come from the docx zip file, and these entries have an order in the zip file.
For one example document the entries are processed in the following order:
[Content_Types].xml
_rels/.rels
word/_rels/document.xml.rels
word/document.xml
word/theme/theme1.xml
word/settings.xml
word/fontTable.xml
word/webSettings.xml
docProps/app.xml
docProps/core.xml
word/styles.xml
I do have other documents that Word can open correctly that are processed in the following order:
docProps/app.xml
docProps/core.xml
word/document.xml
word/fontTable.xml
word/settings.xml
word/styles.xml
word/webSettings.xml
word/_rels/document.xml.rels
[Content_Types].xml
_rels/.rels
In the first case, OKAPI handles the document properly. In the second it does not, because the net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile() method expects [Content_Types].xml to precede word/styles.xml.
How were these documents created?
The first is generated using Word.
The second is the product of running the first document through Google Translate.
The net.sf.okapi.filters.openxml.OpenXMLFilter.nextInZipFile() needs to be rewritten to account for a different ordering, hunting down the [Content_Types].xml first...
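One way to make the iteration independent of the zip's physical order is to collect the entry names first and move [Content_Types].xml to the front before processing anything. The sketch below only illustrates that idea using the standard java.util.zip API; it is not the change made in Okapi, and the class and method names (ZipOrder, contentTypesFirst, entryNames) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Okapi: reorders zip entry names so that
// [Content_Types].xml is visited before any other part of the package.
final class ZipOrder {
    static final String CONTENT_TYPES = "[Content_Types].xml";

    // Returns a new list with [Content_Types].xml moved to the front.
    // All other names keep their original relative order.
    static List<String> contentTypesFirst(List<String> names) {
        List<String> ordered = new ArrayList<>(names.size());
        for (String name : names) {
            if (CONTENT_TYPES.equals(name)) {
                ordered.add(0, name);
            } else {
                ordered.add(name);
            }
        }
        return ordered;
    }

    // Collects entry names in the order the zip file reports them.
    static List<String> entryNames(ZipFile zip) {
        List<String> names = new ArrayList<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            names.add(entries.nextElement().getName());
        }
        return names;
    }
}
```

A caller could then walk ZipOrder.contentTypesFirst(ZipOrder.entryNames(zip)) and fetch each part with zip.getEntry(name), so the content types are known before word/styles.xml or any other part is reached.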
Comments (2)
Comment 2. originally posted by @ysavourel on 2013-07-24T19:26:47.000Z:
This issue was closed by revision 7fdf256f4093.
Comment 1. originally posted by @ysavourel on 2013-07-24T19:11:42.000Z:
[Content_Types].xml should always be processed first in a Word file.