OpenXML Filter: unusable on Windows platform

Issue #1180 resolved
Denis Konovalyenko created an issue

The original discussion started on the devel list.

Yves wrote:

I'm testing the issue_1153 branch (textUnitMerger PR) which has the latest from dev. I'm on Windows and getting this error:
net.sf.okapi.common.exceptions.OkapiBadFilterInputException: Unsupported main document part
    at net.sf.okapi.filters.openxml.Document$General.initializeMainPartPathAndRelationshipsNamespace(Document.java:196)

for all tests of the OpenXML Filter. This happens both from Eclipse and from the command-line build. Does anyone else have this issue?

Yves wrote:

The issue comes from initializeMainPartPathAndRelationshipsNamespace(), where the first calls fall through to the exception. I guess there is a type of relationship that's missing.
        private void initializeMainPartPathAndRelationshipsNamespace() throws IOException, XMLStreamException {
            final Iterator<Relationship> relationshipsOfOfficeDocumentSourceTypeIterator =
                this.rootRelationships.of(Namespace.DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsOfStrictOfficeDocumentSourceTypeIterator =
                this.rootRelationships.of(Namespace.STRICT_DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsVisioDocumentSourceTypeIterator =
                this.rootRelationships.of(Namespace.VISIO_DOCUMENT_RELATIONSHIPS.concat(DOCUMENT)).iterator();
            if (relationshipsOfOfficeDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsOfOfficeDocumentSourceTypeIterator.next().target();
                this.mainPartRelationshipsNamespace = new Namespace.Default(Namespace.DOCUMENT_RELATIONSHIPS);
            } else if (relationshipsOfStrictOfficeDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsOfStrictOfficeDocumentSourceTypeIterator.next().target();
                this.mainPartRelationshipsNamespace = new Namespace.Default(Namespace.STRICT_DOCUMENT_RELATIONSHIPS);
            } else if (relationshipsVisioDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsVisioDocumentSourceTypeIterator.next().target();
                this.mainPartRelationshipsNamespace = new Namespace.Default(Namespace.VISIO_DOCUMENT_RELATIONSHIPS);
            } else {
                throw new OkapiBadFilterInputException(UNSUPPORTED_MAIN_DOCUMENT_PART);
            }
        }

Yves wrote:

I cannot see any place where this.rootRelationships is initialized with some content.
The initialization seems to be in Document.open():
            this.rootRelationships = relationshipsFor(EMPTY);
And obviously that results in an empty list, leading to the failure in initializeMainPartPathAndRelationshipsNamespace() a few lines later.

Yves wrote:

I went back to commit "5d582f3" (Chase, Dec-16) to see how far back the issue goes, and I get the same error. In that version we have:

        private void initializeMainPartPathAndDocumentRelationshipsNamespace() throws IOException, XMLStreamException {
            final Relationships relationships = relationshipsFor(EMPTY);
            final Iterator<Relationship> relationshipsOfOfficeDocumentSourceTypeIterator =
                relationships.of(Namespace.DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsOfStrictOfficeDocumentSourceTypeIterator =
                relationships.of(Namespace.STRICT_DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsVisioDocumentSourceTypeIterator =
                relationships.of(Namespace.VISIO_DOCUMENT_RELATIONSHIPS.concat(DOCUMENT)).iterator();
            if (relationshipsOfOfficeDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsOfOfficeDocumentSourceTypeIterator.next().target();
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.DOCUMENT_RELATIONSHIPS);
            } else if (relationshipsOfStrictOfficeDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsOfStrictOfficeDocumentSourceTypeIterator.next().target();
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.STRICT_DOCUMENT_RELATIONSHIPS);
            } else if (relationshipsVisioDocumentSourceTypeIterator.hasNext()) {
                this.mainPartPath = relationshipsVisioDocumentSourceTypeIterator.next().target();
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.VISIO_DOCUMENT_RELATIONSHIPS);
            } else {
                throw new OkapiBadFilterInputException(UNSUPPORTED_MAIN_DOCUMENT_PART);
            }
        }

So this seems to be a relatively old issue. Note that for our code, we normally work with a version of the snapshot that is behind the official one. That's why I didn't notice the problem before.

Yves wrote:

Commit 81330e4 (Mihai, Dec-4) works OK:

        private void initializeMainPartNameAndDocumentRelationshipsNamespace() throws IOException, XMLStreamException {
            final Relationships relationships = relationshipsFor(EMPTY);
            final String officeDocumentSourceType = Namespace.DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT);
            final String strictOfficeDocumentSourceType = Namespace.STRICT_DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT);
            final String visioDocumentSourceType = Namespace.VISIO_DOCUMENT_RELATIONSHIPS.concat(DOCUMENT);

            if (relationships.hasRelType(officeDocumentSourceType)) {
                this.mainPartName = relationships.getRelByType(officeDocumentSourceType).get(0).target;
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.DOCUMENT_RELATIONSHIPS);
            } else if (relationships.hasRelType(strictOfficeDocumentSourceType)) {
                this.mainPartName = relationships.getRelByType(strictOfficeDocumentSourceType).get(0).target;
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.STRICT_DOCUMENT_RELATIONSHIPS);
            } else if (relationships.hasRelType(visioDocumentSourceType)) {
                this.mainPartName = relationships.getRelByType(visioDocumentSourceType).get(0).target;
                this.documentRelationshipsNamespace = new Namespace.Default(Namespace.VISIO_DOCUMENT_RELATIONSHIPS);
            } else {
                throw new OkapiBadFilterInputException(UNSUPPORTED_MAIN_DOCUMENT_PART);
            }
        }
It's likely one of the next 3 commits that caused the issue.

Yves wrote:

OK: commit d9acde0 is the first one with the error. That's when you switched to using iterators.
This code:
            final Relationships relationships = relationshipsFor(EMPTY);
            final Iterator<Relationship> relationshipsOfOfficeDocumentSourceTypeIterator =
                relationships.of(Namespace.DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsOfStrictOfficeDocumentSourceTypeIterator =
                relationships.of(Namespace.STRICT_DOCUMENT_RELATIONSHIPS.concat(OFFICE_DOCUMENT)).iterator();
            final Iterator<Relationship> relationshipsVisioDocumentSourceTypeIterator =
                relationships.of(Namespace.VISIO_DOCUMENT_RELATIONSHIPS.concat(DOCUMENT)).iterator();
results in 3 empty iterators. The arguments for of() are:

Yves wrote:

Basically, Relationships relationships = relationshipsFor(EMPTY); seems to be the problem. In the latest version it returns an empty Relationships object; in the version that works it does not.
The reason seems to be the available() method. For me, it checks for "_rels\.rels" in the zip file and returns false.
If we use "_rels/.rels" we get true, even on Windows.
The version that worked hard-codes the separator:
        String relationshipsPartNameFor(final String part) {
            final int lastSlash = part.lastIndexOf("/");
            if (lastSlash == -1) {
                return "_rels/" + part + ".rels";
            }
            return part.substring(0, lastSlash) + "/_rels" + part.substring(lastSlash) + ".rels";
        }
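
To make the behavior of that helper concrete, here is a small standalone sketch. RelsNameDemo is a hypothetical class, not part of Okapi; the method body is copied from above but made static so it can be called from main. It also assumes that EMPTY is the empty string, which the code shown here does not confirm.

```java
public class RelsNameDemo {
    // Copied from the version that worked: the separator is always "/",
    // regardless of the platform the code runs on.
    static String relationshipsPartNameFor(final String part) {
        final int lastSlash = part.lastIndexOf("/");
        if (lastSlash == -1) {
            return "_rels/" + part + ".rels";
        }
        return part.substring(0, lastSlash) + "/_rels" + part.substring(lastSlash) + ".rels";
    }

    public static void main(String[] args) {
        // Root relationships (assuming EMPTY is ""): prints _rels/.rels
        System.out.println(relationshipsPartNameFor(""));
        // Relationships of a nested part: prints word/_rels/document.xml.rels
        System.out.println(relationshipsPartNameFor("word/document.xml"));
    }
}
```

Because the name is built by string concatenation, the result is the same on Windows and Linux.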

In the version that doesn't work we use (I think) relationshipsPartPath.toString(), which apparently is platform-sensitive. I guess ZipFile.getEntry() normalizes the path to Linux style, so we should keep all the OpenXML relationships paths normalized too.
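
One possible way to keep a Path-based implementation while avoiding the platform separator is sketched below. It is only an illustration: ZipEntryNames and toZipEntryName are hypothetical names, not Okapi code, and the actual fix may look different. The idea is to join the name elements of the Path with "/" instead of calling toString().

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.StringJoiner;

public class ZipEntryNames {
    // Hypothetical helper: builds a zip entry name from a relative Path by
    // joining its name elements with "/", whatever the platform separator is.
    static String toZipEntryName(final Path path) {
        final StringJoiner joiner = new StringJoiner("/");
        for (final Path element : path) {
            joiner.add(element.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        final Path relsPath = Paths.get("_rels", ".rels");
        // Platform-dependent: "_rels\.rels" on Windows, "_rels/.rels" on Linux.
        System.out.println(relsPath.toString());
        // Always "_rels/.rels".
        System.out.println(toZipEntryName(relsPath));
    }
}
```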

Yves wrote:

Note that I think there are several places where ZipFile.getEntry() is used. So you may have to adjust a few places.
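
For anyone who wants to check a given call site, the following self-contained sketch reproduces the lookup behavior outside of Okapi. ZipLookupDemo and hasEntry are hypothetical names made up for this example. It writes a temporary archive with a single entry named "_rels/.rels" and then looks it up with both separators. As far as I can tell, ZipFile.getEntry() looks the entry up by the name it is given, so the backslash variant is not found on any platform.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class ZipLookupDemo {
    // Writes a tiny archive containing one entry named "_rels/.rels" and
    // reports whether the given name can be looked up in it.
    static boolean hasEntry(final String name) {
        try {
            final File file = File.createTempFile("rels-demo", ".zip");
            try {
                try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file))) {
                    out.putNextEntry(new ZipEntry("_rels/.rels"));
                    out.write("<Relationships/>".getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
                try (ZipFile zip = new ZipFile(file)) {
                    return zip.getEntry(name) != null;
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hasEntry("_rels/.rels"));   // forward slash: true
        System.out.println(hasEntry("_rels\\.rels"));  // backslash: false
    }
}
```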
