XLIFF2 Filter: crashes parsing any file containing a group

Issue #697 resolved
Chase Tingley created an issue

Pretty bad bug: we crash extracting any XLIFF2 file that contains a <group> element, which is most files.

I can reproduce this with the test01.xlf file, which is part of the unit test data for the filter (!). I've also attached it. The unit test that uses it only verifies that a particular TU is extracted and never finishes parsing the file. Because the bug occurs on END_GROUP events, the test doesn't hit it.
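
To illustrate the coverage gap, here is a minimal, self-contained sketch. It does not use the Okapi IFilter/Event API: EventStreamDemo, EventType and convert() are hypothetical stand-ins, and convert() deliberately fails on END_GROUP to mimic the crash. A test that stops at the first text unit passes, while one that drains every event reaches the END_GROUP handling and fails.

```java
import java.util.EmptyStackException;
import java.util.List;

// Hypothetical, simplified model of a filter's event stream.
// These names are illustrative only; they are not the Okapi API.
class EventStreamDemo {
    enum EventType { START_GROUP, TEXT_UNIT, END_GROUP }

    // Stand-in for the filter's event conversion. END_GROUP handling is
    // where the real filter crashed, so this sketch fails the same way.
    static EventType convert(EventType type) {
        if (type == EventType.END_GROUP) {
            throw new EmptyStackException();
        }
        return type;
    }

    // Like the existing unit test: stop as soon as the wanted TU is seen,
    // so the END_GROUP event is never converted.
    static boolean findFirstTextUnit(List<EventType> raw) {
        for (EventType t : raw) {
            if (convert(t) == EventType.TEXT_UNIT) {
                return true;
            }
        }
        return false;
    }

    // A stricter test: convert every event so end-of-group handling runs too.
    static int drainAll(List<EventType> raw) {
        int count = 0;
        for (EventType t : raw) {
            convert(t);
            count++;
        }
        return count;
    }
}
```

With the events START_GROUP, TEXT_UNIT, END_GROUP, findFirstTextUnit returns true, while drainAll throws EmptyStackException.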

The issue is that the filter calls into xliff-toolkit serialization methods to write out certain elements as skeleton, but those methods depend on internal state (e.g. the group stack) that isn't tracked properly when they are called from the filter. xliff-toolkit's XLIFFReader class has no problem handling this file.

Stack trace looks like this:

java.util.EmptyStackException
    at java.util.Stack.peek(Stack.java:102)
    at java.util.Stack.pop(Stack.java:84)
    at net.sf.okapi.lib.xliff2.writer.XLIFFWriter.writeEndGroup(XLIFFWriter.java:1058)
    at net.sf.okapi.filters.xliff2.XLIFF2Filter.convEndGroup(XLIFF2Filter.java:338)
    at net.sf.okapi.filters.xliff2.XLIFF2Filter.readNext(XLIFF2Filter.java:249)
    at net.sf.okapi.filters.xliff2.XLIFF2Filter.next(XLIFF2Filter.java:185)
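
The failure mode in the trace can be sketched as follows, assuming only what the trace shows: the writer pops a java.util.Stack of open groups when a group ends. GroupStackWriter and its methods are hypothetical and are not the real xliff-toolkit XLIFFWriter API. If the filter reaches the end-of-group call without the matching start ever having been recorded in that writer's state, Stack.pop() throws EmptyStackException, just as in the trace above.

```java
import java.util.Stack;

// Hypothetical writer that keeps a stack of open <group> ids, mirroring the
// state the trace shows XLIFFWriter.writeEndGroup depending on.
// Not the real xliff-toolkit API.
class GroupStackWriter {
    private final Stack<String> groups = new Stack<>();
    private final StringBuilder out = new StringBuilder();

    void writeStartGroup(String id) {
        groups.push(id); // record the open group
        out.append("<group id=\"").append(id).append("\">");
    }

    void writeEndGroup() {
        // Stack.pop() delegates to peek(), which throws EmptyStackException
        // when no matching writeStartGroup was recorded on this instance.
        groups.pop();
        out.append("</group>");
    }

    String output() {
        return out.toString();
    }
}
```

Pairing writeStartGroup("g1") with writeEndGroup() on the same instance produces `<group id="g1"></group>`. Calling writeEndGroup() on a fresh instance throws, which is the state mismatch described above.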

Comments (8)

  1. Chase Tingley reporter

    I have a kind of clunky fix that seems to work. However, I'm also seeing a crash when I use tikal to write this file out as XLIFF 1.2, which may indicate that my fix is bad. I'll post an update in a bit.

  2. Chase Tingley reporter

    The tikal crash (which can be reproduced in a test by collecting XLIFF2Filter events and then dumping them into an XLIFFWriter) is due to the way the XLIFF2 filter maps <group> elements to subdocument events. As a result, XLIFFWriter handles the </group> as an end-subdocument event and closes both the </group> and the current </file>. This causes the element stack to bottom out later, when we go to end the file.
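
    A simplified model of that mismatch, for illustration only: SubDocMismatchDemo and its methods are hypothetical and do not reflect the real XLIFFWriter code. The assumption taken from the comment above is that ending a subdocument closes both the current element and the enclosing </file>, so the later end-of-file close finds an empty stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model: the reader maps <group> to a subdocument, and the
// writer's end-subdocument handling closes </group> AND </file>.
class SubDocMismatchDemo {
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder out = new StringBuilder();

    void startFile() {
        open.push("file");
        out.append("<file>");
    }

    // A <group> arrives as a subdocument start.
    void startSubDocument() {
        open.push("group");
        out.append("<group>");
    }

    // Models the behavior described in the issue: two elements are closed.
    void endSubDocument() {
        close(); // </group>
        close(); // </file>, closed too early
    }

    // The real end of the file: the stack is already empty at this point.
    void endFile() {
        close();
    }

    private void close() {
        // ArrayDeque.pop() throws NoSuchElementException when empty.
        String name = open.pop();
        out.append("</").append(name).append(">");
    }

    String output() {
        return out.toString();
    }
}
```

    Running startFile, startSubDocument and endSubDocument yields `<file><group></group></file>`, and the following endFile call throws NoSuchElementException.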

  3. Chase Tingley reporter

    I think I've got the fixes here (if you can review the PR this weekend, that would be great). I was initially nervous about the way I fixed the first issue, as there was a very old line of similar code that was commented out for some reason. But I can't get it to break anything else.
