OpenXML filter leaks files on encryption check.

Issue #743 resolved
Nikolai Vladimirov created an issue

On encryption check for OpenXML:

try (InputStream is = new FileInputStream(file)) {
    new CompoundDocument(is);
    return true;
} catch (CorruptDocumentException e) {
    return false;
}

On CompoundDocument init, a temp file is created, which causes two leaks:

  • The temp file is only deleted on exit, so it stays on disk for the lifetime of the application.

https://github.com/haraldk/TwelveMonkeys/blob/24c6682236e5a02151359486aa4075ddc5ab1534/common/common-io/src/main/java/com/twelvemonkeys/io/FileCacheSeekableStream.java#L104

  • The RandomAccessFile instance used to open the temp file is never closed, leaving the file descriptor open for the lifetime of the application.

https://github.com/haraldk/TwelveMonkeys/blob/24c6682236e5a02151359486aa4075ddc5ab1534/common/common-io/src/main/java/com/twelvemonkeys/io/FileCacheSeekableStream.java#L202

java.io.RandomAccessFile.<init>(File, String) RandomAccessFile.java:243
com.twelvemonkeys.io.FileCacheSeekableStream$FileCache.<init>(File) FileCacheSeekableStream.java:200
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream, File) FileCacheSeekableStream.java:109
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream, String, File) FileCacheSeekableStream.java:95
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream) FileCacheSeekableStream.java:63
com.twelvemonkeys.io.ole2.CompoundDocument.<init>(InputStream) CompoundDocument.java:119
net.sf.okapi.filters.openxml.OpenXMLFilter.isZipFileEncrypted(File) OpenXMLFilter.java:458
net.sf.okapi.filters.openxml.OpenXMLFilter.openZipFile() OpenXMLFilter.java:412
net.sf.okapi.filters.openxml.OpenXMLFilter.next() OpenXMLFilter.java:260
net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(Event) RawDocumentToFilterEventsStep.java:135
net.sf.okapi.common.pipeline.Pipeline.execute(Event) Pipeline.java:119
net.sf.okapi.common.pipeline.Pipeline.process(Event) Pipeline.java:231
net.sf.okapi.common.pipeline.Pipeline.process(RawDocument) Pipeline.java:201
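
For comparison, here is a minimal sketch of temp-file handling that avoids both leaks described above. It uses only java.io and java.nio; the class TempFileCache, the method cacheAndMeasure, and the field lastTempFile are hypothetical names made up for illustration and are not part of TwelveMonkeys or Okapi.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical helper for illustration only; not TwelveMonkeys or Okapi code.
class TempFileCache {
    // Remembered only so that a caller can verify the cleanup afterwards.
    static File lastTempFile;

    // Writes data to a temp file, returns its length, and cleans up eagerly.
    static long cacheAndMeasure(byte[] data) throws IOException {
        File temp = File.createTempFile("cache-", ".tmp");
        lastTempFile = temp;
        // try-with-resources closes the RandomAccessFile (and its fd)
        // before the finally block runs.
        try (RandomAccessFile raf = new RandomAccessFile(temp, "rw")) {
            raf.write(data);
            return raf.length();
        } finally {
            // Delete right away instead of relying on File.deleteOnExit().
            Files.deleteIfExists(temp.toPath());
        }
    }
}
```

Closing the file before deleting it matters on platforms that refuse to delete a file that is still open.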

This is certainly a library bug in TwelveMonkeys common-io; I just wanted to report it here because it has a major impact on Okapi.

Comments (6)

  1. Nikolai Vladimirov reporter

    Reverting to Apache POI fixes the problem; from what I can see with a profiler, no FDs are leaked.

    So pretty much a revert of commit 437619a20b81528f91d808cd658bf335217464e9 (with conflict fixes and a version update) would solve the problem, but I don't know whether that is a good solution or whether it's better to work with TwelveMonkeys upstream to fix their file management issues.
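
Another option, not discussed in this thread and shown only as a sketch, would be to inspect the file header directly. Encrypted OOXML packages are wrapped in an OLE2 compound file, which starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, so reading those bytes needs no temp file at all. The class Ole2Signature and the method hasOle2Signature are hypothetical names, and this check is weaker than fully parsing the compound document.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper for illustration only; not Okapi or TwelveMonkeys code.
class Ole2Signature {
    // Signature at the start of every OLE2 compound file.
    private static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the stream starts with the OLE2 signature.
    // The caller owns the stream and is responsible for closing it.
    static boolean hasOle2Signature(InputStream in) throws IOException {
        byte[] header = new byte[MAGIC.length];
        try {
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // fewer than 8 bytes, so not a compound file
        }
        return Arrays.equals(header, MAGIC);
    }
}
```

A plain, unencrypted OOXML file is a ZIP archive and starts with the bytes "PK", so it would not match.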

  2. Chase Tingley

    Oh, great catch. I think this may also be the file descriptor leak I've noticed in Longhorn too.

  3. Chase Tingley
    • changed milestone to M37

    Harald, who maintains the twelvemonkeys project, was very helpful and suggested a workaround while he fixes the leaks. I've pushed the change.

  4. Nikolai Vladimirov reporter

    Thank you for the quick fix!

    I can confirm that on the latest dev the encryption check no longer leaks temp files or file descriptors.
