OpenXML filter leaks files on encryption check.
The encryption check for OpenXML does the following:
try (InputStream is = new FileInputStream(file)) {
    new CompoundDocument(is);
    return true;
} catch (CorruptDocumentException e) {
    return false;
}
When CompoundDocument is constructed, it creates a temp file that has two problems:
- The file is only marked delete-on-exit, so it stays on disk for the lifetime of the application.
- The RandomAccessFile instance used to open the temp file is never closed, so the fd stays open for the lifetime of the application.
java.io.RandomAccessFile.<init>(File, String) RandomAccessFile.java:243
com.twelvemonkeys.io.FileCacheSeekableStream$FileCache.<init>(File) FileCacheSeekableStream.java:200
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream, File) FileCacheSeekableStream.java:109
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream, String, File) FileCacheSeekableStream.java:95
com.twelvemonkeys.io.FileCacheSeekableStream.<init>(InputStream) FileCacheSeekableStream.java:63
com.twelvemonkeys.io.ole2.CompoundDocument.<init>(InputStream) CompoundDocument.java:119
net.sf.okapi.filters.openxml.OpenXMLFilter.isZipFileEncrypted(File) OpenXMLFilter.java:458
net.sf.okapi.filters.openxml.OpenXMLFilter.openZipFile() OpenXMLFilter.java:412
net.sf.okapi.filters.openxml.OpenXMLFilter.next() OpenXMLFilter.java:260
net.sf.okapi.steps.common.RawDocumentToFilterEventsStep.handleEvent(Event) RawDocumentToFilterEventsStep.java:135
net.sf.okapi.common.pipeline.Pipeline.execute(Event) Pipeline.java:119
net.sf.okapi.common.pipeline.Pipeline.process(Event) Pipeline.java:231
net.sf.okapi.common.pipeline.Pipeline.process(RawDocument) Pipeline.java:201
This is certainly a library bug in TwelveMonkeys common-io; I'm reporting it here because it has a major impact on Okapi.
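To illustrate the two problems listed above, here is a minimal, self-contained sketch using only the JDK. It is not the TwelveMonkeys code: the class and method names (`TempFileCleanupSketch`, `leaky`, `tidy`) are hypothetical, and it only mirrors the pattern visible in the stack trace (a temp file opened through `RandomAccessFile`), next to a variant that closes the handle and deletes the file right away.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical illustration only; not the library's implementation.
class TempFileCleanupSketch {

    // Leaky pattern: the temp file is only marked delete-on-exit and the
    // RandomAccessFile is never closed.
    static File leaky() throws IOException {
        File tmp = File.createTempFile("cache", ".tmp");
        tmp.deleteOnExit();                              // file lingers until JVM exit
        RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
        raf.write(1);                                    // never closed: fd stays open
        return tmp;
    }

    // Tidy pattern: try-with-resources closes the handle first, then the
    // finally block deletes the temp file immediately.
    static File tidy() throws IOException {
        File tmp = File.createTempFile("cache", ".tmp");
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(1);
        } finally {
            Files.deleteIfExists(tmp.toPath());
        }
        return tmp;
    }
}
```

In a long-running process that checks many documents, each call following the first pattern leaves one temp file and one open descriptor behind, which matches what the profiler shows here.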
Comments (6)
-
reporter -
Oh, great catch. I think this may also be the file descriptor leak I've noticed in Longhorn.
-
Filed this. I may try to push a PR through if I have time. https://github.com/haraldk/TwelveMonkeys/issues/438
-
- changed status to resolved
Fix issue #743 - file descriptor leak checking for encrypted OOXML → <<cset e4d31a6b3ae0>>
-
- changed milestone to M37
Harald, who maintains the twelvemonkeys project, was very helpful and suggested a workaround while he fixes the leaks. I've pushed the change.
-
reporter Thank you for the quick fix!
I can confirm that on the latest dev the encryption check doesn't leak temp files or file descriptors.
Reverting to Apache's POI fixes the problem: no FDs are leaked, as far as I can see with a profiler.
So pretty much a revert of commit 437619a20b81528f91d808cd658bf335217464e9 (with conflict fixes and a version update) would solve the problem, but I don't know whether this is a good solution or whether it would be better to work with TwelveMonkeys upstream to solve their file management issues.
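For comparison, here is a sketch of a third option that avoids any temp file. It is neither the fix that was applied nor something proposed in this thread. It assumes that recognizing the 8-byte OLE2 compound file signature is enough to flag an encrypted OOXML package, which is a weaker test than actually parsing the file with CompoundDocument. The class and method names (`Ole2HeaderCheck`, `looksLikeOle2`) are hypothetical.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical helper; only checks the file signature, nothing else.
class Ole2HeaderCheck {

    // Signature at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean looksLikeOle2(File file) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        // try-with-resources closes the only descriptor this check opens
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            in.readFully(header);
        } catch (EOFException e) {
            return false;                 // shorter than 8 bytes: not OLE2
        }
        return Arrays.equals(header, OLE2_MAGIC);
    }
}
```

A regular (unencrypted) OOXML file is a ZIP archive and starts with `PK`, so it would fail this check. Whether a signature-only test is acceptable in place of the full parse is a judgment call for the filter maintainers.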