Receive TimeoutException in InputStreamFromOutputStream, which Okapi logs but does not rethrow.

Issue #1171 resolved
Marco Anaya Valdovinos created an issue

I am working with Okapi 0.36.3 in my AWS ECS service, and we see the following exception a number of times when our ECS task runs out of memory:

2022-10-28T01:14:30.201Z    ERROR [QuartzScheduler_Worker-19] (InputStreamFromOutputStream.java:292) - This timeout should never happen, the thread should terminate correctly. Please contact io-tools support.
2022-10-28T01:14:30.201Z    java.util.concurrent.TimeoutException
2022-10-28T01:14:30.201Z        at java.util.concurrent.FutureTask.get(FutureTask.java:205)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.io.InputStreamFromOutputStream.checkException(InputStreamFromOutputStream.java:275)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.io.InputStreamFromOutputStream.read(InputStreamFromOutputStream.java:392)
2022-10-28T01:14:30.201Z        at java.io.PipedInputStream.read(PipedInputStream.java:377)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.io.InputStreamFromOutputStream.read(InputStreamFromOutputStream.java:402)
2022-10-28T01:14:30.201Z        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.StreamUtil.copy(StreamUtil.java:138)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.StreamUtil.copy(StreamUtil.java:121)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.StreamUtil.createResettableStream(StreamUtil.java:279)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.resource.RawDocument.createResettableStream(RawDocument.java:346)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.resource.RawDocument.create(RawDocument.java:255)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.resource.RawDocument.<init>(RawDocument.java:158)
2022-10-28T01:14:30.201Z        at com.amazon.atms.docconverter.okapi.steps.AsinMergeStep.handleRawDocument(AsinMergeStep.java:115)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.pipeline.BasePipelineStep.handleEvent(BasePipelineStep.java:76)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.pipeline.Pipeline.execute(Pipeline.java:119)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:231)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.pipeline.Pipeline.process(Pipeline.java:201)
2022-10-28T01:14:30.201Z        at net.sf.okapi.common.pipelinedriver.PipelineDriver.processBatch(PipelineDriver.java:182)
2022-10-28T01:14:30.201Z        at mypackage.operation.MergeOperation.run(MergeOperation.java:74)

The corresponding block of code is:

/**
 * Builds the Okapi pipeline driver to perform this merge operation.
 */
public IPipelineDriver buildDriver() {
    IPipelineDriver driver = new PipelineDriver();
    driver.setFilterConfigurationMapper(filterConfigurationMapper);
    driver.addStep(documentMergeStep);
    driver.addStep(new RawDocumentWriterStep());

    BatchItemContext batchItemContext = new BatchItemContext(
            xliffDocument, new File(outputPath).toURI(),
            Charset.forName("UTF-8").name(), sourceDocument);
    driver.addBatchItem(batchItemContext);

    return driver;
}

/**
 * Builds the Okapi pipeline driver and runs the merge operation.
 */
@Override
public void run() throws DocConverterException {
    try {
        buildDriver().processBatch(); // line 74 in the stack trace above
    } catch (OkapiException e) {
        throw new DocConverterException("Error running merge operation", e);
    }
}

The issue is that this exception is only logged, never rethrown: https://bitbucket.org/okapiframework/okapi/src/23d916f011d51d055115fd248d8ce890d41ccaae/okapi/core/src/main/java/net/sf/okapi/common/io/InputStreamFromOutputStream.java#lines-292:294

So once our service recovers from the OOM, it has no way of detecting that this error happened, and our file conversion is silently truncated. For example, when converting from XLIFF back to JSON, our output would be truncated to:

{
  "Items": [
    {
      "item": 1,
      ...
    },
    ...
    {
      "item": n,
      ...
    }
// A number of items are missing at the end and the JSON isn't closed with "]}"
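
The only stop-gap we can think of on our side is to watch for that ERROR log line, since the exception itself never propagates. A rough sketch, assuming Logback as the SLF4J backend and that Okapi logs through a logger named after the class (the TimeoutLogWatcher class here is hypothetical, ours rather than Okapi's):

import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

import org.slf4j.LoggerFactory;

import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.AppenderBase;

public final class TimeoutLogWatcher extends AppenderBase<ILoggingEvent> {

    private final AtomicBoolean timeoutSeen = new AtomicBoolean(false);

    // Attach to the logger that emits the "This timeout should never happen" error.
    public static TimeoutLogWatcher attach() {
        Logger okapiLogger = (Logger) LoggerFactory.getLogger(
                "net.sf.okapi.common.io.InputStreamFromOutputStream");
        TimeoutLogWatcher watcher = new TimeoutLogWatcher();
        watcher.start();
        okapiLogger.addAppender(watcher);
        return watcher;
    }

    public boolean timeoutSeen() {
        return timeoutSeen.get();
    }

    @Override
    protected void append(ILoggingEvent event) {
        // Flag the batch if the swallowed TimeoutException shows up in the log.
        if (event.getThrowableProxy() != null
                && TimeoutException.class.getName()
                        .equals(event.getThrowableProxy().getClassName())) {
            timeoutSeen.set(true);
        }
    }
}

Attached before processBatch() and checked afterwards, this would at least let run() fail loudly instead of returning a truncated file, but it is fragile compared to a real exception we could catch.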

Questions for the devs:

  1. I understand the exception message says this should never happen and that io-tools support should be contacted, but is there a way for Okapi to rethrow this exception? That would give us a way to catch it instead of it failing silently.
  2. Is there any other way we could prevent this TimeoutException from causing our conversions to be truncated (e.g. a different way of calling this)?

Thank you and please let me know if there is additional information I can provide.

Comments (4)

  1. jhargrave-straker

    Hi Marco,

    We have seen this bug as well. If memory serves, there was some refactoring of OriginalDocumentXliffMergerStep to better recover from the TimeoutException.

    The Okapi version you are using is quite old. Can you run a test with the latest version (1.44.0)? If you are using your own fork you can try to use the latest OriginalDocumentXliffMergerStep or copy and paste the relevant code to your merger step.

    Note that I have updated the code to rethrow the TimeoutException, and we will test with that update; it will be available in 1.45.0.

    Your best bet for a workaround is to recode OriginalDocumentXliffMergerStep so that it does not use InputStreamFromOutputStream. The code will be slower and will require disk space to write out a temp file, but you will have full control over any exceptions, as in the sketch below.
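
    For reference, a minimal sketch of the temp-file approach I mean (OutputProducer and mergeViaTempFile are placeholder names, not Okapi API; the real merger step does considerably more):

    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public final class TempFileMerge {

        // Placeholder for whatever currently writes into InputStreamFromOutputStream.
        public interface OutputProducer {
            void writeTo(OutputStream out) throws IOException;
        }

        // Write the merged output to a temp file, then read it back. Slower and
        // disk-bound, but every IOException surfaces on the calling thread
        // instead of being swallowed by a worker thread.
        public static InputStream mergeViaTempFile(OutputProducer producer) throws IOException {
            Path temp = Files.createTempFile("okapi-merge", ".tmp");
            temp.toFile().deleteOnExit();
            try (OutputStream out = Files.newOutputStream(temp)) {
                producer.writeTo(out);
            }
            return Files.newInputStream(temp);
        }
    }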

  2. Marco Anaya Valdovinos reporter

    Hi Jim!

    Thanks for the quick response! We will investigate on our end whether it is easier to recode OriginalDocumentXliffMergerStep or update to the latest version. Since our Okapi version is pretty old and the service it is in is used by various teams, testing on the latest version might take some effort.
