Rainbow ExtractionStep causes ClassNotFoundException when trying to instantiate writer object

Issue #495 resolved
Martn Wunderl created an issue

Hi,

This relates to a message posted on the dev list: https://groups.google.com/forum/#!topic/okapi-devel/Gn9UOK-Wc0c

  • Steps to reproduce: currently, only reproducible in our system, where a pipeline is assembled programmatically
  • Setup: simple 3-step pipeline with ICML filter, SRX segmentation and creation of package (ExtractionStep, with XLIFF 2.0); running M28-SNAPSHOT, Java 1.8.0_31_b13, Mac OS X
  • Outcome: ClassNotFoundException on line 188 ExtractionStep
  • Expected results: Create XLIFF package from ICML.

Root cause analysis:

The issue seems to be that the ExtractionStep is using the default class loader when trying to instantiate the writer object:

    protected Event handleStartBatch (Event event) {
        try {
            // Get the package format (class name)
            String writerClass = params.getWriterClass();
            writer = (IPackageWriter)Class.forName(writerClass).newInstance();
            writer.setParameters(params);

This is problematic, because when instantiating a pipeline step it is possible to set the classloader attribute on the Step object. However, I presume the step should then also use the same classloader to create further objects and not the default.

If this analysis is indeed correct, then it would require a number of changes:

  • Add a field "loader" to the abstract BasePipelineStep.
  • Add a method setLoader() to IPipelineStep.
  • PipelineWrapper would need a field for a class loader, too (and possibly an additional constructor or setter where this loader could be set), so that the attribute "loader" on availableSteps can get set to this class loader.
  • When creating pipeline steps using a classloader, e.g. in PipelineWrapper.copyInfoStepsToPipeline(), set the same classloader as an attribute on the step.
  • In the specific step class, when instantiating objects, the loader of the step instance should be used.
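To make the proposal above more concrete, here is a minimal sketch of a step that remembers the loader it was created with and reuses it for further objects. The names Step, BaseStep, createHelper() and createStep() are hypothetical stand-ins for illustration only; they are not the actual Okapi interfaces or the PipelineWrapper code.

```java
public class LoaderAwareStepSketch {

    /** Hypothetical stand-in for a pipeline step; not the actual Okapi interface. */
    public interface Step {
        void setLoader(ClassLoader loader);
        Object createHelper(String className) throws ReflectiveOperationException;
    }

    /** Hypothetical base class that holds the loader the step was created with. */
    public static class BaseStep implements Step {
        // Defaults to the loader that defined this class until one is set explicitly.
        private ClassLoader loader = BaseStep.class.getClassLoader();

        public BaseStep() {
        }

        @Override
        public void setLoader(ClassLoader loader) {
            if (loader != null) {
                this.loader = loader;
            }
        }

        @Override
        public Object createHelper(String className) throws ReflectiveOperationException {
            // Resolve the helper through the step's loader rather than the default one.
            return Class.forName(className, true, loader)
                    .getDeclaredConstructor()
                    .newInstance();
        }
    }

    /** Creates a step through the given loader and passes the same loader on to the step. */
    public static Step createStep(String stepClass, ClassLoader loader)
            throws ReflectiveOperationException {
        Step step = (Step) Class.forName(stepClass, true, loader)
                .getDeclaredConstructor()
                .newInstance();
        step.setLoader(loader);
        return step;
    }
}
```

Whether this is needed at all depends on the root cause analysis being right, so treat it as an illustration of the idea rather than a patch.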

Note that this might affect several areas in the library. At least every Step would need to be examined and every place where steps are created. I am not sure how much this really affects.

Then again, this analysis might be missing something important, so feel free to correct me.

For us, this is a major issue, because it currently makes it impossible to use the library. Maybe a fix might be possible in the upcoming M28 release. Alternatively, perhaps a branch could be created for the issue so that we can submit the required changes and see if everything works with a patched version of Okapi (I'd rather do it through a branch in the repo than using a local build, to keep things in sync).

Cheers,

Martin

Comments (11)

  1. Martn Wunderl reporter

    Just a few more observations on this:

    • I forgot to mention that I was able to work around this issue by creating adjusted versions of PipelineWrapper and ExtractionStep and using these instead of the ones provided by the Okapi Jars.

    • Placing the Okapi Jars in the JDK's "endorsed" directory did not fix the problem.

    • When I examine the classloader of the ExtractionStep object at the point where it tries to instantiate the writer, there are some odd results. I can see that the classloader knows about the following classes from the package net.sf.okapi.steps.rainbowkit from okapi-lib-0.28-SNAPSHOT.jar (by examining the field "classes" in this.getClass().getClassLoader()):

    1. net.sf.okapi.steps.rainbowkit.creation.ExtractionStep
    2. net.sf.okapi.steps.rainbowkit.creation.Parameters
    3. net.sf.okapi.steps.rainbowkit.postprocess.MergingStep
    4. net.sf.okapi.steps.rainbowkit.postprocess.Parameters
    5. net.sf.okapi.steps.rainbowkit.creation.ExtractionStep$1

    This seems odd, because when I examine the package contents of the Okapi jar, I get the following (using jar tvf okapi-lib-0.28-SNAPSHOT.jar | grep net/sf/okapi/steps/rainbowkit/.*class):

    1. 1672 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/common/BasePackageWriter$1.class
    2. 17537 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/common/BasePackageWriter.class
    3. 216 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/common/IMergeable.class
    4. 786 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/common/IPackageWriter.class
    5. 4438 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/common/WordCounter.class
    6. 952 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/creation/ExtractionStep$1.class
    7. 11878 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/creation/ExtractionStep.class
    8. 4052 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/creation/Parameters.class
    9. 8704 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/omegat/OmegaTPackageWriter.class
    10. 3749 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/omegat/Options.class
    11. 3176 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/ontram/OntramPackageWriter.class
    12. 3801 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/po/POPackageWriter.class
    13. 875 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/Merger$1.class
    14. 12736 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/Merger.class
    15. 764 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/MergingStep$1.class
    16. 5277 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/MergingStep.class
    17. 3678 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/Parameters.class
    18. 965 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/SkeletonMerger$1.class
    19. 12382 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/postprocess/SkeletonMerger.class
    20. 1335 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/rtf/RTFLayerWriter$1.class
    21. 6827 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/rtf/RTFLayerWriter.class
    22. 3824 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/rtf/RTFPackageWriter.class
    23. 3936 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/table/TablePackageWriter.class
    24. 2008 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/transifex/Parameters.class
    25. 11030 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/transifex/TransifexPackageWriter.class
    26. 3606 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/versified/VersifiedPackageWriter.class
    27. 3159 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/versified/VersifiedRtfPackageWriter.class
    28. 4709 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/Options.class
    29. 2318 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/XLIFF2Options.class
    30. 2529 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/XLIFF2PackageWriter$1.class
    31. 21475 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/XLIFF2PackageWriter.class
    32. 1982 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/XLIFFPackageWriter$1.class
    33. 7570 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliff/XLIFFPackageWriter.class
    34. 3081 Mon Aug 03 09:02:56 CEST 2015 net/sf/okapi/steps/rainbowkit/xliffrtf/XLIFFRTFPackageWriter.class

    This seems odd, because it looks as if the package contents are only partially loaded, which would contradict what I wrote in the original issue description. It should be either all or nothing. Having only 5 classes from that package visible in the classloader at that stage doesn't make any sense to me. But maybe it helps in trying to diagnose the problem.

    Cheers,

    Martin

  2. YvesS

    I still had no time to look closely at this, but you may want to look at example06: it does pretty much what you try here (except using Maven to pull the dependencies).

    BTW: the pom in example06 needs to be updated to have the version:

    <dependency>
     <groupId>net.sf.okapi.lib</groupId>
     <artifactId>okapi-lib-xliff2</artifactId>
    </dependency>
    

    should be:

    <dependency>
     <groupId>net.sf.okapi.lib</groupId>
     <artifactId>okapi-lib-xliff2</artifactId>
     <version>1.0.1</version>
    </dependency>
    
  3. Martn Wunderl reporter

    And one more finding: If I create a simple static Main class from which to run the same code, I can execute the Okapi pipeline without any problems. It fails when the pipeline is built and executed by a service module from within the running censhare server.

  4. Martn Wunderl reporter

    Hi Yves et al.,

    I have just discussed this with my boss, who is more knowledgeable about classloader issues than I am. The issue might be that the ExtractionStep (and any other steps that try to instantiate additional classes) is using the wrong class loader. Instead of the default, it should be using the context class loader of the current thread.

    So, instead of this (in ExtractionStep.handleStartBatch()):

    String writerClass = params.getWriterClass();
    writer = (IPackageWriter) Class.forName(writerClass).newInstance();
    

    It should be doing this:

    Thread currThread = Thread.currentThread();
    ClassLoader ccl = currThread.getContextClassLoader();
    String writerClass = params.getWriterClass();
    writer = (IPackageWriter) Class.forName(writerClass, true /* initialize */, ccl).newInstance();
    

    Another option might be to do:

    ClassLoader ccl = this.getClass().getClassLoader();
    

    Does that make sense?

    Cheers,

    Martin

  5. Jim Hargrave (OLD)

    Just a thought. I have working extraction/merge pipelines that do not use ExtractionStep. However, my pipelines are hard coded to produce XLIFF only, not any of the Rainbow kits. I don't know if that is an option for you.

    This classloader magic always worries me. I think we should avoid it at all costs. If we are going to refactor I would like to push any classloader code into a single class. Just trying to avoid doing it across the framework. My concern is running in server environments, memory leaks and other nasties.

  6. Martn Wunderl reporter

    Hi Jim,

    Thanks a lot for the comment. Would those pipelines of yours be able to go from ICML to XLIFF 2.0 (via SRX segmentation and leveraging) and back again? If so, what are the specific pipeline steps you're using?

    As for the classloader issues, I agree. There should be a single static utility class or something like that, which can be used to instantiate objects within the correct classloader context (assuming my root cause analysis is correct). There is an interesting approach described here using strategy pattern: http://www.javaworld.com/article/2077344/core-java/find-a-way-out-of-the-classloader-maze.html
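A rough sketch of what such a single utility class could look like, assuming a simple fixed order (thread context class loader first, then the caller's own loader). The class name ClassLoaderUtilSketch and its methods are hypothetical and do not exist in Okapi; the linked article describes a more flexible strategy-based variant.

```java
public final class ClassLoaderUtilSketch {

    private ClassLoaderUtilSketch() {
        // Static helpers only.
    }

    /**
     * Loads className, trying the thread context class loader first and
     * falling back to the loader that defined the calling class.
     */
    public static Class<?> load(String className, Class<?> caller)
            throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible to the context loader; try the caller's loader next.
            }
        }
        return Class.forName(className, true, caller.getClassLoader());
    }

    /** Loads className and creates an instance through its no-argument constructor. */
    public static <T> T newInstance(String className, Class<T> type, Class<?> caller)
            throws ReflectiveOperationException {
        Object instance = load(className, caller).getDeclaredConstructor().newInstance();
        return type.cast(instance);
    }
}
```

A step would then call something like ClassLoaderUtilSketch.newInstance(writerClass, IPackageWriter.class, getClass()) instead of calling Class.forName() directly, which keeps all loader decisions in one place.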

    Cheers,

    Martin

  7. Jim Hargrave

    I use XLIFFWriter with FilterEventsWriterStep. I'm not sure if we have an "Xliff2Writer" that implements IFilterWriter. If we do then you should be able to use my "lower level pipeline". I always try to create my pipelines at the lowest level possible using the core steps. Fewer problems and easier to debug issues.

    I like the classloader link - this looks like the best solution for a library where you never know how it will be used.

  8. Martn Wunderl reporter

    @jhargrave If I find the time today, I will try to create a local Okapi build that includes the change mentioned in my last post, so that I can test it. How would I go about creating a local build of the libraries?

  9. Jim Hargrave

    Martin - I've got a high priority task over the next two weeks. I'll be actively ignoring emails during that time, but wanted to throw you a bone at least :-)

    But basically to build Okapi you need java 1.7 (you can build with 1.8 but you won't be able to use any 1.8 language features), maven and ant. Pull the dev branch from the git repository.

    Then you can go to ../deployment/maven and execute one of the "update" scripts that match your OS (windows, Mac or Linux). That should give you a complete distribution. But if you are using maven in your project you can just use your local okapi snapshot artifacts - for that just do the normal "mvn clean install" from the okapi root folder.

    If you get stuck maybe one of the other devs can help you out. I'll try to take a peek next week to see how you are doing.

    cheers,

    Jim

  10. Martn Wunderl reporter

    OK, great, thanks for the hint, Jim, and for taking the time to reply. I will try and see, if I can get it running using a local build with the line modified to use the context class loader.

    Cheers,

    Martin

  11. Martn Wunderl reporter

    In the end, it turned out that this problem was actually caused by erroneous whitespace characters in the pipeline configuration file, which led to the ClassNotFoundException. So, not related to the class loader at all.
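For anyone running into the same thing, here is a small self-contained sketch of how stray whitespace in a class name read from a configuration value produces a ClassNotFoundException, and how trimming the value avoids it. It assumes the whitespace ended up in the class name string itself; the class WriterClassNameSketch and its methods are made up for this illustration and are not part of Okapi.

```java
public class WriterClassNameSketch {

    /** Strips surrounding whitespace that a hand-edited configuration value may carry. */
    public static String normalize(String rawClassName) {
        return rawClassName == null ? "" : rawClassName.trim();
    }

    /** Returns true if Class.forName() can resolve the given name as written. */
    public static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class name as it might come out of a configuration file, with trailing whitespace.
        String raw = "java.lang.StringBuilder \n";

        System.out.println(isLoadable(raw));            // false: the whitespace is part of the name
        System.out.println(isLoadable(normalize(raw))); // true: the trimmed name resolves
    }
}
```

The exception message looks the same as for a genuinely missing class, which is why this is easy to mistake for a class loader problem.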
