Straight quote in Excel file crashes OmegaT project with java.lang.ClassCastException

Issue #47 new
Manuel Souto Pico created an issue

Steps to reproduce

Translate an Excel file containing a straight quote, for example the text “How well do you know this person's performance?”

Expected results

The filter extracts the text and the project loads normally.

Actual results

The project crashes with the error “java.lang.ClassCastException: com.sun.xml.internal.stream.events.CharacterEvent cannot be cast to javax.xml.stream.events.EndElement”

See screenshot: https://imgur.com/2eAmPuA.png

Further info

The problem seems to be the straight quote. I get the expected results if I replace it with a curly apostrophe: “How well do you know this person’s performance?”

Files

I am attaching an OmegaT project that includes two files, one with a straight quote and one with a curly quote (apostrophe).

Comments (6)

  1. Chase Tingley

    I attached the fail.xlsx file and the omegat fprm directly for convenience.

    However, when I try to extract through just Okapi, I don’t see a crash (although I do see a couple warnings that seem unrelated – they are present in both the good and bad versions of the file, and are related to the structure of the docx archive):

    $ tikal.sh -fc ../omegat/okf_openxml@noauthor.fprm -x fail.xlsx 
    Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
    -------------------------------------------------------------------------------
    Okapi Tikal - Localization Toolset
    Version: 2.1.44.0-SNAPSHOT
    -------------------------------------------------------------------------------
    Extraction
    Source language: en
    Target language: fr
    Default input encoding: UTF-8
    Filter configuration: okf_openxml@noauthor
    Output: /home/tingley/Downloads/omegat/source/fail.xlsx.xlf
    Input: /home/tingley/Downloads/omegat/source/fail.xlsx
    Unable to resolve '../customXml/item1.xml' against path ''.
    Unable to resolve '../customXml/item2.xml' against path ''.
    Unable to resolve '../customXml/item3.xml' against path ''.
    Done in 0.645s
    

  2. Kuro Kurosaka (BH Lab)

    This issue is definitely related to issue #38 and probably has the same root cause. The cell contents should be treated as a plain text element: they should not be parsed further as XML and should not cause an exception.
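
    As a minimal sketch of what treating the cell contents as plain text could look like with the JDK's StAX API (javax.xml.stream): the reader below checks each event's type before narrowing it, instead of assuming that an end element follows the first character event. The class name CellTextSketch, the method readCellText, and the sample <si><t> fragment are hypothetical. This is not the Okapi filter's actual code, and whether the crash really comes from an unchecked cast of this kind is an assumption based only on the exception message.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical illustration, not code from Okapi or the OmegaT plugin.
public class CellTextSketch {

    // Returns the concatenated text of all <t> elements in the given XML fragment.
    static String readCellText(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            boolean inText = false;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                // Check the event type before calling asStartElement()/asEndElement(),
                // so a character event is never cast to an element event.
                if (event.isStartElement()
                        && "t".equals(event.asStartElement().getName().getLocalPart())) {
                    inText = true;
                } else if (event.isEndElement()
                        && "t".equals(event.asEndElement().getName().getLocalPart())) {
                    inText = false;
                } else if (inText && event.isCharacters()) {
                    // Append every character event seen before the closing tag.
                    text.append(event.asCharacters().getData());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read cell text", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<si><t>How well do you know this person&apos;s performance?</t></si>";
        System.out.println(readCellText(xml));
    }
}
```

    Collecting every character event until the closing tag means the result does not depend on whether the parser delivers the text in one piece or in several.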

    I added this test case (well, not really a test case, since there’s no assert) to the Okapi OmegaT plugin’s net.sf.okapi.lib.omegat.AbstractOkapiFilterTest, and it runs normally.

    @Test
    public void testXlsx () throws Exception {
        org.omegat.filters2.IFilter filter = new OpenXMLFilter();
        VirtualOmegaT omegat = new VirtualOmegaT();
        File inFile = new File(getClass().getResource("/fail.xlsx").toURI());
        filter.parseFile(inFile, null, new FilterContext(), omegat);
    }
    

    The exception only happens when the plugin is used from OmegaT.
    It is difficult to debug because three separate projects are involved: OmegaT, the plugin, and Okapi. (Any suggestions on how to do this in IntelliJ?)

  3. Manuel Souto Pico reporter

    Thank you guys for looking into this.

    @Chase Tingley I was using a customized variant of the plugin (version okapiFiltersForOmegaT-1.12-1.44.0) based on commit c5fa867. I need this customization to make the plugin compatible with Java 8, which is what OmegaT (including JRE) supports at the moment.

    I have just tested it with the latest binary available (version okapiFiltersForOmegaT-1.11-1.43.0.jar), running OmegaT with Java 11 from the command line, and I can reproduce it. The error message is not exactly the same, though:

    Version: OmegaT-5.7.1_0_c3206253
    Platform: Linux 5.16.0-5mx-amd64
    Java: 11.0.12 amd64
    Memory: 598MiB total / 482MiB free / 5960MiB max

    java -version
    openjdk version "11.0.12" 2021-07-20
    OpenJDK Runtime Environment 18.9 (build 11.0.12+7)
    OpenJDK 64-Bit Server VM 18.9 (build 11.0.12+7, mixed mode)

    I hope that helps.
