Error during "Translation Kit Post-Processing" in Rainbow for ICML file

Issue #907 new
Rajendra Kharbuja created an issue

The Rainbow pipeline throws an error, as shown in the attached image “Error 2020-01-16 at 09.26.02”. The ICML file used for translation is also attached.

In both files, “Not_working_16.01.20_copy.icml” and “16.01.20.icml”, there is an empty <Content></Content> element outside <Table/>, which is causing the problem.

If we remove the empty <Content></Content> element, as in “Working_16.01.20_copy.icml”, the error does not occur.
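To check whether a given file contains the kind of element described above, a small DOM scan is enough. The following is a minimal sketch, not part of Okapi: the class name `EmptyContentFinder`, its methods, and the sample XML string are hypothetical, and only the element name `Content` comes from this report.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Okapi.
public class EmptyContentFinder {

    // Parses an XML string into a DOM document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Counts <Content> elements that have no child nodes at all.
    // Both <Content></Content> and <Content/> count as empty.
    public static int countEmptyContent(Document doc) {
        NodeList list = doc.getElementsByTagName("Content");
        int count = 0;
        for (int i = 0; i < list.getLength(); i++) {
            if (!list.item(i).hasChildNodes()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample; a real ICML file has far more structure.
        String xml = "<Story><Content>Text</Content><Content></Content></Story>";
        System.out.println(countEmptyContent(parse(xml))); // prints 1
    }
}
```

A non-zero count on an input file would suggest it can trigger the error described here.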

Steps to reproduce:

  1. Add the icml file to Input List 1.
  2. Go to Utilities → Translation Kit Creation.
  3. Use “XLIFF v2” as the Package Format in “Rainbow Translation Kit Creation”
  4. Leave the rest of the configuration as it is (no SRX file in segmentation and no leveraging) and press “Execute”.
  5. Remove the document from the input list
  6. Add the manifest.rkm file created by the Translation Kit Creation step to Input List 1.
  7. Go to Utilities → Translation Kit Post-Processing, and Execute.

Comments (18)

  1. Martn Wunderl

    @YvesS @Mihai Nita What do you think of this issue? What would be the priority and are you aware of anyone else having the same problem perhaps?

  2. Mihai Nita

    I know of the ICML filter, but I have never had to touch that format.

    But I have also tried

    tikal.sh -x -fc okf_icml Not_working_16.01.20_copy.icml
    tikal.sh -m -fc okf_icml Not_working_16.01.20_copy.icml.xlf
    

    And I also got

    [Fatal Error] :1:2760: The element type "CharacterStyleRange" must be terminated by the matching end-tag "</CharacterStyleRange>".
    Error: Error merging from original file
    Error: Error when parsing XML of text unit id='u43e47-2'.
    The element type "CharacterStyleRange" must be terminated by the matching end-tag "</CharacterStyleRange>".
    You can use the -trace option for more details.
    

    Running it with -trace:

    Trace: thread [pool-1-thread-1] started.
    Trace: XMLInputFactory: com.ctc.wstx.stax.WstxInputFactory
    Trace: BOM not found. Now trying to guess document encoding.
    [Fatal Error] :1:2760: The element type "CharacterStyleRange" must be terminated by the matching end-tag "</CharacterStyleRange>".
    Trace: thread [pool-1-thread-1] closed.
    net.sf.okapi.common.exceptions.OkapiMergeException: Error merging from original file
            at net.sf.okapi.lib.merge.step.OriginalDocumentXliffMergerStep$1.produce(OriginalDocumentXliffMergerStep.java:124)
            at net.sf.okapi.lib.merge.step.OriginalDocumentXliffMergerStep$1.produce(OriginalDocumentXliffMergerStep.java:113)
            at net.sf.okapi.common.io.InputStreamFromOutputStream$DataProducer.call(InputStreamFromOutputStream.java:139)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: net.sf.okapi.common.exceptions.OkapiIOException: Error when parsing XML of text unit id='u43e47-2'.
    The element type "CharacterStyleRange" must be terminated by the matching end-tag "</CharacterStyleRange>".
            at net.sf.okapi.filters.icml.ICMLFilterWriter.mergeTextUnit(ICMLFilterWriter.java:398)
            at net.sf.okapi.filters.icml.ICMLFilterWriter.processTextUnit(ICMLFilterWriter.java:309)
            at net.sf.okapi.filters.icml.ICMLFilterWriter.handleEvent(ICMLFilterWriter.java:189)
            at net.sf.okapi.lib.merge.merge.SkeletonMergerWriter.processTextUnit(SkeletonMergerWriter.java:240)
            at net.sf.okapi.lib.merge.merge.SkeletonMergerWriter.handleEvent(SkeletonMergerWriter.java:103)
            at net.sf.okapi.lib.merge.step.OriginalDocumentXliffMergerStep$1.produce(OriginalDocumentXliffMergerStep.java:120)
            ... 6 more
    Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2760; The element type "CharacterStyleRange" must be terminated by the matching end-tag "</CharacterStyleRange>".
            at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
            at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
            at net.sf.okapi.filters.icml.ICMLFilterWriter.mergeTextUnit(ICMLFilterWriter.java:357)
            ... 11 more
    

    So it is not related to Rainbow or XLIFF 2; it seems to be the merging in ICMLFilterWriter.

  3. Mihai Nita

    And I’ve found a bug in the process :-)
    The mapping from extension to filter is out of date in Tikal and Rainbow (that’s why I needed to add -fc okf_icml)

  4. Martn Wunderl

    Thanks a lot for taking a look at this, Mihai.

    @Rajendra Kharbuja Not sure if you’ll have time for this next week, but it might be simple enough to fix if it’s just the mapping. If there is no time, we can leave it for the next sprint.

  5. Demian Klc

    Hi all,

    After some research I’ve found a bug in how the ICMLFilter class processes a Content element: if the element has no child nodes, it appends an opening tag with the string <Content/> and a closing tag with an empty string ("") to the context (see the ICMLContext.addContent() method). This seems to leave an invalid context structure and produces unexpected results later (such as closing the wrong tags or leaving tags unclosed), resulting in the exception above.

    A solution to this might be to use separate open and close tags <Content></Content> instead of a single self closing tag <Content/>. Doing this seems to keep the structure consistent.
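The difference between the two forms can be sketched in isolation. The class below is a simplified stand-in, not the Okapi code: `StartTagSketch`, `startTag`, `endTag`, and `emptyContent` are hypothetical names, and attribute values are written without escaping to keep the sketch short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Simplified stand-in for the tag-building logic discussed above (not Okapi code).
public class StartTagSketch {

    // Builds the start tag; an element without children is written as
    // a self-closing tag only when selfClosing is true.
    public static String startTag(Element elem, boolean selfClosing) {
        StringBuilder sb = new StringBuilder("<" + elem.getNodeName());
        NamedNodeMap attrs = elem.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            // Assumption: values need no escaping in this sketch.
            sb.append(' ').append(attr.getNodeName())
              .append("=\"").append(attr.getNodeValue()).append('"');
        }
        sb.append(!elem.hasChildNodes() && selfClosing ? "/>" : ">");
        return sb.toString();
    }

    // Builds the matching end tag; it is empty when the start tag was self-closing.
    public static String endTag(Element elem, boolean selfClosing) {
        return (!elem.hasChildNodes() && selfClosing) ? "" : "</" + elem.getNodeName() + ">";
    }

    // Creates a Content element with no children.
    public static Element emptyContent() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return doc.createElement("Content");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element elem = emptyContent();
        // Self-closing form: the "closing" code carries no markup at all.
        System.out.println("[" + startTag(elem, true) + "] [" + endTag(elem, true) + "]");
        // Separate tags: the opening and closing codes each carry their own tag.
        System.out.println("[" + startTag(elem, false) + "] [" + endTag(elem, false) + "]");
    }
}
```

With separate tags, the opening and closing inline codes each hold real markup, which is the property the proposed fix relies on.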

    I don’t have permission to create a branch, which is why I’ve created a patch with the fix.

    I tried to add unit tests for this, based on the current tests, but I couldn’t. The current tests still pass and the exception is now gone. The output file looks correct.

    Here is the patch. It is based on current dev branch.

    Index: okapi/filters/icml/src/main/java/net/sf/okapi/filters/icml/ICMLContext.java
    IDEA additional info:
    Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
    <+>UTF-8
    ===================================================================
    --- okapi/filters/icml/src/main/java/net/sf/okapi/filters/icml/ICMLContext.java (revision e1b50210ea94c6bb8c067d45431618dfdbe82050)
    +++ okapi/filters/icml/src/main/java/net/sf/okapi/filters/icml/ICMLContext.java (date 1591627237929)
    @@ -136,15 +136,17 @@
         * @param elem the Content element node.
         */
        public void addContent (Element elem) {
    +       String endTag = "</" + elem.getNodeName() + ">";
    +       
            if ( phOnly ) {
    -           tf.append(TagType.PLACEHOLDER, "code", buildStartTag(elem));
    +           tf.append(TagType.PLACEHOLDER, "code", buildStartTag(elem, false));
                ICMLFilter.processContent(elem, tf);
    -           tf.append(TagType.PLACEHOLDER, "code", buildEndTag(elem));
    +           tf.append(TagType.PLACEHOLDER, "code", endTag);
            }
            else {
    -           tf.append(TagType.OPENING, "code", buildStartTag(elem));
    +           tf.append(TagType.OPENING, "code", buildStartTag(elem, false));
                ICMLFilter.processContent(elem, tf);
    -           tf.append(TagType.CLOSING, "code", buildEndTag(elem));
    +           tf.append(TagType.CLOSING, "code", endTag);
            }
            //status++;
     //     contentNode = elem;
    @@ -202,8 +204,15 @@
                }
            }
        }
    -   
    +
    +   /**
    +    * If the tag has no contents, it will return a self closing tag.
    +    */
        public String buildStartTag (Element elem) {
    +       return buildStartTag(elem, true);
    +   }
    +
    +   public String buildStartTag (Element elem, boolean selfClosing) {
            StringBuilder sb = new StringBuilder("<"+elem.getNodeName());
            NamedNodeMap attrNames = elem.getAttributes();
            for ( int i=0; i<attrNames.getLength(); i++ ) {
    @@ -213,7 +222,7 @@
                sb.append("\"");
            }
            // Make it an empty element if possible
    -       if ( elem.hasChildNodes() ) {
    +       if ( elem.hasChildNodes() || !selfClosing) {
                sb.append(">");
            }
            else {
    

  6. Martn Wunderl

    @YvesS We will try to fix this in our current sprint (i.e. before next week Friday). Do we already have a date for the next Okapi release, so that we can include the bug fix there?

  7. YvesS

    Mihai, Jim and Chase are talking about doing a “dot release” soon, but I don’t know when exactly. @Mihai Nita : have you fixed a date?

  8. Mihai Nita

    No fixed date. I suppose "ASAP after the PR is merged".

    But it looks like the idea is not to release everything under 1.39.1, only one artifact. This one also looks pretty isolated (I doubt that any other components depend on ICML). Do we already have a PR for this one?

    Mihai

  9. Martn Wunderl

    @Demian Klc I have created a branch for this issue now: Issue_907_Error_missing_endtag_during_Translation Kit Post-Processing_for_ICML
    When you have a moment, could you try and commit your changes to that branch and create the PR?

  10. Demian Klc

    @YvesS , looks like I don’t have permission to push changes. Would you mind adding permissions to push?

    Thanks!

    Demian

  11. Demian Klc

    Hi all,

    I still can’t find the reason behind the failing integration test, so I’ve created this PR to revert my fix until the problem is found:

    https://bitbucket.org/okapiframework/okapi/pull-requests/413/revert-907-icml-filter-fix-for-empty

    I assumed that the fix was good because the pipeline was successful, but I was not aware that the integration tests were failing because of it.

    We might need help to know how to run integration tests locally so that we can create a fix.

    The reason for the failing test is not clear to me (see failed test here).

    I ran two tests with the ICML file used for the integration test, following the steps given to reproduce the issue: one test with the code fix and one without it.

    The results:

    • the produced ICML files (/ack1/done folder) are identical
    • there is a small difference between the two XLF files (/ack1/done folder): the test with the fix has an occurrence of <Content></Content> instead of <Content/>

    The only thing I’ve found strange was that the original ICML contains occurrences of <Content><?ACE 4?></Content> , not sure what it means.

  12. Denis Konovalyenko

    @Demian Klc , let me add my two cents.

    It is acceptable to fix the failing integration tests by adjusting the content of the gold files (XLF in this case). However, the best person to do so is probably the one who made the related changes in the Okapi repo. Hence, I think it would be fine for you to open a pull request in the Okapi Integration Tests repo to cover this.

    As for the doubts on the <?ACE 4?> content, this is a processing instruction and it is completely typical to have it like this.
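A quick way to see how a standard DOM parser treats such content is to parse the fragment and inspect the child node. This is a minimal sketch using only the JDK XML APIs; the class name `ProcessingInstructionSketch` and its method are hypothetical and unrelated to Okapi.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

// Hypothetical demo class, not part of Okapi.
public class ProcessingInstructionSketch {

    // Parses the XML and returns the first child node of the root element.
    public static Node firstChildOfRoot(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return root.getFirstChild();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node child = firstChildOfRoot("<Content><?ACE 4?></Content>");
        // The child is a processing-instruction node with a target and data.
        ProcessingInstruction pi = (ProcessingInstruction) child;
        System.out.println(pi.getTarget() + " / " + pi.getData()); // prints "ACE / 4"
    }
}
```

Because the processing instruction is a child node, `hasChildNodes()` returns true for such an element in the DOM, so it is not "empty" in the sense of the `<Content></Content>` case discussed earlier in this thread.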

  13. Demian Klc

    I see. Thank you @Denis Konovalyenko for your comments.

    Based on the differences mentioned in my last comment, I think I found where to update the related XLF file.

    I wanted to push changes but I don’t have permission to do it. @YvesS , can you provide me access?

    An additional question: after I push changes on a branch, will these integration tests run from this branch to check if I fixed the issue?

    Thanks!
