XLIFF2 FilterWriter throws error when codes are out of order

Issue #1094 new
Former user created an issue

When a TextFragment object has codes in its codes variable, and their order in the list differs from the order in which the codes appear in the content, the codes that come earlier than they should are not added. Then, when the XLIFF2 Toolkit tries to look up one of those codes in the string, it doesn't find it, resulting in an exception.

A developer can easily work around this by sorting the codes list before sending the TextUnit events to the FilterWriter, so it's not a high priority.
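The sorting idea can be sketched without any Okapi types. This is only an illustration: `CodeOrderSketch`, `sortByContentPosition`, and the `{id}` marker syntax are hypothetical stand-ins, not part of the Okapi API, and a real TextFragment stores codes as marker characters rather than readable placeholders.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CodeOrderSketch {
    // Hypothetical helper: returns a copy of codeIds ordered by where the
    // placeholder "{id}" first appears in content. Ids that do not occur in
    // content get position -1 and therefore sort first.
    public static List<String> sortByContentPosition(String content, List<String> codeIds) {
        List<String> sorted = new ArrayList<>(codeIds);
        sorted.sort(Comparator.comparingInt(id -> content.indexOf("{" + id + "}")));
        return sorted;
    }

    public static void main(String[] args) {
        String content = "Hello {codeA}my{codeB} {codeC}!";
        List<String> codes = List.of("codeA", "codeC", "codeB");
        // prints [codeA, codeB, codeC]
        System.out.println(sortByContentPosition(content, codes));
    }
}
```

The sort works on a copy, so the caller decides when to swap the reordered list in.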

e.g.
TextFragment content: Hello {codeA}my{codeB} {codeC}!
codes: [codeA, codeC, codeB]

The XLIFF2 Filter will only add codeA and codeC. When the XLIFF2 Toolkit then looks up codeB, it gets a null value that it does not check for, and it throws a NullPointerException.
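One way to picture how a code can get dropped is a forward-only scan. The toy model below is an assumption made for illustration, not the actual logic in OkpToX2Converter: it walks the codes list in list order and searches for each marker only to the right of the previous match. The class name, method name, and `{id}` marker syntax are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ForwardScanSketch {
    // Toy model (not Okapi code): visit the ids in list order and look for
    // each "{id}" marker only at or after the end of the previous match.
    public static List<String> collectForwardOnly(String content, List<String> codeIds) {
        List<String> added = new ArrayList<>();
        int cursor = 0;
        for (String id : codeIds) {
            int pos = content.indexOf("{" + id + "}", cursor);
            if (pos < 0) {
                continue; // marker is behind the cursor, so it is silently skipped
            }
            added.add(id);
            cursor = pos + id.length() + 2; // move past "{id}"
        }
        return added;
    }

    public static void main(String[] args) {
        String content = "Hello {codeA}my{codeB} {codeC}!";
        // prints [codeA, codeC]: codeB sits before codeC in the content,
        // but after it in the list, so the scan never sees it
        System.out.println(collectForwardOnly(content, List.of("codeA", "codeC", "codeB")));
    }
}
```

Under this model, any later lookup of codeB finds nothing, which matches the reported symptom of a null value being dereferenced.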

java.lang.NullPointerException
    at net.sf.okapi.lib.xliff2.core.Fragment.toXLIFF(Fragment.java:733)
    at net.sf.okapi.lib.xliff2.writer.XLIFFWriter.writeFragment(XLIFFWriter.java:1088)
    at net.sf.okapi.lib.xliff2.writer.XLIFFWriter.writeUnit(XLIFFWriter.java:422)
    at net.sf.okapi.lib.xliff2.writer.XLIFFWriter.writeEvent(XLIFFWriter.java:289)
    at net.sf.okapi.filters.xliff2.XLIFF2FilterWriter.handleEvent(XLIFF2FilterWriter.java:110)

Location where the XLIFF2 Filter fails to add the code: https://bitbucket.org/okapiframework/okapi/src/bec7d2715910bf7d6259ef23480fa765e6f2b64a/okapi/filters/xliff2/src/main/java/net/sf/okapi/filters/xliff2/OkpToX2Converter.java#lines-371

Location where the error occurs: https://bitbucket.org/okapiframework/okapi/src/bec7d2715910bf7d6259ef23480fa765e6f2b64a/okapi/libraries/lib-xliff2/src/main/java/net/sf/okapi/lib/xliff2/core/Fragment.java#lines-733

Comments (3)

  1. Jack Cole

    Here is how I re-sorted the codes in the TextUnit in my application before sending it to the FilterWriter. The error no longer appears, confirming that this is the issue. I apply this function to all segments in the Source and Target content.

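    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import com.google.common.collect.Comparators; // Guava
    import net.sf.okapi.common.resource.Code;
    import net.sf.okapi.common.resource.Segment;

    // Reorders the codes list of a segment so that it matches the order in
    // which the codes appear in the content.
    private void fixCodeOrderBug(Segment segment) {
        final List<Code> codes = segment.getContent().getCodes();
        // Record the position of each code in the content before changing anything.
        final Map<Code, Integer> codesToPosition = new HashMap<>();
        for (int i = 0; i < codes.size(); i++) {
            codesToPosition.put(codes.get(i), segment.getContent().getCodePosition(i));
        }
        // If the codes are out of order, remove and add them back in the proper order.
        if (!Comparators.isInOrder(codes, Comparator.comparing(codesToPosition::get))) {
            final List<Code> sorted = codes.stream().sorted(Comparator.comparing(codesToPosition::get)).collect(Collectors.toList());
            for (Code code : sorted) {
                segment.getContent().removeCode(code);
            }
            // Re-insert in ascending position so each original offset is valid again.
            for (Code code : sorted) {
                segment.getContent().insert(codesToPosition.get(code), code);
            }
        }
    }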
    

  2. jhargrave-straker

    Do you know under what conditions the codes in a TextFragment would get out of order? I'm trying to understand how to reproduce this issue and fix it. I could do a post-processing re-sort as you do, but would rather fix the underlying problem.
