XLIFF2FilterWriter MetadataSkeleton compatibility with filters using GenericSkeleton

Issue #1347 open
Marco Anaya Valdovinos created an issue

Hello,

My team has been aiming to upgrade to the latest Okapi version and to consolidate some of our homegrown XLIFF2 code with what already exists in Okapi.

However, we hit an incompatibility when reading a file with one of the existing filters (e.g. JSON or HTML) and then writing it as XLIFF2. Specifically, we get the following error:

java.lang.ClassCastException: class net.sf.okapi.common.skeleton.GenericSkeleton cannot be cast to class net.sf.okapi.filters.xliff2.MetadataSkeleton (net.sf.okapi.common.skeleton.GenericSkeleton and net.sf.okapi.filters.xliff2.MetadataSkeleton are in unnamed module of loader 'app')

This is referenced in https://groups.google.com/g/okapi-users/c/gPJOBNHCib4/m/3jtvie0iAwAJ. However, in our case I believe we can’t use an XLIFF2-compatible skeleton, because GenericSkeleton is hard-coded in the various filters.
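
To illustrate what I think is happening, here is a minimal sketch of the failure mode. The types below are hypothetical stand-ins written for this example, not the real Okapi classes, and writeAssumingMetadata is an invented name: it only mimics a writer that casts whatever skeleton it receives to one concrete type.

```java
// Stand-in types for illustration only; these are NOT the real Okapi classes.
interface Skeleton {}

// Plays the role of net.sf.okapi.common.skeleton.GenericSkeleton.
class GenericSkeletonStub implements Skeleton {}

// Plays the role of net.sf.okapi.filters.xliff2.MetadataSkeleton.
class MetadataSkeletonStub implements Skeleton {}

class SkeletonCastDemo {
    // Hypothetical writer-side code that assumes every skeleton is a MetadataSkeletonStub.
    static String writeAssumingMetadata(Skeleton skeleton) {
        MetadataSkeletonStub metadata = (MetadataSkeletonStub) skeleton; // unchecked assumption
        return "wrote " + metadata.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // A skeleton of the expected type passes the cast.
        System.out.println(writeAssumingMetadata(new MetadataSkeletonStub()));
        try {
            // A skeleton of a sibling type compiles fine but fails at runtime.
            writeAssumingMetadata(new GenericSkeletonStub());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```

The cast compiles because both stubs implement the same interface, so the mismatch only shows up at runtime, once a filter that produces the other skeleton type is plugged in.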

The following is a simplified version of our code:

  public IPipelineDriver buildDriver() {
      IPipelineDriver driver = new PipelineDriver();

      // Read the source document with a regular filter.
      RawDocumentToFilterEventsStep rawDocumentToFilterEventsStep = new RawDocumentToFilterEventsStep();
      rawDocumentToFilterEventsStep.setFilter(new JSONFilter()); // or most other filters, e.g. HTML5Filter
      driver.addStep(rawDocumentToFilterEventsStep);

      // Write the resulting filter events with the XLIFF2 filter writer.
      FilterEventsWriterStep filterEventsWriterStep = new FilterEventsWriterStep();
      filterEventsWriterStep.setFilterWriter(new XLIFF2FilterWriter());
      driver.addStep(filterEventsWriterStep);

      // rawDocument and outputPath are defined elsewhere (omitted from this simplified version).
      driver.addBatchItem(rawDocument, new File(outputPath).toURI(), StandardCharsets.UTF_8.name());
      return driver;
  }

Note the following cases do work:

  • I switch out JSONFilter for XLIFF2Filter (which uses MetadataSkeleton)
  • I switch out XLIFF2FilterWriter for net.sf.okapi.common.filterwriter.XLIFFWriter
  • I switch out XLIFF2FilterWriter for my homegrown XLIFF2 filter writer.

I understand that XLIFF2 support in Okapi is a work in progress and is not yet expected to match the existing XLIFF 1.2 support. With that in mind, I have the following questions:

  1. Is my understanding of the context of XLIFF2 support right (i.e. XLIFF2FilterWriter is supposed to be at some point usable for the purpose described above, I am referencing the relevant code and using it properly, etc)?
  2. Is this a (relatively) easily addressable bug?
  3. If not, what would it take to support this use case in XLIFF2? What context would be necessary to contribute to this?

Comments (8)

  1. jhargrave-straker

    XLIFF2FilterWriter is normally meant to be used only by the XLIFF2Filter during merge (post-translation). You are right that the current XLIFF2FilterWriter is close to a general writer. I don't think it would take too much to enhance it so that it can be used as one.

    I don’t think this is a bug. ISkeleton has many different implementations and all of them should be handled; never assume GenericSkeleton.

    Rainbow does have a partial implementation of a general xliff2 writer (that's the one marked as beta). Not sure that is a good starting point.

    In summary, if you want an IFilterWriter that can take events from any filter and output xliff2, that is something on the priority list that we just haven't gotten to yet.

    Sorry for the late edits; I was having issues with my keyboard, which made it hard to type a coherent response.

  2. jhargrave-straker

    I think XLIFF2FilterWriter should be package-private to drive home the fact that it is not designed to be used as a general writer like TmxWriter etc.

  3. Marco Anaya Valdovinos reporter

    Thanks for the thorough response, Jim! The situation as you describe it makes sense.

    1. Is the partial beta implementation of the XLIFF2Writer okapi/libraries/lib-xliff2/src/main/java/net/sf/okapi/lib/xliff2/writer/XLIFFWriter.java? That is what my team has currently found that seems closest to this, although we aren’t the most familiar with the repo (please correct if wrong).
    2. What are the differences between that partial implementation and what would be a generic XLIFF2Writer?
    3. What would the timeline potentially look like for a generic XLIFF2Writer? Is there anything I could help with or contribute to in order for this to be implemented?

  4. jhargrave-straker

    (1) Actually that is the raw xliff2 writer in the lib. It’s not beta. The one I refer to is in Rainbow: src/main/java/net/sf/okapi/steps/rainbowkit/xliff/XLIFF2PackageWriter.java. That’s the one referred to as beta in the Rainbow UI.

    (2) I haven’t looked at the xliff2 writer in Rainbow. But I don’t think there would be much difference between the XLIFF2FilterWriter and a generic one.

    (3) I don’t think it would take much. A start would be to update the DOCUMENT type handler to work with more ISkeleton types like GenericSkeleton.
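
    A rough sketch of that idea, with loud caveats: the types and the handleStartDocument name below are hypothetical stand-ins written for this example, not the actual Okapi classes or the real XLIFF2FilterWriter code. It assumes the handler only needs skeleton-specific data when that data is available and can fall back to defaults otherwise.

```java
// Stand-in types for illustration only; these are NOT the real Okapi classes.
interface DemoSkeleton {}

// Plays the role of a skeleton produced by filters such as JSON or HTML.
class DemoGenericSkeleton implements DemoSkeleton {}

// Plays the role of the XLIFF2-specific skeleton that carries extra metadata.
class DemoMetadataSkeleton implements DemoSkeleton {
    final String metadata;

    DemoMetadataSkeleton(String metadata) {
        this.metadata = metadata;
    }
}

class DocumentHandlerSketch {
    // Hypothetical start-of-document handler that checks the skeleton type before using it.
    static String handleStartDocument(DemoSkeleton skeleton) {
        if (skeleton instanceof DemoMetadataSkeleton) {
            // Only this branch relies on XLIFF2-specific data.
            DemoMetadataSkeleton metadataSkeleton = (DemoMetadataSkeleton) skeleton;
            return "document with metadata: " + metadataSkeleton.metadata;
        }
        // Any other skeleton type, or no skeleton at all, falls back to defaults.
        return "document with default metadata";
    }

    public static void main(String[] args) {
        System.out.println(handleStartDocument(new DemoMetadataSkeleton("from an xliff2 source")));
        System.out.println(handleStartDocument(new DemoGenericSkeleton()));
        System.out.println(handleStartDocument(null));
    }
}
```

    The instanceof check is false for null, so a missing skeleton takes the same default path as an unrecognized one.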

    I’m booked this next week and have US taxes to file :-( But the week after next I can take a closer look. If you don’t hear anything in two weeks, give me another shout.

  5. Marco Anaya Valdovinos reporter

    Hey Jim, thanks for the explanations there.

    Just giving a shout here since it has been 2 weeks.

  6. jhargrave-straker

    Hi Marco - I got distracted by another bug, but I’m back working on this. I managed to combine this issue with another project I am working on (a full general xliff 2 writer). I should have more info by next week.

  7. Marco Anaya Valdovinos reporter

    Hi Jim, thanks a lot for working on this! Checking in to see if you have any info.
