Filter plugin v. 1.13-1.45 breaks backward compatibility for tags

Issue #1366 on hold
Manuel Souto Pico created an issue

Background

In my team we were using version 1.11-1.43 (for Java 8) of the Okapi filters plugin for OmegaT, then we started using 1.12-1.44. That transition went fine, no issues.

When we switched to Java 11 for OmegaT, we tried to upgrade to version 1.13-1.45 of the Okapi filters plugin (which included some bug fixes and enhancements we had sponsored). That's when the problem described in this ticket was detected.

Steps to reproduce

  1. Translate an XLIFF file in OmegaT using Okapi filter plugin version 1.11-1.43 or 1.12-1.44.
  2. Delete that plugin and install version 1.13-1.45 or 1.14-1.46.
  3. Restart OmegaT and open that project again.

Expected results

All segments that were translated are still translated, and tags look the same (the inline code in this translation unit is expected to be exposed like <x1/> because it’s the first tag in that segment).

The expected logic for numbering standalone tags as <x1/>, <x2/>, <x3/> is that consecutive numbers indicate the order of the tags within the text block (paragraph in text/Word, div/p in HTML, cell in Excel, trans-unit in XLIFF, etc.). That is helpful information for translators.
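
The numbering described above can be sketched as follows. This is only an illustration of the expected behaviour, not the plugin's code: the class name SequentialTagNumbering, the method expose and the simplified regular expression are all assumptions made for this example (a real filter would work on parsed XML, not on a regex).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialTagNumbering {
    // Simplified match for a standalone XLIFF 1.2 <x .../> placeholder.
    // Assumption: attribute values contain no '>' character.
    private static final Pattern X_TAG = Pattern.compile("<x\\b[^>]*/>");

    /** Replaces each standalone tag with <x1/>, <x2/>, ... in order of appearance. */
    static String expose(String textBlock) {
        Matcher m = X_TAG.matcher(textBlock);
        StringBuffer out = new StringBuffer();
        int counter = 0; // starts again for every text block
        while (m.find()) {
            counter++;
            m.appendReplacement(out, Matcher.quoteReplacement("<x" + counter + "/>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "Turn on <x id=\"INTERPOLATION\" equiv-text=\"{{ a }}\" /> or <x id=\"OTHER\"/>.";
        System.out.println(expose(source)); // prints: Turn on <x1/> or <x2/>.
    }
}
```

Because the counter restarts in each text block, the number tells the translator the position of the tag and does not depend on the value of the id attribute.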

Actual results

Segments with tags which were translated become untranslated, because tags are now exposed differently, breaking the exact match. The tag is now exposed with a very long figure, e.g. <x619392636/>.

The translation becomes an "orphan", which means that it's still in the working TM of the project but does not populate the segment because there's no exact match.

The expected logic in numbering standalone tags is not there. A tag like <x619392636/> gives translators no information about the order of the tags in the text block.

Example

Here comes one example of many. The source file has:

<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en-US" datatype="plaintext" original="ng2.template">
    <body>
      <trans-unit id="clients.clientFeatures.enableText" datatype="html">
        <source> Please confirm you would like to turn on <x id="INTERPOLATION" equiv-text="{{ togglingFeatureShortName }}" /> for this client. </source>
      </trans-unit>
    </body>
  </file>
</xliff>

Using plugin okapiFiltersForOmegaT-1.11-1.43.0.jar (in OmegaT 5.7.1 run on Java 8), that segment looks like this in OmegaT (tag is <x1/> because it’s the first tag in the paragraph):

Now, when we open the same project with plugin okapiFiltersForOmegaT-1.13-1.45.0.jar (in OmegaT 6.0.0 run on Java 11), it looks like this (tag is now <x619392636/> and the segment shows as untranslated):

Note: the OmegaT and Java versions are provided for testing purposes, as not every version of the Okapi filters plugin is compatible with every version of OmegaT and Java. However, the OmegaT and Java versions have no impact on the issue reported. I have tested the two plugins/filters in the same version of OmegaT (5.7.1), running on both Java 8 and Java 11, and I can confirm the problem is not in OmegaT, it's in the filter.

As you can see, the translation has become an "orphan" translation, which means that it's still in the working TM of the project but does not populate the segment because there's no exact match.
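
To make the orphan effect concrete, here is a minimal sketch that models the working TM as a map keyed on the exact source text as exposed by the filter. This is a deliberately simplified model and not OmegaT's actual storage or matching code; the class OrphanTranslationDemo, the method lookup and the placeholder target text are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

class OrphanTranslationDemo {
    /** Exact-match lookup: returns the stored translation, or null if the key differs at all. */
    static String lookup(Map<String, String> workingTm, String exposedSource) {
        return workingTm.get(exposedSource);
    }

    public static void main(String[] args) {
        Map<String, String> workingTm = new HashMap<>();
        // Entry saved while the old plugin exposed the tag as <x1/>.
        // The target text is a placeholder, not the real translation.
        workingTm.put("Please confirm you would like to turn on <x1/> for this client.",
                      "[translation containing <x1/>]");

        String withOldPlugin = "Please confirm you would like to turn on <x1/> for this client.";
        String withNewPlugin = "Please confirm you would like to turn on <x619392636/> for this client.";

        System.out.println(lookup(workingTm, withOldPlugin)); // the stored translation
        System.out.println(lookup(workingTm, withNewPlugin)); // null: the entry is now an orphan
    }
}
```

The entry is still in the map in both cases; only the key produced by the new plugin no longer matches it.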

Possible solutions

I guess the most straightforward fix would be to revert the change that made the filter expose tags with a different number and then implement it again without changing the logic that creates the tag number. It seems the bug was introduced in pull request #648.

Further info

Thomas Cordonnier added:

The conversion between native tags and OmegaT tags is the responsibility of each plugin, which explains why each plugin (or each version of a plugin!) can have a different conversion rule. For example, in the case of SDLXLIFF, the Okapi plugin returns the number found in the "id" attribute of the native XML (<g id="123">), while the StaX filter returns a sequential number reset to 0 for each segment. OmegaT then displays whatever the plugin returns to it.

A key factor to understand is the number returned to OmegaT and where it comes from. The number 619392636 in the example above comes from the plugin, not from OmegaT. It must be some kind of hash of the id or of equiv-text. Maybe the plugin accidentally moved to a new version of a library with a new implementation of the hash method, which could explain why the change is not visible in the code of the plugin itself. We don't know which library is used.

Jim Hargrave added:

Okapi uses two IDs for Code. Code.id is numeric and meant only to be an index into the TextFragment. But Code.originalId is a string and is preferred in all cases if it is non-null. In this case it looks like the string id ("INTERPOLATION") is being converted to an integer.
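
The string-to-integer hypothesis can be checked with plain Java: the absolute value of "INTERPOLATION".hashCode() is exactly 619392636, the number shown in the tag. The sketch below reproduces that. The helper toNumericId is hypothetical and only illustrates one way such a conversion could be written; it is not taken from the Okapi or plugin source, and it does not show which code path actually does the conversion.

```java
class TagIdHashDemo {
    /**
     * Hypothetical conversion: numeric ids are used as they are,
     * any other id falls back to the absolute value of String.hashCode().
     * (Math.abs would stay negative for Integer.MIN_VALUE; ignored in this sketch.)
     */
    static int toNumericId(String originalId) {
        try {
            return Integer.parseInt(originalId);
        } catch (NumberFormatException e) {
            return Math.abs(originalId.hashCode());
        }
    }

    public static void main(String[] args) {
        System.out.println("INTERPOLATION".hashCode());               // prints: -619392636
        System.out.println("<x" + toNumericId("INTERPOLATION") + "/>"); // prints: <x619392636/>
        System.out.println("<x" + toNumericId("1") + "/>");             // prints: <x1/>
    }
}
```

String.hashCode() is specified by the Java language API and gives the same result on Java 8 and Java 11, which fits the observation that the Java version has no impact on the issue.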

Comments (4)

  1. Manuel Souto Pico reporter

    Ticket moved to the omegat-plugin tracker’s ticket #272. I guess this one can be closed if the issue is related to the plugin and not the filter itself (I’m not 100% sure about that).