Okapi XLIFF filter for OmegaT registers/expects segment IDs that break ID-bound matches

Issue #993 new
Manuel Souto Pico created an issue

Background

Alternative translations are used in OmegaT to create in-context exact (ICE) matches and to disable auto-propagation in the segment where the alternative translation is created. The context can be the previous and next segments (which is often insufficient) or some metadata, e.g. the trans-unit node’s ID attribute (optionally together with the filename), in both the default XLIFF filter and the Okapi filter.

The master TM from that project including those ID-bound translations can then later be used to pre-translate identical segments in other projects, thus populating segments where an in-context exact match happens: both the source text and the context (ID+filename) match.

Preconditions

  • An OmegaT project containing an XLIFF file for translation using the Okapi XLIFF filter (project omtprj_01_etc.omt, attached)

    • the <target> nodes in the XLIFF files should be empty
  • An OmegaT project containing the same XLIFF file for translation using the default XLIFF filter (project omtprj_03_etc.omt, attached)

    • the <target> nodes in the XLIFF files should be populated with the source text (which the filter extracts as the source text)
    • the XLIFF filter settings in this project include ID and filename as context

Steps to reproduce

  1. Translate the XLIFF file in project 01, creating some alternative translations (done in project omtprj_02_etc.omt, attached)
  2. Generate the target files and the master TMs from that project (with shortcut Ctrl+D or Project > Create Translated Documents).
  3. Put the exported *-omegat.tmx master TM from that project in project 03, under the /tm/enforce folder.

Expected results

The segments translated in project 01 (or 02) are automatically populated and locked in project 03 with the ICE matches coming from the /tm/enforce/omtprj_(01|02)_etc-omegat.tmx. Project omtprj_03_etc._expected.omt, attached.

Under the hood, the trans-unit’s IDs in the XLIFF files are the same as the segment IDs in the TMX file added under /tm/enforce.

Actual results

The segments translated in project 01 (or 02) included in file /tm/enforce/omtprj_(01|02)_etc-omegat.tmx are available as 100% matches but those segments are not automatically populated and locked in project 03. Project omtprj_03_etc._actual.omt, attached.

Under the hood, when registering translations to the working TM of the project (i.e. file /omegat/project_save.tmx), the Okapi XLIFF filter appends _0 to the segment ID, so context match is not possible and ICE matches don’t work.
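
The mismatch described above can be illustrated with a minimal sketch. This is not OmegaT or Okapi code: the `Entry` class, the `classify` function and the file name `ui.xlf` are hypothetical, and the ID `tu123` is borrowed from the example further down. It only models the rule stated in the Background section, namely that an ICE match needs both the source text and the context (ID + filename) to be identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical model of a segment or TM entry: source text plus context."""
    source: str
    file: str
    seg_id: str


def classify(segment: Entry, tm_entry: Entry) -> str:
    """Return 'ICE' if source and context match, '100%' if only the source matches."""
    if segment.source != tm_entry.source:
        return "no exact match"
    if (segment.file, segment.seg_id) == (tm_entry.file, tm_entry.seg_id):
        return "ICE"
    return "100%"


# Segment as the default XLIFF filter would expose it in project 03 (assumed values).
segment = Entry("Save", "ui.xlf", "tu123")
# Same trans-unit as registered through the Okapi XLIFF filter, with "_0" appended.
okapi_entry = Entry("Save", "ui.xlf", "tu123_0")
# What project 03 would need in order to populate and lock the segment.
expected_entry = Entry("Save", "ui.xlf", "tu123")

print(classify(segment, okapi_entry))     # 100%
print(classify(segment, expected_entry))  # ICE
```

Under these assumptions, the appended _0 is enough to downgrade what should be an ICE match to a plain 100% match, which is consistent with the behaviour observed in project 03.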

Suggestion

Yves Savourel says (in mail thread): one possible simplification could be to add the segment ID [i.e. append the _0 to the trans-unit’s ID] only if the trans-unit has more than one segment. So you would get tu123 for single-segment trans-unit and tu124_0, tu124_1 for the trans-units with two segments.

Another possibility would be to make it optional (by means of an option that the user activates or deactivates, depending on needs). Some users and some organizations always apply segmentation before producing the XLIFF file, so every <trans-unit> always contains exactly one segment. In that setup, it doesn’t make sense to change the ID.
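
The two suggestions above could be sketched roughly as follows. This is only an illustration of the proposed naming scheme, not the Okapi filter's actual code: the function name `segment_ids` and the `always_suffix` flag are hypothetical, with the flag standing in for the user-facing option.

```python
from typing import List


def segment_ids(tu_id: str, segment_count: int, always_suffix: bool = False) -> List[str]:
    """Build segment IDs for one trans-unit (hypothetical helper).

    With always_suffix=False, a single-segment trans-unit keeps its bare ID,
    as in the first suggestion. With always_suffix=True, the index is always
    appended, which corresponds to the behaviour reported in this issue.
    """
    if segment_count < 1:
        raise ValueError("a trans-unit needs at least one segment")
    if segment_count == 1 and not always_suffix:
        return [tu_id]
    return [f"{tu_id}_{i}" for i in range(segment_count)]


print(segment_ids("tu123", 1))                      # ['tu123']
print(segment_ids("tu124", 2))                      # ['tu124_0', 'tu124_1']
print(segment_ids("tu123", 1, always_suffix=True))  # ['tu123_0']
```

With the suffix dropped for single-segment trans-units, the IDs would coincide with the trans-unit IDs in the XLIFF file, so TMs produced in a pre-segmented workflow could be matched by ID across both filters.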

Additional info

The project packages attached to this ticket can be unpacked directly in OmegaT with Project > Unpack OMT if you install this plugin: https://github.com/briacp/plugin-omt-package/releases/tag/v1.6.3

Comments (2)

  1. Manuel Souto Pico reporter

    The same issue would happen in the opposite direction (i.e. with a TM created using the default XLIFF filter and then used in a project using the Okapi XLIFF filter).
