Alternative translations do not work for repetitions within paragraphs

Issue #36 new
Manuel Souto Pico created an issue

Background

Alternative translations in OmegaT serve to stop auto-propagation. They rely on the segment’s context: for XLIFF files the context can be the segment’s ID (<trans-unit id=...); for other file types, a virtual ID is created for each text unit (e.g. each paragraph).

Preconditions

  • OmegaT version 5.5.4
  • Plugin okapiFiltersForOmegaT-1.9-1.41.0.jar installed
  • A translated OmegaT project containing a document with several repeated sentences in the same paragraph:

First sentence in paragraph. This is a repeated sentence. This is a repeated sentence. Last sentence in paragraph.

Steps to reproduce

  1. In one of the repetitions, create an alternative translation, and press Ctrl+S to save.
  2. Do the same in another occurrence of the repeated sentence.

Expected results

  1. of the first step:

    1. The first alternative translation created appears in the Multiple Translations pane, together with its context (the ID).
    2. The first alternative translation does not auto-propagate to other repetitions.
  2. of the second step:

    1. The second alternative translation appears in the Multiple Translations pane, together with its context (the ID), which is different from the ID of the first alternative translation.
    2. The second alternative translation does not auto-propagate to other repetitions.

In a nutshell, each segment in the project has its own unique ID.

Actual results

  1. of the first step:

    1. As expected.
    2. The first alternative translation propagates to the second repetition (and to any others within the same paragraph).
  2. of the second step:

    1. The text of the alternative translation in the Multiple Translations pane changes (its ID does not change), and no further alternative translation is added to the pane.
    2. The second alternative translation auto-propagates to the first repetition.

This screencast hopefully illustrates the issue: https://recordit.co/7vRsjzquGn

In a nutshell, each paragraph in the source document has its own unique ID, but each segment in the project does not. Segments containing sentences from the same paragraph share the same ID.
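
The behaviour described above can be modelled with a small sketch. It is only a simplified model, not OmegaT's actual code: it assumes alternative translations are looked up by the pair (source text, context ID), and the names save_alternative and tu5 are hypothetical. Because both repetitions come from the same paragraph, they produce the same key, so the second alternative replaces the first instead of being stored next to it.

```python
# Simplified model (assumption): an alternative translation is stored
# under the key (source text, context ID).
def save_alternative(store, source, context_id, translation):
    store[(source, context_id)] = translation


segments = [
    "First sentence in paragraph.",
    "This is a repeated sentence.",
    "This is a repeated sentence.",
    "Last sentence in paragraph.",
]

# Hypothetical paragraph-level ID shared by every segment of the paragraph.
paragraph_id = "tu5"

store = {}
# Step 1: alternative translation for the first repetition.
save_alternative(store, segments[1], paragraph_id, "Translation A")
# Step 2: alternative translation for the second repetition.
save_alternative(store, segments[2], paragraph_id, "Translation B")

# Both repetitions map to the same key, so only one entry survives.
print(store)
```

In this model only one entry remains after the second step, which matches what the Multiple Translations pane shows: the text changes, the ID does not, and no second entry appears.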

Suggestion

Each segment in the project should have its own unique ID, e.g. tu5_0, tu5_1, tu5_2, etc.
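
As a rough illustration of this suggestion, the sketch below appends the segment's position within its text unit to the text unit's ID. The function name segment_ids and the ID tu5 are hypothetical, and the lookup by (source text, ID) is an assumed simplification rather than the plugin's or OmegaT's real implementation.

```python
# Hypothetical helper: derive one ID per segment from the text unit ID
# by appending the segment's index within that text unit.
def segment_ids(text_unit_id, segment_count):
    return [f"{text_unit_id}_{index}" for index in range(segment_count)]


segments = [
    "First sentence in paragraph.",
    "This is a repeated sentence.",
    "This is a repeated sentence.",
    "Last sentence in paragraph.",
]

ids = segment_ids("tu5", len(segments))

# Assumed lookup key for an alternative translation: (source text, ID).
keys = list(zip(segments, ids))

# The two repetitions share their source text but not their ID,
# so each can hold its own alternative translation.
print(keys[1])
print(keys[2])
```

With IDs of this kind, the two repeated sentences end up with different keys even though their source text is identical.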

Comments (3)

  1. Manuel Souto Pico reporter

    I am attaching an OmegaT project package that may help reproduce and understand the problem.

    Re-tested as of now with these settings:
    - OmegaT version: OmegaT-5.7.1_0_c3206253
    - Okapi plugin version: okapiFiltersForOmegaT-1.11-1.43.0.jar
    - Platform: Linux 5.16.0-6mx-amd64
    - Java: 1.8.0_312 amd64 (running the JRE that OmegaT ships with)
    - Memory: 594MiB total / 132MiB free / 5298MiB max

  2. t_cordonnier

    The _0 at the end is added by the class AbstractOkapiFilter. It is there because the Okapi framework’s filters return an ITextUnit for which tu.getSource().getSegments() may contain multiple segments … if segmentation is done by Okapi, not by OmegaT (before sending it to OmegaT).
    Now, a question for the developers of the Okapi framework: how do we activate segmentation on the Okapi side? I understand that this plugin can be configured with a file in a format specific to the Okapi framework; can this file contain the location of a segmentation SRX file?

    If we can do that, then OmegaT would receive segments with _1, _2, _3 in the ID, and the problem is solved as long as segmentation is done by Okapi and deactivated in OmegaT. If not, then we must use OmegaT’s segmenter, and I confirm that this one has a small bug: it gives the same ID to all segments of the same paragraph, so there is no way to have multiple translations for identical segments from the same paragraph. But in this case the bug is on the OmegaT side, and nothing can be done about it in the plugin.
