Okapi XLIFF filter for OmegaT registers/expects segment IDs that break ID-bound matches
Background
Alternative translations are used in OmegaT to create in-context exact (ICE) matches and to disable auto-propagation in the segment where the alternative translation is created. That context can be the previous and next segments (which is often insufficient) or some metadata, e.g. the trans-unit node’s ID attribute (optionally together with the filename), in both the default XLIFF filter and the Okapi filter.
The master TM from that project, including those ID-bound translations, can later be used to pre-translate identical segments in other projects, populating every segment where an in-context exact match happens, i.e. where both the source text and the context (ID+filename) match.
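The matching rule described above can be sketched as a lookup keyed on source text plus context. This is only an illustration of the idea, not OmegaT's actual implementation: `ContextKey`, `find_ice_match` and the sample data are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ContextKey:
    """Hypothetical key for an ID-bound (alternative) translation."""
    source: str
    segment_id: str
    filename: Optional[str] = None  # the filename is optional context


def find_ice_match(tm: Dict[ContextKey, str], source: str,
                   segment_id: str, filename: Optional[str] = None) -> Optional[str]:
    """Return a translation only when source text AND context both match."""
    return tm.get(ContextKey(source, segment_id, filename))


# Illustrative data, not taken from the attached projects.
tm = {ContextKey("Save", "tu123", "ui.xlf"): "Guardar"}

print(find_ice_match(tm, "Save", "tu123", "ui.xlf"))  # same text, same context
print(find_ice_match(tm, "Save", "tu999", "ui.xlf"))  # same text, different ID
```

The second lookup finds nothing even though the source text is identical, which is the situation this ticket describes when the IDs differ.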
Preconditions
- An OmegaT project containing an XLIFF file for translation using the Okapi XLIFF filter (project omtprj_01_etc.omt, attached)
  - the <target> nodes in the XLIFF files should be empty
- An OmegaT project containing the same XLIFF file for translation using the default XLIFF filter (project omtprj_03_etc.omt, attached)
  - the <target> nodes in the XLIFF files should be populated with the source text (which the filter extracts as the source text)
  - the XLIFF filter settings in this project include ID and filename as context
Steps to reproduce
- Translate the XLIFF file in project 01, creating some alternative translations (done in project omtprj_02_etc.omt, attached)
- Generate the target files and the master TMs from that project (with shortcut Ctrl+D or Project > Create Translated Documents).
- Put the *-omegat.tmx exported master TM from that project in project 03, under folder /tm/enforce.
Expected results
The segments translated in project 01 (or 02) are automatically populated and locked in project 03 with the ICE matches coming from the /tm/enforce/omtprj_(01|02)_etc-omegat.tmx. Project omtprj_03_etc._expected.omt, attached.
Under the hood, the trans-unit’s IDs in the XLIFF files are the same as the segment IDs in the TMX file added under /tm/enforce.
Actual results
The segments translated in project 01 (or 02) included in file /tm/enforce/omtprj_(01|02)_etc-omegat.tmx are available as 100% matches but those segments are not automatically populated and locked in project 03. Project omtprj_03_etc._actual.omt, attached.
Under the hood, when registering translations to the working TM of the project (i.e. file /omegat/project_save.tmx), the Okapi XLIFF filter appends _0 to the segment ID, so context match is not possible and ICE matches don’t work.
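One way to confirm the mismatch is to compare the IDs recorded in the TMX with the trans-unit IDs in the XLIFF file. The sketch below does that on inline sample data. It assumes the context ID is stored in a `<prop type="id">` element of each TMX `<tu>` and that the XLIFF file uses the XLIFF 1.2 namespace; both samples are simplified illustrations rather than content from the attached projects, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Simplified, illustrative samples (not taken from the attached projects).
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="tu123"><source>Save</source><target/></trans-unit>
    </body>
  </file>
</xliff>"""

TMX_SAMPLE = """<tmx version="1.1">
  <header/>
  <body>
    <tu>
      <prop type="id">tu123_0</prop>
      <tuv lang="en"><seg>Save</seg></tuv>
      <tuv lang="es"><seg>Guardar</seg></tuv>
    </tu>
  </body>
</tmx>"""


def xliff_ids(xliff_text):
    """Collect the id attribute of every trans-unit in the XLIFF text."""
    root = ET.fromstring(xliff_text)
    return {tu.get("id") for tu in root.iterfind(".//x:trans-unit", XLIFF_NS)}


def tmx_ids(tmx_text):
    """Collect the context IDs stored as <prop type="id"> (assumed layout)."""
    root = ET.fromstring(tmx_text)
    return {prop.text for prop in root.iterfind(".//tu/prop[@type='id']")}


def unmatched_ids(xliff_text, tmx_text):
    """Return TMX context IDs that have no identical trans-unit ID."""
    return sorted(tmx_ids(tmx_text) - xliff_ids(xliff_text))


print(unmatched_ids(XLIFF_SAMPLE, TMX_SAMPLE))
```

With these samples the only TMX ID, `tu123_0`, is reported as unmatched because the XLIFF file only knows `tu123`.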
Suggestion
Yves Savourel says (in mail thread): one possible simplification could be to add the segment ID [i.e. append the _0 to the trans-unit’s ID] only if the trans-unit has more than one segment. So you would get tu123 for single-segment trans-unit and tu124_0, tu124_1 for the trans-units with two segments.
Another possibility would be to make it optional (by means of an option that the user activates or deactivates, depending on needs). Some users and some organizations always apply segmentation before producing the XLIFF file, and therefore every <trans-unit> always contains exactly one segment. In that setup, it doesn’t make sense to change the ID.
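The two suggestions above could be combined roughly as follows. This is a minimal sketch of the ID-naming rule only, not code from the Okapi filter: the function `segment_ids` and the `always_suffix` option are hypothetical, with `always_suffix=True` standing in for the behaviour reported in this ticket.

```python
def segment_ids(tu_id, segment_count, always_suffix=False):
    """Build the segment IDs registered for one trans-unit.

    always_suffix=False: suffix only when the trans-unit has several segments.
    always_suffix=True:  always append _<index>, as reported in this ticket.
    """
    if segment_count < 1:
        raise ValueError("a trans-unit needs at least one segment")
    if segment_count == 1 and not always_suffix:
        return [tu_id]  # keep the trans-unit ID unchanged
    return [f"{tu_id}_{index}" for index in range(segment_count)]


print(segment_ids("tu123", 1))                      # ['tu123']
print(segment_ids("tu124", 2))                      # ['tu124_0', 'tu124_1']
print(segment_ids("tu123", 1, always_suffix=True))  # ['tu123_0']
```

Under this rule, pre-segmented XLIFF files would keep IDs identical to those registered by the default XLIFF filter, so a TM could be shared between the two filters in either direction.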
Additional info
One can unpack the project packages attached to this ticket directly from OmegaT from Project > Unpack OMT if you install this plugin https://github.com/briacp/plugin-omt-package/releases/tag/v1.6.3
Comments (2)
- reporter: The same issue would happen in the opposite direction (i.e. with a TM created using the default XLIFF filter and then used in a project using the Okapi XLIFF filter).
- reporter: edited description