OmegaT plugin: broken hyperlink code in target

Issue #663 new
dg333 created an issue

I am using the OmegaT plugin snapshot v. 1.5 m34 with OmegaT snapshot v. 4.1.3_02, both downloaded today, 26 December 2017. The plugin is enabled to process XLIFF files. I translate XLIFF files (localization for Proz.com) and have run into a problem: when I commit translations, certain segments trigger error messages about hyperlinks. After inspecting the files, I found the following. a) In source segments, hyperlinks are represented as follows:

<ph id="0">a href="%1"</ph>hyperlink text<ph id="0">/a</ph>

b) When the target files are compiled, however, hyperlinks in target segments become:

<ph id="0">a href="%1"</ph>hyperlink text<ph id="0">a href="%1"</ph>

That is, the closing tag (/a) is replaced by a copy of the opening tag (a href="%1").
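The mismatch described above can be detected mechanically. The following is only a minimal sketch, not part of Okapi or OmegaT: the helper names `ph_codes` and `mismatched_codes` are made up for illustration, and the sketch assumes that placeholders appear in the same order in source and target, which a real translation does not guarantee.

```python
import xml.etree.ElementTree as ET


def ph_codes(segment_xml):
    """Return (id, content) for every <ph> in a segment, in document order."""
    # Wrap the fragment in a dummy root so it parses as a single XML document.
    root = ET.fromstring(f"<seg>{segment_xml}</seg>")
    return [(ph.get("id"), ph.text or "") for ph in root.iter("ph")]


def mismatched_codes(source_xml, target_xml):
    """List positions where the target placeholder differs from the source one.

    Assumption: placeholders keep their order in the target segment.
    """
    src, tgt = ph_codes(source_xml), ph_codes(target_xml)
    return [(i, s, t) for i, (s, t) in enumerate(zip(src, tgt)) if s != t]


# The two segments quoted in this report.
source = '<ph id="0">a href="%1"</ph>hyperlink text<ph id="0">/a</ph>'
target = '<ph id="0">a href="%1"</ph>hyperlink text<ph id="0">a href="%1"</ph>'

for position, src_code, tgt_code in mismatched_codes(source, target):
    print(f"placeholder {position}: source {src_code!r}, target {tgt_code!r}")
```

Run on the two segments from this report, the check flags only the second placeholder, where `/a` has become `a href="%1"`.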

Steps to reproduce:

  1. Download and unzip the attached archive Okapi_XLIFF_test.zip.
  2. It contains a sample project, Okapi_XLIFF_test, and a filter settings file, filters.xml.
  3. Either enable the Okapi plugin in the OmegaT file filter settings, or copy filters.xml to the settings directory.
  4. Open the sample project and compile the translation.
  5. Compare the source and target strings in the compiled translated file (the tags around EN 15038).

Comments (3)

  1. ysavourel
    • changed version to M35

    I have not run any tests yet, but my guess is that the issue comes from the ID value of the <ph> tags: the segment has two <ph id="0"> tags. At some point during the merge from OmegaT back to XLIFF, the second tag is replaced by the first.

  2. ysavourel

    Confirming: the issue comes from the fact that the second <ph> tag uses the same ID value as the first. We should probably try to fix this because, surprisingly, a repeated ID is not explicitly invalid in XLIFF 1.2.

    That said, the XLIFF file is invalid and has the following issues:
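To illustrate how a repeated ID can produce exactly this symptom, here is a minimal sketch. It is not the plugin's actual merge code: `extract`, `merge_by_id` and `merge_by_position` are hypothetical helpers, and the regular expressions only handle the simple <ph> form shown in this report. The positional variant also assumes the translator does not reorder the tags.

```python
import re

PH = re.compile(r'<ph id="([^"]*)">(.*?)</ph>')
MARKER = re.compile(r'<ph id="([^"]*)"/>')


def extract(segment):
    """Replace each <ph> with an empty marker and collect (id, content) codes."""
    codes = [(m.group(1), m.group(2)) for m in PH.finditer(segment)]
    text = PH.sub(lambda m: f'<ph id="{m.group(1)}"/>', segment)
    return text, codes


def merge_by_id(text, codes):
    """Naive merge: look codes up by ID, so a repeated ID always finds the first code."""
    first_by_id = {}
    for code_id, content in codes:
        first_by_id.setdefault(code_id, content)  # later duplicates are ignored
    return MARKER.sub(
        lambda m: f'<ph id="{m.group(1)}">{first_by_id[m.group(1)]}</ph>', text
    )


def merge_by_position(text, codes):
    """Alternative merge: consume codes in order, so repeated IDs stay distinct."""
    remaining = iter(codes)

    def restore(_match):
        code_id, content = next(remaining)
        return f'<ph id="{code_id}">{content}</ph>'

    return MARKER.sub(restore, text)


source = '<ph id="0">a href="%1"</ph>hyperlink text<ph id="0">/a</ph>'
text, codes = extract(source)
print(merge_by_id(text, codes))        # second tag becomes a copy of the first
print(merge_by_position(text, codes))  # round-trips to the original segment
```

With the segment from this report, the ID-keyed merge reproduces the broken target, while the positional merge returns the source unchanged.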

    • The value of the date attribute is not in a format expected by XLIFF 1.1.
    • The language values "eng" and "bel" do not follow the BCP-47 rules used by xml:lang, source-language and target-language (they should be "en" and "be").
    • The attributes type, stage and group in the <phase> element are not valid XLIFF attributes.
    • The attribute process-name is missing from the <phase> element.
    • The attributes version in the <source> and <target> elements are not valid XLIFF attributes.
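Some of the points above can be checked with a short script before a file is handed to a filter. The sketch below covers only the language values. It is an illustration, not an XLIFF validator: the function `language_warnings`, the two-entry mapping and the `SAMPLE` document are all made up for this example and are not taken from the attached file.

```python
import xml.etree.ElementTree as ET

# Illustrative mapping only: the two three-letter codes named in this comment.
ISO639_2_TO_1 = {"eng": "en", "bel": "be"}

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
LANGUAGE_ATTRIBUTES = ("source-language", "target-language", XML_LANG)


def language_warnings(xliff_text):
    """Return (attribute, found, suggested) for three-letter codes that have a two-letter form."""
    root = ET.fromstring(xliff_text)
    warnings = []
    for element in root.iter():
        for attribute in LANGUAGE_ATTRIBUTES:
            value = element.get(attribute)
            if value in ISO639_2_TO_1:
                warnings.append((attribute, value, ISO639_2_TO_1[value]))
    return warnings


# Hypothetical minimal document, used only to exercise the check.
SAMPLE = """<xliff version="1.1">
  <file original="demo" datatype="plaintext" source-language="eng" target-language="bel">
    <body/>
  </file>
</xliff>"""

for attribute, found, suggested in language_warnings(SAMPLE):
    print(f'{attribute}="{found}" should probably be "{suggested}"')
```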

    The Okapi filter for XLIFF 1.2 is very forgiving, but it cannot always ensure a proper round-trip on invalid 1.1 files. I would highly recommend that the organization this file comes from modify its extraction tool to generate valid 1.2 or 2.x files or, at the minimum, valid 1.1 files.

  3. dg333 reporter

    The thing is, I had not experienced this kind of problem with the plugin before, although I am fairly sure that I had translated other localization files with similar hyperlinks. So I thought there might be a regression in the plugin and filed this report. OK then.
