XLIFF Filter's handling of third-party generated XLIFF files needs improvement

Issue #758 new
Kuro Kurosaka created an issue

When Okapi extracts XLIFF files that originate from other CAT tools, many tags and attributes do not show up in the XLIFF file generated by Okapi. (Some are gone for good; others reappear after the merge.) For example, the context-group tags, the group tags, and the ph tags (along with their attributes) are gone.

When the Okapi-generated XLIFF file is merged, the (potentially updated) values of standard XLIFF attributes (such as the state attribute of the target tags) should appear in the merged XLIFF file, but the merged file contains the original values instead.

The use case is a CAT tool that takes all kinds of source files, including XLIFF, manages the translation process internally, then exports the result in the original format. Since the translation process is managed within the tool, the status information should be updated in the Okapi-generated XLIFF file, and the updated information should be reflected in the exported (merged) XLIFF file.
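
One way to pin down which tags and attributes are lost is to compare an inventory of the original XLIFF against the Okapi-generated one. The following is only a rough sketch in Python using the standard library; the two sample documents are made up for illustration (they are not the attached test file), and the helper names `inventory` and `missing_items` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample standing in for a third-party XLIFF 1.2 file.
ORIGINAL = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="demo.txt" source-language="en" target-language="pt-BR" datatype="plaintext">
<header><note>demo</note></header>
<body><group id="g1">
<context-group name="loc"><context context-type="sourcefile">demo.txt</context></context-group>
<trans-unit id="1">
<source>Welcome <ph id="1" equiv-text="{x}"/></source>
<target state="translated">Bem-vindo <ph id="1" equiv-text="{x}"/></target>
</trans-unit>
</group></body>
</file>
</xliff>"""

# Hypothetical sample imitating the losses described in this issue.
EXTRACTED = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
<file original="demo.txt" source-language="en" target-language="pt-BR" datatype="plaintext">
<body><group id="g1">
<trans-unit id="1">
<source>Welcome <ph id="1"/></source>
<target>Bem-vindo <ph id="1"/></target>
</trans-unit>
</group></body>
</file>
</xliff>"""


def local(name):
    """Strip the '{namespace}' prefix that ElementTree puts on names."""
    return name.rsplit("}", 1)[-1]


def inventory(xliff_text):
    """Count every element name and every 'element/@attribute' pair."""
    counts = Counter()
    for el in ET.fromstring(xliff_text).iter():
        name = local(el.tag)
        counts[name] += 1
        for attr in el.attrib:
            counts[f"{name}/@{local(attr)}"] += 1
    return counts


def missing_items(original_text, extracted_text):
    """Names that occur less often in the extracted file than in the original."""
    before = inventory(original_text)
    after = inventory(extracted_text)
    return sorted(key for key in before if after[key] < before[key])


if __name__ == "__main__":
    for item in missing_items(ORIGINAL, EXTRACTED):
        print("missing:", item)
```

The same comparison could be run between the original file and the merged file to check whether attribute values such as target/@state survive the round trip, although this sketch only counts occurrences and does not compare values.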

Comments (8)

  1. Kuro Kurosaka reporter

    Test_Context_and_PH.xlf has context-group elements in /xliff/file/body/group and /xliff/file/body/group/trans-unit. It also has a file/header element. It is for en-EN to pt-BR translation, and targets exist. Using Rainbow, choose this file as the input file, change the target language to pt-BR, choose Utilities->Translation Kit Creation, select Generic XLIFF, click Options, and uncheck "Use <g></g> and <x/> notation". Click Execute. The resulting file, Test_Context_and_PH.xlf.xlf, found in the pack1/work directory, is attached. In it, the context-group elements and the header element are missing entirely. Also missing are the equiv-text attributes of the ph tags and the state attribute of the target element.

  2. Kuro Kurosaka reporter

    This user wishes to retain, in the XLIFF file generated by Okapi, the tags and attributes that would be useful for translation. The translation task could then be carried out using the Okapi-generated XLIFF file. (Note that many of these missing elements are stored in the skeleton of TextUnits or DocumentParts and eventually get restored in the merged file, which is another XLIFF file in this case.)

  3. Kuro Kurosaka reporter

    After running the "Translation Post-Processing" pipeline, I noticed that every <ph id="..." equiv-text="{x}"/> has become <ph id="..." equiv-text="{x}"></ph>. Although the two forms are semantically equivalent, the change is odd.

  4. Chase Tingley

    It looks like <context-group> may not get parsed at all by the XLIFFFilter; this would be a good thing to support in both reading and writing.

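
The observation in comment 3, that <ph .../> and <ph ...></ph> are semantically equivalent, can be checked with any XML parser. Here is a minimal Python sketch using the standard library; the helper name `same_element` is hypothetical, and it deliberately compares only the tag, attributes, text, and child count of a single element.

```python
import xml.etree.ElementTree as ET


def same_element(a, b):
    """Shallow comparison: tag, attributes, text content, number of children."""
    return (
        a.tag == b.tag
        and a.attrib == b.attrib
        and (a.text or "") == (b.text or "")
        and len(a) == len(b)
    )


# The two spellings of an empty ph element from comment 3.
short_form = ET.fromstring('<ph id="1" equiv-text="{x}"/>')
long_form = ET.fromstring('<ph id="1" equiv-text="{x}"></ph>')

# Both parse to the same element, so the difference is purely lexical.
equivalent = same_element(short_form, long_form)

# Which spelling appears in the output is a serializer choice, not a data difference.
as_short = ET.tostring(long_form, encoding="unicode", short_empty_elements=True)
as_long = ET.tostring(short_form, encoding="unicode", short_empty_elements=False)

if __name__ == "__main__":
    print(equivalent)
    print(as_short)
    print(as_long)
```

This suggests the change in form comes from how the merged file is serialized rather than from a change in content, though whether Okapi's writer offers a comparable switch is not something this sketch establishes.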