XLIFF Word-Count Splitter does not preserve context-group metadata

To reproduce:

1. In Rainbow, add the attached file as an input document.
2. Open the Pipeline Editor and add “XLIFF Word-Count Splitter”. Set the maximum word-count per part to “2”.
3. Click Execute.

The source file contains several pieces of metadata embedded via <context-group> elements in nested groups. All of the TUs are within the innermost group. After the split is performed, however, the inherited metadata is missing from the second file: the splitter preserves the nested group structure, but doesn’t preserve the context-group data.

Desired behavior: the context-group data should be retained in each split as part of the replicated group structure.
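To make the desired behavior checkable, here is a minimal Python sketch that lists, for each trans-unit, the context-groups it inherits from its ancestor groups. The sample XLIFF is hypothetical: it is modelled on the description above, not copied from the attached file, and the group ids, context-group names and context-type values are invented for illustration. The helper name `inherited_context_groups` is also just an illustrative choice.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

# Hypothetical sample (NOT the attached file): two nested groups, each with
# its own context-group, and all trans-units inside the innermost group.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="sample.txt" source-language="en" datatype="plaintext">
    <body>
      <group id="outer">
        <context-group name="outer-meta">
          <context context-type="x-project">demo</context>
        </context-group>
        <group id="inner">
          <context-group name="inner-meta">
            <context context-type="x-section">intro</context>
          </context-group>
          <trans-unit id="1"><source>Hello world</source></trans-unit>
          <trans-unit id="2"><source>Good morning</source></trans-unit>
        </group>
      </group>
    </body>
  </file>
</xliff>
"""


def inherited_context_groups(xliff_text):
    """Map each trans-unit id to the context-group names on its ancestor groups."""
    root = ET.fromstring(xliff_text)
    result = {}

    def walk(element, inherited):
        for child in element:
            if child.tag == f"{{{XLIFF_NS}}}group":
                # Collect context-groups declared directly on this group.
                names = [cg.get("name") for cg in child.findall("x:context-group", NS)]
                walk(child, inherited + names)
            elif child.tag == f"{{{XLIFF_NS}}}trans-unit":
                result[child.get("id")] = list(inherited)
            else:
                walk(child, inherited)

    walk(root, [])
    return result


if __name__ == "__main__":
    for unit_id, names in inherited_context_groups(SAMPLE).items():
        print(unit_id, names)
```

Running the same function over each split part would show the problem directly: with the behavior described in this report, trans-units in the second part would be expected to come back with an empty list, whereas a fixed splitter should report the same names as the unsplit file.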

(Also, it looks like there’s a secondary bug: if the word-count threshold divides evenly into the total word count of the file, an empty part is produced. In this case, there are 4 words in the source file, and splitting at 2 words produces 3 parts, one of which contains no trans-units. This edge case is less likely in the real world, so it doesn’t have to be fixed here unless it’s easy to add.)
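As an illustration of the expected outcome for the empty-part case, the following Python sketch partitions trans-units greedily by word count and never emits an empty part. It is not the splitter's actual algorithm, which this report does not show; the function name `split_by_word_count` is hypothetical, and the per-unit word counts are assumptions, since the report only states that the file has 4 words in total.

```python
def split_by_word_count(unit_word_counts, max_words):
    """Greedily group trans-unit word counts into parts of at most max_words.

    A new part is started only when the current part already holds at least
    one unit, so no empty part can be produced. A single unit larger than
    max_words gets a part of its own.
    """
    parts = []
    current = []
    current_words = 0
    for count in unit_word_counts:
        if current and current_words + count > max_words:
            parts.append(current)
            current, current_words = [], 0
        current.append(count)
        current_words += count
    if current:  # flush the last part only if it actually contains units
        parts.append(current)
    return parts


if __name__ == "__main__":
    # Assumed distribution: two trans-units of two words each (4 words total).
    print(split_by_word_count([2, 2], 2))
```

Under that assumed distribution, a threshold of 2 yields two parts rather than three, which is the result one would expect from the splitter in the scenario above.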

