Reindex codes list to reflect natural order of codes in coded text

Issue #997 new
Philipp created an issue

I think I found a bug in the net.sf.okapi.filters.xliff2.XLIFF2OkpToX2Converter.copyOver method.

In this method the next code position is calculated based on the code index in the codes list while looping over each character of the coded text.
If the codes list is not ordered the way the codes occur in the coded text, some codes are never processed: once the loop counter has moved past a code's text position, that code can no longer be found. This happens, for example, when a code with a lower list index occurs later in the coded text than a code with a higher list index. The loop waits for the lower-indexed code first, and by the time it looks for the higher-indexed one, the counter has already passed it.
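To illustrate the failure mode, here is a minimal, self-contained sketch. It does not reproduce the actual copyOver implementation; the class, the `MARKER` and `CHARBASE` constants, and the helper methods are hypothetical stand-ins for a coded text in which each code is a marker character followed by a character carrying the code's list index.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOrderScanSketch {
    // Hypothetical simplified encoding, not Okapi's actual API:
    // a marker char followed by a char that carries the code's list index.
    static final char MARKER = '\uE103';
    static final int CHARBASE = 0xE110;

    static String ref(int index) {
        return "" + MARKER + (char) (CHARBASE + index);
    }

    // Text position of the marker that references the given list index, or -1.
    static int positionOf(String codedText, int index) {
        for (int i = 0; i + 1 < codedText.length(); i++) {
            if (codedText.charAt(i) == MARKER
                    && codedText.charAt(i + 1) - CHARBASE == index) {
                return i;
            }
        }
        return -1;
    }

    // Walks the text once and expects codes in list-index order.
    // Returns the list indexes of the codes that were actually handled.
    static List<Integer> scanByListIndex(String codedText, int codeCount) {
        List<Integer> handled = new ArrayList<>();
        int nextIndex = 0;
        int nextPos = codeCount > 0 ? positionOf(codedText, 0) : -1;
        for (int i = 0; i < codedText.length(); i++) {
            if (i == nextPos) {
                handled.add(nextIndex);
                nextIndex++;
                nextPos = nextIndex < codeCount ? positionOf(codedText, nextIndex) : -1;
                i++; // skip the index char that follows the marker
            }
        }
        return handled;
    }

    public static void main(String[] args) {
        String inOrder = "a" + ref(0) + "b" + ref(1);
        String outOfOrder = "a" + ref(1) + "b" + ref(0);
        System.out.println(scanByListIndex(inOrder, 2));    // [0, 1]
        System.out.println(scanByListIndex(outOfOrder, 2)); // [0]
    }
}
```

In the second call, code 1 sits before code 0 in the text. The scan handles code 0 first, then looks for code 1 at a position the loop counter has already passed, so code 1 is never handled.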
In my case, the problem is caused by the net.sf.okapi.common.resource.TextFragment.changeToCode(int, int, net.sf.okapi.common.resource.TextFragment.TagType, java.lang.String, boolean) method, which appends each new code to the end of the list.
More concretely, it occurs when using the YamlFilter with the InlineCodeFinder on scalars that contain line breaks. Line breaks are added as codes first; after that, each InlineCodeFinder match is added to the codes list. The latest addition always ends up at the end of the list, even though it does not occur last in the coded text.
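The reindexing idea from the issue title can be sketched as follows. This is not the linked commit and does not use Okapi's TextFragment; `Fragment`, `MARKER`, `CHARBASE`, `ref`, and `reindex` are hypothetical names in a simplified model where each code in the coded text is a marker character followed by a character carrying its list index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CodeReindexSketch {
    // Hypothetical simplified encoding, not Okapi's actual API.
    static final char MARKER = '\uE103';
    static final int CHARBASE = 0xE110;

    // Minimal stand-in for a fragment: coded text plus its codes list.
    static final class Fragment {
        final String codedText;
        final List<String> codes;

        Fragment(String codedText, List<String> codes) {
            this.codedText = codedText;
            this.codes = codes;
        }
    }

    static String ref(int index) {
        return "" + MARKER + (char) (CHARBASE + index);
    }

    // Rebuilds the codes list in order of occurrence in the coded text
    // and rewrites each index char to point at the code's new position.
    static Fragment reindex(Fragment in) {
        StringBuilder text = new StringBuilder(in.codedText.length());
        List<String> ordered = new ArrayList<>(in.codes.size());
        for (int i = 0; i < in.codedText.length(); i++) {
            char c = in.codedText.charAt(i);
            if (c == MARKER && i + 1 < in.codedText.length()) {
                int oldIndex = in.codedText.charAt(i + 1) - CHARBASE;
                text.append(MARKER).append((char) (CHARBASE + ordered.size()));
                ordered.add(in.codes.get(oldIndex));
                i++; // skip the old index char
            } else {
                text.append(c);
            }
        }
        return new Fragment(text.toString(), ordered);
    }

    public static void main(String[] args) {
        // The line break was added first (index 0); the inline-code match
        // was appended later (index 1) but occurs earlier in the text.
        Fragment before = new Fragment(
                "Hello " + ref(1) + " world" + ref(0) + "bye",
                Arrays.asList("\n", "{0}"));
        Fragment after = reindex(before);
        System.out.println(after.codes.indexOf("{0}")); // 0
        System.out.println(after.codes.indexOf("\n"));  // 1
    }
}
```

After reindexing, list order and text order agree, so a single forward pass that visits codes by list index reaches every code. The sketch assumes each code is referenced exactly once in the coded text; codes never referenced there are dropped from the new list.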

I have linked a commit with a possible solution.

Kind regards!

Philipp
