Rainbow Kit Creation Step strips source segmentation from XLIFF
To reproduce:
- Open Rainbow, add the attached segsource.xlf as an input document
- Create a pipeline: Raw Documents to Filter Events, Rainbow Kit Creation Step
- Execute the pipeline
Compare the source XLIFF to the output XLIFF produced in the work directory. The source contains <seg-source> data, but this has been stripped in the working XLIFF.
My expectation would be that if there is source segmentation, we would preserve it, unless it was overridden by an explicit segmentation step in the pipeline.
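The comparison described above can also be scripted. The following is a minimal sketch, assuming both files are XLIFF 1.2 and use the standard `urn:oasis:names:tc:xliff:document:1.2` namespace; the function names are hypothetical and not part of Okapi. It lists the trans-unit ids that carry a <seg-source> in the original document but not in the output.

```python
import xml.etree.ElementTree as ET

# Assumption: the documents are XLIFF 1.2 in the standard namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def units_with_seg_source(xliff_text):
    """Return the ids of all trans-units that have a <seg-source> child."""
    root = ET.fromstring(xliff_text)
    ids = set()
    for tu in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        if tu.find(f"{{{XLIFF_NS}}}seg-source") is not None:
            ids.add(tu.get("id"))
    return ids


def lost_segmentation(original_text, output_text):
    """Return ids segmented in the original but not in the output (hypothetical helper)."""
    before = units_with_seg_source(original_text)
    after = units_with_seg_source(output_text)
    return sorted(before - after)
```

To use it, read the original file and the file from the work directory in binary mode and pass their contents to `lost_segmentation`; a non-empty result means segmentation was dropped for those units.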
Comments (9)
-
reporter -
No, I don't think it's intentional. This was simply coded long ago when 1) input documents were rarely XLIFF and 2) most XLIFF were not segmented.
-
I can confirm this behavior. First, the <seg-source> is missing from the final XLIFF files (after they have gone through the post-processing step). Second, additional content that the source file carried on the <target> element is also missing afterwards.
I found this out when I wanted to translate MadCap Flare files.
This was the original source file:
<trans-unit id="1" restype="x-xml-h1" phase-name="pretrans">
  <source>Datensicherung</source>
  <seg-source><mrk mtype="seg" mid="1">Datensicherung</mrk></seg-source>
  <target state="translated"> <mrk mtype="seg" mid="1" MadCap:segmentStatus="Accepted" MadCap:matchPercent="101">Backup dei dati</mrk></target>
</trans-unit>
This is the result after the post processing:
<trans-unit id="1" restype="x-xml-h1" phase-name="pretrans">
  <source>Datensicherung</source>
  <target state="translated">Backup dei dati</target>
No matter what filter settings I use in Rainbow or what segmentation settings I try, I am not able to produce a "done" file that still includes the segmented source and the annotations/elements that were placed on the target element in the source file.
As a result, MadCap Flare refuses to import the files.
-
@tingley, @ysavourel, it seems that the behaviour is intentional: since the introduction of the net.sf.okapi.filters.xliff.Parameters#ALWAYSUSESEGSOURCE parameter, the XLIFFFilter does not process seg-sources by default (please refer to the related commit for more information).
-
reporter - attached segsource.xlf
- attached segsource-corrected.xlf
Oh interesting. @DenisKonovalyenko is correct for this example. The problem is that my segsource.xlf example has a mismatch between the <source> and <seg-source> content. If the "Always use Segmented Source" option is not set (it is disabled by default), the XLIFF filter resolves this disagreement in favor of <source>. This can be observed in the form of warnings generated by tikal:
$ tikal.sh -fc okf_xliff segsource.xlf -x
-------------------------------------------------------------------------------
Okapi Tikal - Localization Toolset
Version: 2.0.37-SNAPSHOT
-------------------------------------------------------------------------------
Error: Cannot find filter configuration 'test1'
Error: Cannot find filter with ID: test1. Cannot add configuration
Extraction
Source language: en-US
Target language: es-ES
Default input encoding: UTF-8
Filter configuration: okf_xliff
Output: /home/tingley/Downloads/segsource.xlf.xlf
Input: /home/tingley/Downloads/segsource.xlf
Error: The <seg-source> content for the entry id='NFDBB2FA9-tu1' is different from its <source>. The un-segmented content of <source> will be used.
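A disagreement of the kind reported in that last message can be looked for without running an extraction. The sketch below is only an approximation, assuming XLIFF 1.2 in the standard namespace: it compares the plain text of <source> with the concatenated text of <seg-source>, ignoring inline codes, which is not necessarily how the XLIFF filter itself performs the comparison. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the document is XLIFF 1.2 in the standard namespace.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def _q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def find_mismatches(xliff_text):
    """Return (id, source text, seg-source text) for units whose texts differ."""
    root = ET.fromstring(xliff_text)
    mismatches = []
    for tu in root.iter(_q("trans-unit")):
        source = tu.find(_q("source"))
        seg_source = tu.find(_q("seg-source"))
        if source is None or seg_source is None:
            continue  # nothing to compare for unsegmented units
        source_text = "".join(source.itertext())
        seg_text = "".join(seg_source.itertext())
        if source_text != seg_text:  # case-sensitive, so "World" != "world"
            mismatches.append((tu.get("id"), source_text, seg_text))
    return mismatches
```

Because the comparison is case-sensitive, a difference in capitalization alone is enough for a unit to be reported.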
If I enable the option, that problem goes away, and the <seg-source> content appears in the extracted XLIFF.
However, in the example I attached to this bug, this is only an issue because I constructed the file sloppily: there is a capitalization difference between the <source> and <seg-source> content! If you correct this error, as in the attached segsource-corrected.xlf file, then the <seg-source> content extracts correctly even with the option disabled.
-
reporter So @DenisKonovalyenko to answer your original question -- we need a new testcase to prove this is a real bug.
I need to go back to the file that caused me to open this issue and see if it was the result of a broken file and whether the option would have helped.
@eraser17 you can help with this too. In the example you posted above, I didn't see any difference between source and seg-source content. Can you confirm this? (If you have a testcase you can attach, that would be ideal.) Also, can you confirm that among the things you tried in Rainbow, this option was one of them?
-
reporter This is a bug in the XLIFFFilter that occurs when the code finder is enabled:
-
reporter - changed status to resolved
Fix #760 - preserve source segments when applying code finder
→ <<cset 0f9daa0bd4b9>>
-
reporter Merged in tingley/okapi/issue760 (pull request #288)
Fix #760 - preserve source segments when applying code finder
Approved-by: YvesS yves@opentag.com
→ <<cset 2df99644e83f>>
@ysavourel Any idea if this behavior is intentional for some reason?