SDLXLIFF: <seg-source> with no <mrk> still generates a segment

Issue #466 resolved
Chase Tingley created an issue

SDLXLIFF uses <seg-source> to keep track of sources. Normally, <mrk mtype="seg"> is used to identify segments; markup (such as <x> codes) that doesn't fall within a <mrk> pair is correctly exposed as non-segment TextParts by the filter.

However, in some cases Studio may produce a trans-unit that consists only of codes and contains no actual segments. I've attached a reduced testcase that shows this -- this file contains one trans-unit, but will not display any segments in Studio, because there are no <mrk>'d sections of the <seg-source>. Our XLIFF filter will report one gigantic, garbage TextUnit containing several hundred codes.

Basically, if there are no <mrk> sections in the <seg-source>, the correct behavior for SDLXLIFF is to not return any segments, only non-segment TextParts. What I'm worried about is whether this assumption is safe to make for other tools that may be using seg-source.
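
To make the condition concrete, here is a minimal sketch of the check described above, written in Python with the standard library rather than against the actual filter code. The function name `has_no_marked_segments` and the two sample trans-units are hypothetical; the only things assumed from XLIFF 1.2 are its namespace and the `mtype="seg"` convention.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"

def has_no_marked_segments(trans_unit):
    """True if the trans-unit has a <seg-source> that contains no <mrk mtype="seg">."""
    seg_source = trans_unit.find(f"{{{XLIFF_NS}}}seg-source")
    if seg_source is None:
        # No <seg-source> at all: a different case, not the one in this issue.
        return False
    marks = seg_source.iter(f"{{{XLIFF_NS}}}mrk")
    return not any(mrk.get("mtype") == "seg" for mrk in marks)

# Hypothetical trans-unit made only of codes, like the reduced testcase.
CODES_ONLY = ET.fromstring(
    f'<trans-unit xmlns="{XLIFF_NS}" id="1">'
    '<source><x id="1"/><x id="2"/></source>'
    '<seg-source><x id="1"/><x id="2"/></seg-source>'
    '</trans-unit>'
)

# Hypothetical trans-unit with one real segment next to a code.
SEGMENTED = ET.fromstring(
    f'<trans-unit xmlns="{XLIFF_NS}" id="2">'
    '<source><x id="1"/>Hello</source>'
    '<seg-source><x id="1"/><mrk mtype="seg" mid="1">Hello</mrk></seg-source>'
    '</trans-unit>'
)

print(has_no_marked_segments(CODES_ONLY))
print(has_no_marked_segments(SEGMENTED))
```

Whether this test is safe for files from tools other than Studio is exactly the open question raised above; the sketch only pins down what "no <mrk> sections" means.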

Comments (6)

  1. Chase Tingley reporter

    Takeaway from the discussion:

    • This needs to be an optional behavior. The option should be enabled for okf_xliff-sdl, disabled by default (I think) otherwise. It's hard to think of a name for the option that isn't a bunch of jargon like "skip seg-source with no marked segments".
    • To avoid changing the TextUnit implementation itself (which would be invasive), the best behavior would be for the filter simply not to produce a TextUnit event at all for these empty TUs. Unfortunately, that means the filter will need to immediately re-serialize a partially-parsed TextUnit back into skeleton when it figures out this is happening.
  2. Chase Tingley reporter

    The fix added a new skipNoMrkSegSource config option; this option is enabled by default for okf_xliff-sdl, but disabled by default for okf_xliff.
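
    For readers following along, the effect of the option can be sketched roughly as below. This is an illustration in Python, not the filter's actual code: `events_for`, the `"TEXT_UNIT"` / `"SKELETON"` labels, and the sample document are hypothetical, and the `skip_no_mrk_seg_source` parameter simply stands in for skipNoMrkSegSource.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
Q = "{" + XLIFF_NS + "}"

def events_for(trans_units, skip_no_mrk_seg_source):
    """Yield ("TEXT_UNIT", id) or ("SKELETON", xml) for each trans-unit."""
    for tu in trans_units:
        seg_source = tu.find(Q + "seg-source")
        marked = seg_source is not None and any(
            mrk.get("mtype") == "seg" for mrk in seg_source.iter(Q + "mrk")
        )
        if skip_no_mrk_seg_source and seg_source is not None and not marked:
            # No marked segments: pass the whole trans-unit through as skeleton
            # instead of exposing it as a text unit.
            yield ("SKELETON", ET.tostring(tu, encoding="unicode"))
        else:
            yield ("TEXT_UNIT", tu.get("id"))

# Hypothetical two-unit document: unit 1 is codes only, unit 2 has a segment.
SAMPLE = f"""<xliff xmlns="{XLIFF_NS}" version="1.2"><file><body>
<trans-unit id="1"><source><x id="0"/></source><seg-source><x id="0"/></seg-source></trans-unit>
<trans-unit id="2"><source>Hi</source><seg-source><mrk mtype="seg" mid="1">Hi</mrk></seg-source></trans-unit>
</body></file></xliff>"""

units = list(ET.fromstring(SAMPLE).iter(Q + "trans-unit"))
for kind, _ in events_for(units, skip_no_mrk_seg_source=True):
    print(kind)
```

    With the option off, both units come out as text units, which matches the old behavior reported in this issue.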

  3. Chase Tingley reporter

    I've seen at least one file where the assumptions we made in this fix seem to be wrong, so I'm wondering if either something has changed in Studio or there's an additional layer of complexity to this.