YAML filter: multi-line '|' values should be produced as a single segment

Issue #643 resolved
Chase Tingley created an issue

Simple example attached.

The | indicator (optionally followed by a chomping indicator, as in |-) lets a single value hold multiple lines of text:

long_line: |-
    This is a
    very long line.

Currently, the filter produces these as multiple trans-units within a single group:

<group id="sg1">
<trans-unit id="tu1" resname="long_line" xml:space="preserve">
<source xml:lang="en">This is a </source>
<target xml:lang="fr">This is a </target>
</trans-unit>
<trans-unit id="tu2" resname="long_line" xml:space="preserve">
<source xml:lang="en">very long line.</source>
<target xml:lang="fr">very long line.</target>
</trans-unit>
</group>

The problem with this is twofold: we split the value into multiple pieces for translation, and we have multiple TUs with a single resname.

Since these lines reflect a single value, better behavior would be to extract them as a single trans-unit.
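
To make the difference concrete, here is a minimal Java sketch. It is not Okapi code: the class and method names (LiteralScalarDemo, perLineSegments, singleSegment) are hypothetical, and it assumes the lines of the |- scalar have already been read with their indentation stripped. It only contrasts one segment per line with one segment for the whole value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Okapi.
final class LiteralScalarDemo {

    // Per-line behavior: every line of the scalar becomes its own segment.
    static List<String> perLineSegments(List<String> lines) {
        return new ArrayList<>(lines);
    }

    // Requested behavior: the whole scalar is one segment, line breaks kept.
    // With the |- chomping indicator there is no trailing line break.
    static List<String> singleSegment(List<String> lines) {
        return List.of(String.join("\n", lines));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("This is a", "very long line.");
        System.out.println(perLineSegments(lines).size()); // prints 2
        System.out.println(singleSegment(lines).size());   // prints 1
    }
}
```

With a single segment there is also exactly one unit carrying the resname long_line.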

Comments (9)

  1. Chase Tingley reporter

    @jhargrave, can I get your thoughts on this? Does this seem like a reasonable request, or is there some reason it needs to be the way it is?

  2. Chase Tingley reporter

    It looks like the current behavior is somewhat intentional, based on this code at YamlFilter.java:365:

    // literal scalar we assume each line is its own text unit
    if (scalar.type == YamlScalarTypes.LITERAL) {
        // ...
    } else {
        // all other scalar types keep the newlines as inline codes
        // ...
    }
    

    So is the right thing to do here just to add an option?

  3. Jim Hargrave

    Now that I've looked over a few more examples, I don't think we should make an exception for the literal type. In hindsight, I think we should pull in the literal as-is (preserving all whitespace), then let the translators adjust whitespace as needed. Non-literals would be "unwrapped", as in HTML, and we would not preserve newlines as inline tags.

    The original assumption was that the user would want to preserve the new lines for the translation - but this may not be true.
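
The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not the YamlFilter implementation: ScalarExtractionSketch and extract are hypothetical names, and the "unwrapping" rule used here (collapse every run of whitespace, including newlines, to one space) is an assumption modelled on HTML-style whitespace handling rather than on YAML's folding rules or on what Okapi actually does.

```java
// Hypothetical illustration only; not part of Okapi.
final class ScalarExtractionSketch {

    // Returns the text of the single text unit produced for one scalar value.
    static String extract(String value, boolean literal) {
        if (literal) {
            // Literal scalar: keep the text as-is, all whitespace preserved.
            return value;
        }
        // Assumed unwrapping rule: collapse whitespace runs to one space.
        return value.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String wrapped = "This is a\n  very long line.";
        System.out.println(extract(wrapped, false)); // prints: This is a very long line.
    }
}
```

Either way, each scalar yields exactly one text unit; only the whitespace handling differs by type.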

  4. Chase Tingley reporter

    Do you think it's OK to just change the behavior, or do we need to leave a toggle for compatibility?

  5. Jim Hargrave

    I would prefer to change the behavior. Though this will break merge, it's no different from other filter changes we have made. We make no guarantees about merge compatibility for each Okapi release (one reason I'd like to slow the release cycle). It's worth a note in changes.txt that this will break merge.
