JSON filter with HTML subfilter creates duplicated TU ids

Issue #519 resolved
YvesS created an issue

The following JSON file, processed with the default settings plus okf_html set as the sub-filter, produces an XLIFF file in which the two trans-units have the same ID.

JSON:

{ 
    BACK_TO_CHECKOUT: 'Back to checkout',
    CHECKOUT: 'Checkout'
}

XLIFF:

<group id="sg1">
    <group id="sg1_ssf1" resname="sub-filter:BACK_TO_CHECKOUT">
        <trans-unit id="sg1_tu1" resname="BACK_TO_CHECKOUT_1">
            <source xml:lang="en-us">Back to checkout</source>
            <target xml:lang="fr-fr">Back to checkout</target>
        </trans-unit>
    </group>
    <group id="sg1_ssf2" resname="sub-filter:CHECKOUT">
        <trans-unit id="sg1_tu1" resname="CHECKOUT_1">
            <source xml:lang="en-us">Checkout</source>
            <target xml:lang="fr-fr">Checkout</target>
        </trans-unit>
    </group>
</group>

Both TUs have sg1_tu1 as their id.
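
A quick way to confirm the symptom on an extracted file is to scan it for repeated trans-unit ids. The following is a minimal sketch using only the JDK's DOM parser; the class and method names (`DuplicateTuIdCheck`, `findDuplicateIds`) are made up for illustration and are not part of Okapi.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DuplicateTuIdCheck {

    /** Returns the trans-unit ids that occur more than once in the given XML text. */
    public static Set<String> findDuplicateIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList units = doc.getElementsByTagName("trans-unit");
            Set<String> seen = new HashSet<>();
            Set<String> duplicates = new LinkedHashSet<>();
            for (int i = 0; i < units.getLength(); i++) {
                String id = ((Element) units.item(i)).getAttribute("id");
                if (!seen.add(id)) {
                    duplicates.add(id); // id was already present: record it as a duplicate
                }
            }
            return duplicates;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Reduced version of the XLIFF fragment from this report.
        String xliff =
              "<group id=\"sg1\">"
            + "<group id=\"sg1_ssf1\"><trans-unit id=\"sg1_tu1\"><source>Back to checkout</source></trans-unit></group>"
            + "<group id=\"sg1_ssf2\"><trans-unit id=\"sg1_tu1\"><source>Checkout</source></trans-unit></group>"
            + "</group>";
        System.out.println(findDuplicateIds(xliff)); // prints [sg1_tu1]
    }
}
```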

Comments (10)

  1. Chase Tingley

    It's interesting to compare this case to the equivalent case in okf_xmlstream and see why it works there but not here. In XMLStream, the output looks like this:

    <group id="tu1_ssf1" resname="sub-filter:text">
    <trans-unit id="tu1_tu1" resname="text_1" restype="x-paragraph">
    <source xml:lang="en">Paragraph 1.</source>
    <target xml:lang="fr">Paragraph 1.</target>
    </trans-unit>
    </group>
    <group id="tu2_ssf2" resname="sub-filter:text">
    <trans-unit id="tu2_tu1" resname="text_1" restype="x-paragraph">
    <source xml:lang="en">Paragraph 2.</source>
    <target xml:lang="fr">Paragraph 2.</target>
    </trans-unit>
    </group>
    

    There is logic in the Subfilter code to try to build hierarchical ids that are unique, but the JSON filter is passing the same base id value to two separate Subfilter instances. This is because it just passes the most recent parent id, which is the id of the START_GROUP event emitted when it entered the object:

            String parentId = eventBuilder.findMostRecentParentId();
            if (parentId == null) {
                parentId = getDocumentId().getLastId();
            }
    
            // force creation of the parent encoder
            SubFilter sf = new SubFilter(subFilter,
                    new JSONEncoder(),
                    ++subfilterIndex, parentId, parentName);
    

    In comparison, the xmlstream/AbstractMarkupFilter eventbuilder code keeps a dummy TU on the event stack to collect content, and so this TU's ID is passed to the Subfilter. Since a new TU is created each time, the IDs remain unique.
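
The difference between the two behaviours can be modelled in a few lines. This is a toy sketch, not Okapi code: `ToySubFilter` and its id scheme (`parentId + "_tu" + n`) are assumptions chosen only to mirror the ids seen in the XLIFF output above.

```java
import java.util.ArrayList;
import java.util.List;

public class SubfilterIdCollision {

    /** Hypothetical stand-in for a subfilter: numbers its TUs from 1 under a parent id. */
    public static class ToySubFilter {
        private final String parentId;
        private int tuCount = 0;

        public ToySubFilter(String parentId) {
            this.parentId = parentId;
        }

        public String nextTuId() {
            return parentId + "_tu" + (++tuCount);
        }
    }

    /** JSON-style: every subfilter instance receives the enclosing group's id. */
    public static List<String> idsWithSharedParent(String groupId, int values) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < values; i++) {
            ids.add(new ToySubFilter(groupId).nextTuId());
        }
        return ids;
    }

    /** XMLStream-style: a fresh placeholder TU id is allocated for each subfiltered value. */
    public static List<String> idsWithFreshParent(int values) {
        List<String> ids = new ArrayList<>();
        for (int i = 1; i <= values; i++) {
            ids.add(new ToySubFilter("tu" + i).nextTuId());
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(idsWithSharedParent("sg1", 2)); // [sg1_tu1, sg1_tu1]
        System.out.println(idsWithFreshParent(2));          // [tu1_tu1, tu2_tu1]
    }
}
```

Because each toy subfilter restarts its own counter at 1, uniqueness depends entirely on the parent id it is given.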

    So one solution is to have the JSON filter emulate this behavior by creating a throw-away TU, harvesting its ID, and then calling the subfilter. This should work, but that puts a lot of burden on the caller to do this correctly in the future. (In other words, this bug could easily happen again in other filters.) A more comprehensive solution might be to move more of this logic into the Subfilter itself, and do some minor refactoring of the AbstractMarkupFilter code at the same time, so that we can streamline this process.
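
As a rough illustration of the second option, the sketch below moves uniqueness into the subfilter layer, so that callers may pass the same parent id repeatedly without producing a collision. Everything here is hypothetical: `SubfilterIdAllocator`, `ToySubFilter`, and the `_sf` id segment are invented for the example and do not describe how Okapi's Subfilter class works or what ids it generates.

```java
import java.util.HashMap;
import java.util.Map;

public class CentralizedSubfilterIds {

    /** Hypothetical allocator owned by the subfilter layer and shared for a whole document. */
    public static class SubfilterIdAllocator {
        private final Map<String, Integer> usesPerParent = new HashMap<>();

        /** Returns a base id that differs on every call, even for a repeated parent id. */
        public String uniqueBaseId(String parentId) {
            int n = usesPerParent.merge(parentId, 1, Integer::sum);
            return parentId + "_sf" + n;
        }
    }

    /** Hypothetical subfilter that derives its base id itself instead of trusting the caller. */
    public static class ToySubFilter {
        private final String baseId;
        private int tuCount = 0;

        public ToySubFilter(SubfilterIdAllocator allocator, String parentId) {
            this.baseId = allocator.uniqueBaseId(parentId);
        }

        public String nextTuId() {
            return baseId + "_tu" + (++tuCount);
        }
    }

    public static void main(String[] args) {
        SubfilterIdAllocator allocator = new SubfilterIdAllocator();
        // Both subfilters are given the same parent id, as the JSON filter did.
        System.out.println(new ToySubFilter(allocator, "sg1").nextTuId()); // sg1_sf1_tu1
        System.out.println(new ToySubFilter(allocator, "sg1").nextTuId()); // sg1_sf2_tu1
    }
}
```

The trade-off in a design like this is that generated ids change for every filter that uses subfiltering, which existing consumers of those ids would need to tolerate.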

  2. Chase Tingley

    This bug can also occur in the YAML filter, the XML Stream filter when subfiltering CDATA, and possibly others.

  3. Chase Tingley

    Fix issue 519 - ensure subfiltered JSON TUs have unique IDs

    Solves a class of bugs affecting subfiltering with at least the
    JSON, YAML, ITS, and XML Stream (CDATA only) filters, in which multiple
    TUs could be generated with the same ID.
    
    This moves responsibility for unique ID management into the
    Subfilter class, out of the parent filters.  This will cause
    changes in the TU IDs generated with subfilters.
    

    → <<cset b01d29e4ec9f>>

  4. Chase Tingley

    Jim, do you think the case you're seeing is another instance which the fix for this one missed? Or is it something new?

    (Since this bug has been closed for a year, we might want to consider opening a new one)
