CDATA will get lost when using XML Filter

Issue #671 new
Former user created an issue

I found that CDATA markup, <![CDATA[]]>, is lost after a document goes through the XMLFilter.

The output differs from the input when I run the following code.

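XMLFilter xmlFilter = new XMLFilter();
IFilterWriter writer = xmlFilter.createFilterWriter();
// Opening the filter on the input document and configuring the writer output are omitted here.

while (xmlFilter.hasNext()) {
    // Fetch the next event from the filter before handing it to the writer.
    Event event = xmlFilter.next();
    writer.handleEvent(event);
}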

I debugged for some time and found the following in the filter source code, ITSFilter.java:

            case Node.CDATA_SECTION_NODE:
                if ( frag == null ) {
                    skel.append(buildCDATA(node));
                }

When the CDATA node is processed, frag is never null, so the CDATA markup is not added to the skeleton.
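
The effect can be illustrated outside Okapi with the JDK DOM parser alone. The sketch below is not the ITSFilter code: the class CdataLossSketch and the method rebuild are hypothetical names, and the keepMarkup flag only stands in for "the CDATA delimiters get written somewhere". It shows that a CDATA section arrives as its own node type, and that appending only the node value drops the delimiters.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of Okapi.
class CdataLossSketch {

    // Rebuilds the text content of the root element from its child nodes.
    // With keepMarkup == false only the CDATA value is appended, so the
    // <![CDATA[ ... ]]> delimiters disappear from the result.
    // No re-escaping is done; this is only meant to show the markup loss.
    static String rebuild(String xml, boolean keepMarkup) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            // Non-coalescing mode keeps CDATA sections as separate nodes.
            dbf.setCoalescing(false);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            StringBuilder out = new StringBuilder();
            for (Node n = doc.getDocumentElement().getFirstChild();
                 n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.CDATA_SECTION_NODE) {
                    if (keepMarkup) out.append("<![CDATA[");
                    out.append(n.getNodeValue());
                    if (keepMarkup) out.append("]]>");
                } else if (n.getNodeType() == Node.TEXT_NODE) {
                    out.append(n.getNodeValue());
                }
            }
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>before <![CDATA[a < b]]> after</p>";
        System.out.println(rebuild(xml, false)); // before a < b after
        System.out.println(rebuild(xml, true));  // before <![CDATA[a < b]]> after
    }
}
```

The first line of output corresponds to what I see from the filter: the content survives, but the CDATA markup does not.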

Can someone take a look and see if this is a bug?

Comments (2)

  1. Chase Tingley

    Confirmed on dev. The translatable content is exposed, but the CDATA markup itself is stripped. So empty CDATA sections like <![CDATA[]]> are stripped entirely.

    IIRC we fixed a similar issue in the XLIFF filter a couple versions ago by adding an option about what to do with the CDATA markup. In some cases people wanted to strip it, in others they wanted to preserve it. In the XLIFF filter, we "preserve" the CDATA markup by converting it to inline codes.

  2. 胡泽宇

    I see. Actually, I think this filter has an option to treat it as an inline tag.

                switch ( node.getNodeType() ) {
                case Node.CDATA_SECTION_NODE:
                    if ( frag == null ) {
                        skel.append(buildCDATA(node));
                    }
                    else {
                        if ( extract() ) {
                            if (params.inlineCdata) {
                                frag.append(new Code(TagType.OPENING, Code.TYPE_CDATA, "<![CDATA["));
                            }
                            frag.append(node.getNodeValue());
    
                            if (params.inlineCdata) {
                                frag.append(new Code(TagType.CLOSING, Code.TYPE_CDATA, "]]>"));
                            }
                        }
    

    If inlineCdata is true, it will append the CDATA tags to the translatable content.

    My point is that if this option is not set, the CDATA section should be stored in the skeleton. Is it intended that the CDATA section is not put into the skeleton, or is there a bug that prevents it?

    From the code snippet, I think it still tries to save the CDATA tags into the skeleton, but somehow it fails the condition check.
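
To make the three possible outcomes of that branch concrete, here is a simplified model of it. This is only a sketch under assumptions: CdataBranchSketch and handleCdata are hypothetical names, plain StringBuilder objects stand in for the skeleton and the fragment, inline codes are modeled as raw text, extract() is assumed to return true, and buildCDATA(node) is assumed to produce the full CDATA section including its delimiters.

```java
// Hypothetical stand-in for the quoted branch; not the Okapi classes.
class CdataBranchSketch {

    // Returns { skeleton text, fragment text } after handling one CDATA node.
    // fragOpen == false models "frag == null" in the quoted code.
    static String[] handleCdata(String value, boolean fragOpen, boolean inlineCdata) {
        StringBuilder skel = new StringBuilder();
        StringBuilder frag = fragOpen ? new StringBuilder() : null;

        if (frag == null) {
            // Assumed behavior of buildCDATA(node): value wrapped in delimiters.
            skel.append("<![CDATA[").append(value).append("]]>");
        } else {
            // extract() is assumed to be true here.
            if (inlineCdata) frag.append("<![CDATA["); // modeled opening code
            frag.append(value);
            if (inlineCdata) frag.append("]]>");       // modeled closing code
        }
        return new String[] { skel.toString(), frag == null ? "" : frag.toString() };
    }

    public static void main(String[] args) {
        String[] a = handleCdata("x", false, false);
        String[] b = handleCdata("x", true, false);
        String[] c = handleCdata("x", true, true);
        System.out.println("skel=" + a[0] + " frag=" + a[1]); // skel=<![CDATA[x]]> frag=
        System.out.println("skel=" + b[0] + " frag=" + b[1]); // skel= frag=x
        System.out.println("skel=" + c[0] + " frag=" + c[1]); // skel= frag=<![CDATA[x]]>
    }
}
```

In this model, the middle case (frag is open and inlineCdata is false) is the only one where the delimiters end up in neither the skeleton nor the fragment, which matches the behavior reported in this issue.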
