XML Filter: CDATA content is XML-escaped when using inlineCdata option

Issue #1024 resolved
Chase Tingley created an issue

Attached: sample file, sample config. The config includes this:

  <okp:options escapeGT="no" escapeQuotes="no" inlineCdata="yes"/>

Roundtrip through Okapi:

$ tikal.sh -fc okf_xml@test.fprm -x test.xml -codeattrs
$ tikal.sh -fc okf_xml@test.fprm -m test.xml.xlf

The CDATA markers are imported as tags, and everything looks fine. (The <br/> tag is protected via codefinder.)

However, on output, the CDATA markers are restored, but XML escaping is still applied to their contents:

<string><![CDATA[Sentence 1.&lt;br />Sentence 2.]]></string>

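This is incorrect. With inlineCdata="yes", the text inside the CDATA section should be written out literally. Entity references are not decoded inside CDATA, so the merged file now contains the literal text "&lt;br />" instead of the <br /> markup.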
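Here is a minimal sketch of the escaping rule the merge should follow. It is a hypothetical illustration, not Okapi's encoder. CdataEncoding, escapeText and encodeText are made-up names. Escaping only "&" and "<" (plus ">" when it follows "]]") is an assumption meant to mirror escapeGT="no". Inside a CDATA section nothing is escaped. The only special case is a literal "]]>", which has to be split across two CDATA sections.

```java
// Hypothetical sketch of the expected escaping rule; not Okapi's actual encoder.
final class CdataEncoding {

    // Escapes plain XML text content. Mirrors escapeGT="no" (an assumption):
    // '&' and '<' are always escaped, '>' only when it would complete "]]>".
    static String escapeText(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&':
                    sb.append("&amp;");
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '>': {
                    int n = sb.length();
                    boolean afterBrackets = n >= 2
                            && sb.charAt(n - 1) == ']' && sb.charAt(n - 2) == ']';
                    sb.append(afterBrackets ? "&gt;" : ">");
                    break;
                }
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    // Inside a CDATA section the text is written verbatim; a literal "]]>"
    // is split across two sections because it would otherwise end the section.
    static String encodeText(String text, boolean insideCdata) {
        if (insideCdata) {
            return text.replace("]]>", "]]]]><![CDATA[>");
        }
        return escapeText(text);
    }

    public static void main(String[] args) {
        String content = "Sentence 1.<br />Sentence 2.";
        // Expected with inlineCdata="yes": content left untouched.
        System.out.println("<![CDATA[" + encodeText(content, true) + "]]>");
        // What the merge currently produces: escaping applied anyway.
        System.out.println("<![CDATA[" + encodeText(content, false) + "]]>");
    }
}
```

The second line printed by main reproduces the incorrect output shown above.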

Comments

  1. Chase Tingley (reporter)

    This unit test does a simple round trip through the filter and passes: the CDATA content comes back unescaped. However, the same correct behavior is not seen when doing a real extract and merge (for example with tikal, as above).

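        // Assumes the enclosing XML filter test class provides `filter`,
        // `getEvents(String)` and `locEN`, plus imports for
        // java.io.ByteArrayInputStream, java.nio.charset.StandardCharsets
        // and JUnit's @Test / assertEquals.
        @Test
        public void testDontEscapeInlineCdataMarkupInsideInlineCDATAOutput() {
            // ITS rules: only <string> is translatable; inlineCdata turns the
            // CDATA markers into inline codes.
            String config = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n" +
                    "<its:rules xmlns:its=\"http://www.w3.org/2005/11/its\" xmlns:okp=\"okapi-framework:xmlfilter-options\" version=\"1.0\">\n" +
                    "  <okp:options escapeGT=\"no\" escapeQuotes=\"no\" inlineCdata=\"yes\"/>\n" +
                    "  <its:translateRule selector=\"/*\" translate=\"no\"/>\n" +
                    "  <its:translateRule selector=\"//string\" translate=\"yes\"/>\n" +
                    "</its:rules>";
            // The CDATA content must come back exactly as it went in.
            String snippet = "<?xml version=\"1.0\"?>\n"
                + "<doc><string><![CDATA[ Sentence 1.<br />Sentence 2. ]]></string></doc>";
            String expect = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<doc><string><![CDATA[ Sentence 1.<br />Sentence 2. ]]></string></doc>";
            // Load the ITS/options configuration into the filter.
            Parameters params = filter.getParameters();
            params.load(new ByteArrayInputStream(config.getBytes(StandardCharsets.UTF_8)), false);
            // Regenerate output directly from the filter events
            // (no XLIFF extraction/merge in between).
            assertEquals(expect, FilterTestDriver.generateOutput(getEvents(snippet),
                filter.getEncoderManager(), locEN));
        }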
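Because the unit test never goes through XLIFF, it can help to check the merged file itself. The sketch below is a hypothetical helper (CdataRoundtripCheck, cdataSections and sameCdata are made-up names, not part of Okapi). It compares the contents of the CDATA sections in the original and merged documents. When nothing is translated, they should be identical. It uses a simple regular expression, so it assumes no CDATA-like text appears inside comments or processing instructions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical regression helper, not part of Okapi: compares CDATA contents.
final class CdataRoundtripCheck {
    // Non-greedy match of each CDATA section; DOTALL lets sections span lines.
    private static final Pattern CDATA =
            Pattern.compile("<!\\[CDATA\\[(.*?)\\]\\]>", Pattern.DOTALL);

    // Returns the raw contents of every CDATA section, in document order.
    static List<String> cdataSections(String xml) {
        List<String> sections = new ArrayList<>();
        Matcher m = CDATA.matcher(xml);
        while (m.find()) {
            sections.add(m.group(1));
        }
        return sections;
    }

    // True when both documents contain the same CDATA contents, in order.
    static boolean sameCdata(String original, String merged) {
        return cdataSections(original).equals(cdataSections(merged));
    }

    public static void main(String[] args) {
        String source = "<doc><string><![CDATA[ Sentence 1.<br />Sentence 2. ]]></string></doc>";
        String buggy = "<doc><string><![CDATA[ Sentence 1.&lt;br />Sentence 2. ]]></string></doc>";
        System.out.println(sameCdata(source, source)); // true
        System.out.println(sameCdata(source, buggy));  // false
    }
}
```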
    

  2. Chase Tingley (reporter)

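    Looks like the problem is that the CDATA code's ctype is serialized in the XLIFF as x-cdata, but on merge that value isn't mapped back to Code.TYPE_CDATA. That would explain why the unit test above passes (it never goes through XLIFF) while the tikal round trip fails.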
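A minimal sketch of the symmetric mapping this comment implies. It assumes the XLIFF 1.2 rule that custom ctype values carry an "x-" prefix. It also assumes the internal CDATA type is a bare name such as "cdata". CtypeMapping, toCtype, fromCtype and CDATA_TYPE are hypothetical stand-ins, not Okapi's API, and the real value of Code.TYPE_CDATA may differ. The writer adds the prefix. The reader has to strip it again for types it knows, which appears to be the step missing on merge.

```java
import java.util.Set;

// Hypothetical sketch of a symmetric mapping between internal code types and
// XLIFF 1.2 ctype values; not Okapi's actual implementation.
final class CtypeMapping {
    // Stand-in for Code.TYPE_CDATA (assumed to be a bare name like "cdata").
    static final String CDATA_TYPE = "cdata";

    // A subset of the ctype values predefined by XLIFF 1.2.
    private static final Set<String> PREDEFINED =
            Set.of("bold", "italic", "underlined", "link", "image", "lb", "pb");

    // Custom internal types this reader knows how to map back (assumption).
    private static final Set<String> KNOWN_CUSTOM = Set.of(CDATA_TYPE);

    // Writer side: non-predefined types get the "x-" prefix.
    static String toCtype(String codeType) {
        if (PREDEFINED.contains(codeType) || codeType.startsWith("x-")) {
            return codeType;
        }
        return "x-" + codeType;
    }

    // Reader side: strip the prefix again for known custom types, so that
    // "x-cdata" is recognized as the CDATA type during merge.
    static String fromCtype(String ctype) {
        if (ctype.startsWith("x-")) {
            String bare = ctype.substring(2);
            if (KNOWN_CUSTOM.contains(bare)) {
                return bare;
            }
        }
        return ctype;
    }

    public static void main(String[] args) {
        String written = toCtype(CDATA_TYPE);
        String readBack = fromCtype(written);
        System.out.println(written + " -> " + readBack); // prints "x-cdata -> cdata"
    }
}
```

Only known custom types are stripped, so unrelated x- values pass through the reader unchanged.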
