Markdown filter does not handle nested emphasis/strong emphasis (**xxx _uuu_**)

Issue #684 closed
Kuro Kurosaka created an issue

Markdown processors in general (CommonMark and the other common flavors) allow emphasis and strong emphasis to be nested, for example:

**asterisks OR _underscores_**

However, the Okapi Markdown filter appears to assume that only plain text appears between the opening and closing markers. As a result, when the filter is applied to the sample above, the underscores end up in the extracted text instead of being replaced by codes.

The following method in MarkdownParser.java, which is called to process Emphasis and StrongEmphasis nodes, assumes that the content between the markers is plain text:

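    // Called for Emphasis and StrongEmphasis (and also strike-through and subscript) nodes.
    private void visitDelimitedNode(DelimitedNode node, MarkdownTokenType type) {
        assert node instanceof Node;
        // Opening marker, e.g. "**", queued with the node's own token type.
        addToQueue(node.getOpeningMarker().toString(), false, type, (Node) node);
        // Everything between the markers is queued as ONE TEXT token, so nested
        // markup such as "_underscores_" is never broken down into its own markers.
        addToQueue(node.getText().toString(), true, TEXT, (Node) node);
        // Closing marker, queued with the node's own token type.
        addToQueue(node.getClosingMarker().toString(), false, type, (Node) node);
    }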
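To make the failure mode concrete, here is a self-contained toy model (Java 16+). The types `Node`, `Text`, `Delimited` and `Token` are hypothetical stand-ins, not the flexmark or Okapi classes, and `visitRecursive` is only one possible approach (recursing into the node's children), not necessarily the fix that was eventually applied. `visitFlat` mimics the method above by turning everything between the markers into a single text token, so the inner `_` markers leak into the translatable text.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are NOT the flexmark/Okapi classes.
public class NestedEmphasisSketch {

    // A queued token: either translatable text or a non-translatable marker.
    record Token(String text, boolean translatable) {}

    interface Node {}

    record Text(String value) implements Node {}

    // Emphasis-like node: opening marker, child nodes, closing marker.
    record Delimited(String opening, List<Node> children, String closing) implements Node {}

    // Rebuilds the raw source of a node, markers included (like an unparsed getText()).
    static String rawText(Node node) {
        if (node instanceof Text t) {
            return t.value();
        }
        Delimited d = (Delimited) node;
        StringBuilder sb = new StringBuilder(d.opening());
        for (Node child : d.children()) {
            sb.append(rawText(child));
        }
        return sb.append(d.closing()).toString();
    }

    // Mirrors the current behavior: the whole inner span becomes one text token.
    static void visitFlat(Delimited node, List<Token> out) {
        out.add(new Token(node.opening(), false));
        StringBuilder inner = new StringBuilder();
        for (Node child : node.children()) {
            inner.append(rawText(child));
        }
        out.add(new Token(inner.toString(), true));
        out.add(new Token(node.closing(), false));
    }

    // Alternative: recurse into children so nested markers are also emitted as markers.
    static void visitRecursive(Node node, List<Token> out) {
        if (node instanceof Text t) {
            out.add(new Token(t.value(), true));
            return;
        }
        Delimited d = (Delimited) node;
        out.add(new Token(d.opening(), false));
        for (Node child : d.children()) {
            visitRecursive(child, out);
        }
        out.add(new Token(d.closing(), false));
    }

    // Concatenates only the translatable tokens, i.e. what would reach the text unit.
    static String translatableText(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (t.translatable()) {
                sb.append(t.text());
            }
        }
        return sb.toString();
    }

    // Tree for: **asterisks OR _underscores_**
    static Delimited sample() {
        return new Delimited("**",
                List.of(new Text("asterisks OR "),
                        new Delimited("_", List.of(new Text("underscores")), "_")),
                "**");
    }

    public static void main(String[] args) {
        List<Token> flat = new ArrayList<>();
        visitFlat(sample(), flat);
        List<Token> recursive = new ArrayList<>();
        visitRecursive(sample(), recursive);

        System.out.println(translatableText(flat));      // asterisks OR _underscores_
        System.out.println(translatableText(recursive)); // asterisks OR underscores
    }
}
```

In the recursive version the nested `_` markers are queued as separate non-translatable tokens, which is the behavior the test case below checks for.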

To reproduce this, add the following test case to MarkdownFilterTest.java. It fails under the current implementation (v0.36):

    @Test
    public void testUnderlinedTextWithinAsterisks() {
        String snippet = "**asterisks OR _underscores_**\n";

        try (MarkdownFilter filter = new MarkdownFilter()) {
            ArrayList<Event> events = FilterTestDriver.getEvents(filter, snippet, null);
            List<ITextUnit> tus = FilterTestDriver.filterTextUnits(events);
            assertTUListDoesNotContain(tus, "_");
        }
    }


  1. Kuro Kurosaka reporter

    The cause is that the parts of MarkdownParser.java that handle emphasis, strong emphasis, strike-through, and subscript all call a private method named visitDelimitedNode, which assumes that the text between the pair of delimiters (asterisks, underscores, or runs of either) is plain text that needs no further processing. I think I have a fix.
