Markdown filter does not handle nested emphasis/strong emphasis (xxx _uuu_)

Markdown processors of all flavors allow emphasis and strong emphasis to be nested, for example:

**asterisks OR _underscores_**

However, the Okapi Markdown filter seems to assume that only plain text appears between the opening and closing markers. As a result, when the Okapi Markdown filter is applied to the sample above, the underscores end up in the extracted text instead of being replaced by codes.

The following method in MarkdownParser.java, which is called to process Emphasis and StrongEmphasis nodes, assumes that the node contains only plain text:

    private void visitDelimitedNode(DelimitedNode node, MarkdownTokenType type) {
        assert node instanceof Node;
        addToQueue(node.getOpeningMarker().toString(), false, type, (Node) node);
        addToQueue(node.getText().toString(), true, TEXT, (Node) node);
        addToQueue(node.getClosingMarker().toString(), false, type, (Node) node);
    }
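
One possible direction for a fix is to walk the node's children recursively instead of emitting the whole inner text as a single TEXT token. The sketch below only illustrates that idea on a toy tree. The `Text`, `Delimited`, `Token` and `tokenize` names are hypothetical and are not part of the Okapi or flexmark APIs, and the real parser would need to map this onto its own node types and `addToQueue` calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: a toy AST, not the Okapi MarkdownParser or flexmark types.
public class NestedEmphasisSketch {

    /** Marker interface for the toy AST nodes (hypothetical). */
    public interface Node {}

    /** Leaf node holding plain, translatable characters. */
    public static final class Text implements Node {
        final String chars;
        public Text(String chars) { this.chars = chars; }
    }

    /** Emphasis-like node: the same marker opens and closes it, children sit in between. */
    public static final class Delimited implements Node {
        final String marker;
        final List<Node> children;
        public Delimited(String marker, Node... children) {
            this.marker = marker;
            this.children = Arrays.asList(children);
        }
    }

    /** Output token: translatable text, or a non-translatable code (a marker). */
    public static final class Token {
        public final String content;
        public final boolean translatable;
        Token(String content, boolean translatable) {
            this.content = content;
            this.translatable = translatable;
        }
        @Override
        public String toString() {
            return (translatable ? "TEXT:" : "CODE:") + content;
        }
    }

    /** Flattens a node tree into tokens in document order. */
    public static List<Token> tokenize(Node root) {
        List<Token> out = new ArrayList<>();
        visit(root, out);
        return out;
    }

    private static void visit(Node node, List<Token> out) {
        if (node instanceof Text) {
            out.add(new Token(((Text) node).chars, true));
            return;
        }
        Delimited d = (Delimited) node;
        out.add(new Token(d.marker, false));   // opening marker becomes a code
        for (Node child : d.children) {
            visit(child, out);                 // recurse instead of taking the raw inner text
        }
        out.add(new Token(d.marker, false));   // closing marker becomes a code
    }

    /** Toy tree for: **asterisks OR _underscores_** */
    public static Node sample() {
        return new Delimited("**",
                new Text("asterisks OR "),
                new Delimited("_", new Text("underscores")));
    }

    public static void main(String[] args) {
        System.out.println(tokenize(sample()));
    }
}
```

With this traversal, the inner `_` markers come out as codes of their own, so no translatable token contains an underscore. Whether the real filter should emit the nested markers as separate codes or merge adjacent ones is a design decision this sketch does not address.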

To test this, add the following test case to MarkdownFilterTest.java. This fails under the current implementation (v 0.36):

    @Test
    public void testUnderlinedTextWithinAsterisks() {
        String snippet = "**asterisks OR _underscores_**\n";

        try (MarkdownFilter filter = new MarkdownFilter()) {
            ArrayList<Event> events = FilterTestDriver.getEvents(filter, snippet, null);
            List<ITextUnit> tus = FilterTestDriver.filterTextUnits(events);
            assertTUListDoesNotContain(tus, "_");
        }
    }
