Markdown filter does not handle HTML block-like text with empty lines

Issue #694 resolved
Kuro Kurosaka created an issue

(This was branched from issue 685.) DirectShape.md has a <tbody> tag and <td> tags directly under the <table> tag, with many empty lines between the <table> and </table> tags. After the xlf file is merged, <tbody> and <td> appear twice.

Originally the cause was thought to be the same as that of issue 685. That is partly true, but there are differences, so I opened a new issue.

In issue 685, the tags are parsed by FlexMark and represented as HtmlInline nodes. In this case, a fragment of HTML (e.g. "<tr>\n <td>\n This cell has is") is represented as an HtmlBlock. Although it is categorized as an HtmlBlock, it is not well formed.

Using the same fix strategy as for issue 685 was considered, but it would not work, because the Markdown spec allows Markdown inside an HTML construct, as seen in http://spec.commonmark.org/0.28/#example-120. We can't just put all the HTML-like pieces of text together and send them to the HTML subfilter.
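To illustrate why gluing the HTML-like pieces together loses information, here is a minimal Python sketch. It is not the filter's code and does not use FlexMark; `split_segments`, `starts_html_block` and the shortened `BLOCK_TAGS` list are hypothetical names made up for this illustration. It only approximates the CommonMark rule that an HTML block opened by a block-level tag ends at the next blank line, and it ignores other cases such as an HTML block interrupting a paragraph.

```python
import re

# Shortened, illustrative subset of the block-level tag names that CommonMark
# lists for HTML block start condition 6 (the real list is longer).
BLOCK_TAGS = {"div", "table", "tbody", "tr", "td", "th", "p", "ul", "ol", "li"}

# Up to three spaces of indentation, "<" or "</", a tag name, and then
# whitespace, ">", "/>", or the end of the line.
HTML_START = re.compile(r"^ {0,3}</?([A-Za-z][A-Za-z0-9]*)(\s|/?>|$)")


def starts_html_block(line):
    """Return True if the line looks like the start of a block-level HTML tag."""
    m = HTML_START.match(line)
    return bool(m) and m.group(1).lower() in BLOCK_TAGS


def split_segments(text):
    """Split text into ("html", ...) and ("markdown", ...) segments.

    Simplification: a segment is a run of non-blank lines, and its kind is
    decided by its first line only.
    """
    segments = []
    kind, buf = None, []
    for line in text.splitlines():
        if not line.strip():  # a blank line closes the current segment
            if buf:
                segments.append((kind, "\n".join(buf)))
            kind, buf = None, []
            continue
        if kind is None:
            kind = "html" if starts_html_block(line) else "markdown"
        buf.append(line)
    if buf:
        segments.append((kind, "\n".join(buf)))
    return segments


sample = "<div>\n\n*Emphasized* text.\n\n</div>"
for kind, chunk in split_segments(sample):
    print(kind, repr(chunk))
```

With this input the middle segment is classified as Markdown, so handing the whole `<div>` ... `</div>` span to an HTML subfilter would treat `*Emphasized*` as literal HTML text rather than as emphasis.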

We may need to reconsider the design decision of using the HTML subfilter in order to fully support the Markdown spec.

Comments (4)

  1. Kuro Kurosaka reporter

    This is a simpler example that demonstrates how an empty line within HTML is treated. The HTML block ends without reaching the </td> tag. The newline within the <td> pair is interpreted as a Markdown hard line break and is meaningful. (Note that in the normal interpretation of HTML, a newline is interpreted as just whitespace, as an ASCII space would be.)

  2. Kuro Kurosaka reporter

    The fix is in the code merged on 6/30/2018. Verified that extracting and remerging the two attached files does not create extra tags.
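The difference described in the first comment, between how HTML and Markdown treat an empty line inside a <td> pair, can be sketched roughly as follows. This is an illustration only: the function names and the sample cell are hypothetical and are not taken from the attached files. The HTML view assumes normal text flow, where runs of whitespace collapse to a single space; the Markdown view simply cuts the text at blank lines, which is where the HTML block opened by <td> ends.

```python
import re


def html_whitespace_view(text):
    """Collapse each run of whitespace (including newlines) to one space."""
    return re.sub(r"\s+", " ", text).strip()


def markdown_block_view(text):
    """Split the text into chunks at blank lines."""
    return [chunk for chunk in re.split(r"\n[ \t]*\n", text) if chunk.strip()]


# Hypothetical cell content with one empty line in the middle.
cell = "<td>\nfirst line\n\nsecond line\n</td>"

print(html_whitespace_view(cell))
print(markdown_block_view(cell))
```

In the HTML view the cell stays one unit, while the blank-line split yields two chunks and the opening <td> is separated from its closing </td>, which is the situation the filter has to cope with.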
