# Markdown lexer will sometimes not yield closing fence

Issue #1389 (open)
Fredrik Larsen created an issue

When using GitHub-style (GFM) code fences in Markdown, the lexer does not yield the closing code fence if the language hint has no matching lexer.

## Cause

This seems to be caused by a premature `return` statement in `pygments.lexers.markup` at line 539 (`MarkdownLexer._handle_codeblock`).
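A minimal reproduction sketch (assuming Pygments is installed; the sample text is the same unknown-language fence as in the examples below):

```python
# Reproduce the report: lex a fenced block whose language hint has no lexer,
# then rebuild the input from the yielded token values.
from pygments.lexers.markup import MarkdownLexer

text = "```non-existing-language\nsome text\n```\n"
tokens = list(MarkdownLexer().get_tokens_unprocessed(text))
emitted = "".join(value for _, _, value in tokens)

# On affected versions `emitted` lacks the trailing ``` fence;
# once fixed, it should equal `text` exactly.
print(repr(emitted))
```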

## Examples

### Fails

GFM code fences where the language hint has no matching lexer:

```non-existing-language
The closing code fence (backticks) is not returned by 'get_tokens_unprocessed'
```

### Works

GFM code fences where the language hint has a matching lexer:

```bash
echo "properly lexed bash code"
```

Regular Markdown code fences, i.e. without a language hint:

```
* No lexer hint in the opening fence, so this is not handled by 'MarkdownLexer._handle_codeblock'.
* Everything wrapped in the code fences is just emitted as a single 'pygments.token.Text' item.
```

## Suggested fix

The function should emit the tuple `(match.start(5), String, match.group(5))` before returning, whether or not there is a lexer to process the code block content.

```diff
--- a/pygments/lexers/markup.py Mon Mar 13 19:16:03 2017 +0000
+++ b/pygments/lexers/markup.py Tue Oct 31 16:36:06 2017 +0100
@@ -536,10 +536,9 @@
         # no lexer for this language. handle it like it was a code block
         if lexer is None:
             yield match.start(4), String, code
-            return
-
-        for item in do_insertions([], lexer.get_tokens_unprocessed(code)):
-            yield item
+        else:
+            for item in do_insertions([], lexer.get_tokens_unprocessed(code)):
+                yield item
 
         yield match.start(5), String, match.group(5)
```

There should probably also be tests checking that the concatenated text of the yielded tokens equals the full input given to `get_tokens_unprocessed`.
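Such a round-trip check could look like the sketch below (pytest-style; the function names are illustrative and not taken from the Pygments test suite):

```python
# Round-trip tests: joining the values of all yielded tokens must
# reproduce the input exactly, for all three fence variants above.
from pygments.lexers.markup import MarkdownLexer

def roundtrips(text):
    """Return True if lexing `text` yields tokens whose values rebuild it."""
    tokens = MarkdownLexer().get_tokens_unprocessed(text)
    return "".join(value for _, _, value in tokens) == text

def test_fence_with_unknown_language_hint():
    assert roundtrips("```non-existing-language\nsome text\n```\n")

def test_fence_with_known_language_hint():
    assert roundtrips("```bash\necho hi\n```\n")

def test_fence_without_language_hint():
    assert roundtrips("```\nsome text\n```\n")
```

With the suggested fix applied, all three cases pass; on affected versions the first case fails because the closing fence is dropped.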