CLexer misclassifies multiline comment

Issue #656 duplicate
Former user created an issue

For this C input:

{{{
/* PIPS include guard begin: #include <stdio.h> */
#include <stdio.h>
/* PIPS include guard end: #include <stdio.h> */
}}}

after highlighting it (CLexer and HtmlFormatter) I get the wrong output:

{{{
<span class="cp">/* PIPS include guard begin: #include &lt;stdio.h&gt; */</span>
<span class="cp">#include &lt;stdio.h&gt;</span>
<span class="cm">/* PIPS include guard end: #include &lt;stdio.h&gt; */</span>
}}}

The first line is wrongly classified as Comment.Preproc; it should be Comment.Multiline.

I also extracted the tokens:

{{{
Token.Comment.Preproc   u' /* PIPS include guard begin: #include<stdio.h> */\n #'
Token.Comment.Preproc   u'include<stdio.h>'
Token.Comment.Preproc   u'\n'
Token.Text              u' '
Token.Comment.Multiline u'/* PIPS include guard end: #include<stdio.h> */'
Token.Text              u'\n'
}}}

The first token does not end with '\n' but with the '#' of the following line. What happened?

Comments (2)

  1. madika

    After further investigation, it looks like the problem lies in the combination of two regexes in pygments/lexers/ :

    • Line 42: _ws = r'(?:\s|//.*?\n|/[*].*?[*]/)+'

    This regex matches any run of whitespace characters and comments.

    • Line 51: ('^' + _ws + '#', Comment.Preproc, 'macro'),

    This regex matches all of line 1 plus the leading "#" on line 2. Because \s also matches a newline, the comment on line 1 and the newline after it are both consumed as whitespace preceding the "#".
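    The interaction of the two patterns can be reproduced outside Pygments with the standard `re` module. This is only a minimal sketch: it assumes the lexer compiles its rules with `re.MULTILINE` (so that `^` matches at the start of each line), and the names `preproc`, `source` and `matched_text` are made up for the illustration.

```python
import re

# Whitespace pattern as quoted above: whitespace, // comments, or /* */ comments.
_ws = r'(?:\s|//.*?\n|/[*].*?[*]/)+'

# Assumption: rules are compiled with re.MULTILINE, so '^' matches at line starts.
preproc = re.compile('^' + _ws + '#', re.MULTILINE)

source = (
    "/* PIPS include guard begin: #include <stdio.h> */\n"
    "#include <stdio.h>\n"
    "/* PIPS include guard end: #include <stdio.h> */\n"
)

# Try the rule at the very start of the input, as a lexer would.
match = preproc.match(source)
matched_text = match.group(0)

# The match covers the whole comment, the newline, and the '#' on line 2.
print(repr(matched_text))
```

    The printed match spans the comment on line 1, the newline, and the "#" of line 2, which is consistent with the first Comment.Preproc token reported in the issue.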
