Issue #364 wontfix

HtmlFormatter stripping new lines

Anonymous created an issue

Either I'm using HtmlFormatter wrong or it is removing new lines.

I'm simply calling highlighter like so:

txt = highlight(txt, PythonLexer(), HtmlFormatter())

I then assign the text to a QTextDocument object. When I do a print(txt) before and after the highlight() call, the new line is there before the call and disappears after it.

Am I just doing something wrong, or is this how it's supposed to work?

Thanks, Ben

Reported by guest

Comments (15)

  1. Anonymous

    Replying to [ticket:364 guest]:

    Some additional notes: I wrote my own lexer today to verify that the lexer was not the cause. The lexer doesn't appear to be the problem; it looks like the problem is in HtmlFormatter().

    Here is the Lexer I wrote:

    class OS_PuppetLexer(RegexLexer):
        name = 'Puppet'
        alias = ['puppet']
        filenames = ['*.pp']

        tokens = {'root': [(r'#.*$', Comment)]}

  2. thatch

    This is supposed to be fixed in [a471fde4e814].

    I gave it a quick try and it appears to work now. Ben, can you provide some example text that still doesn't work post-[a471fde4e814] using the Python lexer?

    >>> from pygments import highlight
    >>> from pygments.lexers.agile import PythonLexer
    >>> from pygments.formatters import HtmlFormatter
    >>> x = "a=1\n\n"
    >>> highlight(x, PythonLexer(stripnl=False), HtmlFormatter())
    u'<div class="highlight"><pre><span class="n">a</span><span class="o">=</span><span class="mf">1</span>\n\n</pre></div>\n'

  3. Andrew Watts

    I've applied fix [a471fde4e814], but still run into a problem. I've written the following code to test various inputs. The output is the test number and whether the test passed.

    from pygments import highlight
    from pygments.lexers import get_lexer_by_name
    from pygments.formatters import HtmlFormatter
    # This class is to give a marked up line of code for each line of code given
    # without any wrapping tags
    class MyHtmlFormatter(HtmlFormatter):
        def _wrap_div(self, inner):
            for tup in inner:
                yield tup
        def _wrap_pre(self, inner):
            for tup in inner:
                yield tup
    formatter = MyHtmlFormatter()
    lexer = get_lexer_by_name('python', stripall=False, stripnl=False)
    # The tests
    tests = [
        "print 'Hello world'",
        "print 'Hello world'\n",
        "print 'Helloworld'\n\n",
        "\nprint 'Helloworld'",
        "\nprint 'Helloworld'\n",
        "\n\nprint 'Helloworld'\n",
        "\nprint 'Helloworld'\n\n",
        "\n\nprint 'Helloworld'\n\n",
    ]
    # A test passes when the output has as many lines as the input
    for linenum, code in enumerate(tests):
        result = highlight(code, lexer, formatter)
        print("%d %s" % (linenum, len(result.split('\n')) == len(code.split('\n'))))

    The output I get is:

    0 False
    1 True
    2 True
    3 False
    4 True
    5 True
    6 True
    7 True

    Something is introducing a trailing newline when one doesn't exist in the original. Removing

            if not text.endswith('\n'):
                text += '\n'

    from `get_tokens` doesn't seem to help. I'll do some more investigation when I get the chance.
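
The pattern in the results above (only tests 0 and 3 fail, and those are the only inputs that do not end in a newline) can be reproduced without Pygments at all. The following is a minimal sketch that models the symptom only, assuming a single newline is appended to any input that lacks one. It does not show where in Pygments that happens. `ensure_trailing_newline` and `line_count_preserved` are hypothetical names used for illustration, not Pygments functions.

```python
def ensure_trailing_newline(text):
    # Assumption: models only "append a newline if missing", nothing else.
    if not text.endswith('\n'):
        text += '\n'
    return text


def line_count_preserved(code):
    # Same comparison as the test loop above, applied to the modeled output.
    result = ensure_trailing_newline(code)
    return len(result.split('\n')) == len(code.split('\n'))


tests = [
    "print 'Hello world'",
    "print 'Hello world'\n",
    "print 'Helloworld'\n\n",
    "\nprint 'Helloworld'",
    "\nprint 'Helloworld'\n",
    "\n\nprint 'Helloworld'\n",
    "\nprint 'Helloworld'\n\n",
    "\n\nprint 'Helloworld'\n\n",
]

outcomes = [line_count_preserved(code) for code in tests]
for number, passed in enumerate(outcomes):
    print("%d %s" % (number, passed))
```

Under that assumption the sketch fails for exactly the inputs with no trailing newline, which matches the reported output.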

  4. Andrew Watts

    I believe the following solves the problem:

    Remove

            if not text.endswith('\n'):
                text += '\n'

    from `get_tokens` in `lexer.py`.

    Replace

            if line:
                yield 1, line + (lspan and '</span>') + lsep

    with

            if line:
                yield 1, line + (lspan and '</span>')

    in `_format_lines` in `formatters/html.py`.

    Please note I've only verified this works with the test cases above.

  5. gbrandl

    There are several places in Pygments that expect that code ends in a newline. I can't remember what exactly, but it made stuff much easier at several places. I don't think this is a problem, as code shouldn't be sensitive to that.

  6. Andrew Watts

    Unfortunately I have code which is sensitive to this. I've written a diff program which compares two files and presents the differences side-by-side. The two files are displayed syntax highlighted. In this particular case introducing newline characters to the files is causing problems.

    If you can let me know why newlines are expected at the end of code then I'm happy to continue looking into this problem. Thanks.
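
For a tool like the diff viewer described above, one possible caller-side workaround is to drop the trailing newline from the highlighted output whenever the input did not end with one. This is a minimal sketch of that idea, not the fix proposed in this thread. `match_trailing_newline` is a hypothetical helper, and it assumes that at most one newline was appended.

```python
def match_trailing_newline(original, highlighted):
    # Assumption: the only difference to undo is one appended newline.
    if not original.endswith('\n') and highlighted.endswith('\n'):
        return highlighted[:-1]
    return highlighted


# Input without a trailing newline: the appended newline is removed.
trimmed = match_trailing_newline("a=1", "<span>a=1</span>\n")

# Input with a trailing newline: the output is returned unchanged.
kept = match_trailing_newline("a=1\n", "<span>a=1</span>\n")

print(repr(trimmed))
print(repr(kept))
```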

  7. Andrew Watts

    Is there any chance that the code I supplied, which fixes this bug, could make it into the library? If the fix is wrong, please let me know why so I can look at it again.

    Alternatively if you could point me in the direction of the "several places in Pygments that expect that code ends in a newline" then I'll see if that can be changed.

    I would like this done as I have code which is sensitive to leading and trailing new line characters. Thanks.

  8. Tim Hatch
    • changed milestone to Someday
    • removed assignee
    • edited description

    I suspect the issues Georg was writing about were in individual lexers -- some of them match .*\n sort of things intending to get an entire line. Do you still use Pygments in your diff tool?
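
To illustrate the kind of rule described here, the sketch below uses plain `re` rather than a real Pygments lexer. A pattern of the form `.*\n` matches a line only when the terminating newline is present, so a final line without one is not matched. The pattern and sample strings are made up for illustration.

```python
import re

# A rule intended to consume one entire line, newline included.
line_rule = re.compile(r'.*\n')

with_newline = line_rule.match("# a comment\n")
without_newline = line_rule.match("# a comment")

print(with_newline is not None)   # the newline-terminated line matches
print(without_newline is None)    # the unterminated last line does not
```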
