Omitting unnecessary <span>s in output

Issue #1522
Clément Pit-Claudel created an issue

Thanks again for the hard work on Pygments. I may have missed something in the documentation, but if not, I’d like to discuss an improvement to HTML generation.

When a style defines properties for only some token types, it would be nice if the HTML formatter did not emit a <span> around tokens whose types the style leaves unstyled. When highlighting large documents, it's not uncommon for most of the text to be Name or Operator tokens, nor for the style to leave those two types unstyled. In that case, simply omitting the corresponding spans would make the generated HTML considerably smaller and simpler.
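A minimal sketch of the rendering step makes the idea concrete. This is not Pygments code: token types are plain strings, and `CLASS_FOR` and `render` are hypothetical stand-ins for STANDARD_TYPES and the formatter's inner loop. It assumes a style that only highlights Keyword; any token whose class name is empty is written out bare instead of being wrapped in a <span>.

```python
from html import escape

# Hypothetical stand-in for pygments.token.STANDARD_TYPES:
# token type -> short CSS class name ("" means "no class").
CLASS_FOR = {"Keyword": "k", "Name": "n", "Operator": "o", "Text": ""}


def render(tokens, class_for):
    """Render (token_type, text) pairs as HTML.

    Tokens whose class name is empty are emitted without a <span>.
    """
    out = []
    for ttype, text in tokens:
        css = class_for.get(ttype, "")
        if css:
            out.append(f'<span class="{css}">{escape(text)}</span>')
        else:
            out.append(escape(text))
    return "".join(out)


tokens = [("Keyword", "return"), ("Text", " "), ("Name", "w"),
          ("Text", " "), ("Operator", "*"), ("Text", " "), ("Name", "h")]

# Assume the style only defines highlighting for Keyword.
styled = {"Keyword"}
stripped = {t: (c if t in styled else "") for t, c in CLASS_FOR.items()}

full = render(tokens, CLASS_FOR)  # one <span> per non-Text token
lean = render(tokens, stripped)   # spans only where the style applies
print(full)
print(lean)
```

With Name and Operator unstyled, the same token stream drops from four spans to one, and the visible result is unchanged because the removed spans carried no styling.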

To get an idea of the potential savings, I patched STANDARD_TYPES to map token.Name and token.Operator to '' (the style I’m using defines no highlighting for either). Here’s before:

 427K demo.html
 423K interpreters.html
 1.8M proof-by-reflection.html
  53K unicode.html

And here’s after:

 263K demo.html
 232K interpreters.html
1019K proof-by-reflection.html
  38K unicode.html
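Put as percentages, the patch shrank these files by roughly 28% to 45%. The short calculation below derives those figures from the two listings; the 1.8M entry is rounded, so it is taken as 1.8 × 1024 K and its percentage is only approximate.

```python
# File sizes in K, copied from the before/after listings.
# 1.8M is rounded in the listing, so 1.8 * 1024 is only an approximation.
before = {
    "demo.html": 427,
    "interpreters.html": 423,
    "proof-by-reflection.html": 1.8 * 1024,
    "unicode.html": 53,
}
after = {
    "demo.html": 263,
    "interpreters.html": 232,
    "proof-by-reflection.html": 1019,
    "unicode.html": 38,
}


def reduction_pct(old, new):
    """Percentage by which a size shrank, rounded to the nearest integer."""
    return round(100 * (old - new) / old)


savings = {name: reduction_pct(before[name], after[name]) for name in before}
for name, pct in savings.items():
    print(f"{name}: {pct}% smaller")
```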

I think the implementation wouldn't be too hard: in HtmlFormatter.format_unencoded, we'd build a copy of STANDARD_TYPES in which every token type the style leaves unstyled maps to '', and use that copy instead of STANDARD_TYPES in the call to self._format_lines. Of course, we’d also have to add an option to preserve the current behavior.
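The stripped copy could be built once per formatter, before the line loop runs. The sketch below is self-contained and hypothetical: token types are modeled as tuples, mirroring how Pygments structures them (so ("Name", "Builtin") is a subtype of ("Name",)), `DEMO_STANDARD_TYPES` stands in for STANDARD_TYPES, the style is a plain dict of style strings, and `strip_unstyled` is an illustrative name for the opt-out option. A type counts as styled if it or any ancestor has a non-empty style string, since subtypes inherit their parent's style; this rule is a simplification that ignores modifiers such as `noinherit`.

```python
# Hypothetical stand-in for pygments.token.STANDARD_TYPES: token types are
# tuples, () is the root type, and each maps to a short CSS class name.
DEMO_STANDARD_TYPES = {
    (): "",
    ("Keyword",): "k",
    ("Name",): "n",
    ("Name", "Builtin"): "nb",
    ("Operator",): "o",
}

# Hypothetical style: token type -> style string ("" or missing = unstyled).
DEMO_STYLE = {
    ("Keyword",): "bold #008000",
    ("Name", "Builtin"): "#008000",
}


def is_styled(ttype, style):
    """Return True if ttype or any of its ancestors has a non-empty style."""
    while True:
        if style.get(ttype, "").strip():
            return True
        if not ttype:       # reached the root type without finding a style
            return False
        ttype = ttype[:-1]  # move to the parent type


def css_class_map(standard_types, style, strip_unstyled=True):
    """Return a copy of standard_types, mapping unstyled types to "".

    With strip_unstyled=False the copy is unchanged, preserving the
    current behavior of one <span> per classified token.
    """
    if not strip_unstyled:
        return dict(standard_types)
    return {
        ttype: (cls if is_styled(ttype, style) else "")
        for ttype, cls in standard_types.items()
    }


classes = css_class_map(DEMO_STANDARD_TYPES, DEMO_STYLE)
print(classes)
```

The formatter's line loop would then look classes up in this map instead of STANDARD_TYPES and skip the <span> whenever the lookup yields ''. Building the map once keeps the per-token cost the same as today.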
