Thanks again for the hard work on pygments. I might have missed something in the documentation, but if not I’d like to discuss an improvement to HTML generation.
When a style specifies properties for only certain tokens, it would be nice if the HTML renderer did not emit a <span> around tokens that the style leaves unstyled. When highlighting some large documents it's not uncommon to have most of the text in a couple of token types, Operator among them, and just as common for the style to leave those unstyled. Given this, significant space and document-complexity savings could be achieved just by omitting the corresponding spans.
To get an idea of the potential savings I patched STANDARD_TYPES so that no <span> is emitted for these tokens (token.Operator among them), which the style that I’m using doesn’t define any highlighting for. Here’s before:
427K demo.html
423K interpreters.html
1,8M proof-by-reflection.html
53K unicode.html
And here’s after:
263K demo.html
232K interpreters.html
1019K proof-by-reflection.html
38K unicode.html
I think the implementation wouldn't be too hard: in HtmlFormatter.format_unencoded, we'd build a copy of STANDARD_TYPES stripped of the token types that the style leaves unstyled, and use that copy instead of STANDARD_TYPES in the call to self._format_lines. Of course, we’d have to add an extra option to preserve the current behavior.
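
To make the idea concrete, here is a rough sketch written against a toy model rather than Pygments' real internals. CLASS_FOR_TOKEN stands in for STANDARD_TYPES, and strip_unstyled, render, and the sample token stream are hypothetical names and data that I made up for illustration. The only assumption carried over from the proposal is that a token whose class is empty gets no <span>.

```python
from html import escape

# Toy stand-in for STANDARD_TYPES: token type name -> CSS class.
# (Hypothetical data; the real mapping lives in pygments.token.)
CLASS_FOR_TOKEN = {"Keyword": "k", "Name": "n", "Operator": "o", "Text": ""}


def strip_unstyled(class_map, styled):
    """Return a copy of class_map where token types not in `styled` map to ''."""
    return {tok: (cls if tok in styled else "") for tok, cls in class_map.items()}


def render(tokens, class_map):
    """Wrap each token value in a <span>, unless its class is empty."""
    parts = []
    for ttype, value in tokens:
        cls = class_map.get(ttype, "")
        text = escape(value)
        parts.append(f'<span class="{cls}">{text}</span>' if cls else text)
    return "".join(parts)


tokens = [("Keyword", "let"), ("Text", " "), ("Name", "x"),
          ("Operator", "="), ("Name", "y")]
styled = {"Keyword"}  # hypothetical style that only highlights keywords

# Current behavior: every token type with a class gets a span.
before = render(tokens, CLASS_FOR_TOKEN)
# Proposed behavior: spans only for token types the style actually defines.
after = render(tokens, strip_unstyled(CLASS_FOR_TOKEN, styled))

print(len(before), before)
print(len(after), after)
```

Passing the unmodified mapping to render keeps today's output, which is what the extra option would select; passing the stripped copy gives the smaller output.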