Clément Pit-Claudel created an issue

I'm improving the Pygments lexer for Coq. The language defines its identifiers using Unicode general categories: an identifier starts with a character in category Lu, Ll, Lt, Lo, or Lm, followed by zero or more characters in those categories or in Nd, Nl, or No. How do I write a Pygments lexer for this? The newer regex module supports matching on these properties (using e.g. \p{Lu}), but Python's re module has no similar feature.


  1. Georg Brandl repo owner

    Pygments has the pygments.unistring helper module for that; using it is not as pretty as regex, but switching from re to regex would be a pretty huge task, since compatibility would have to be ensured across all existing lexers.
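
A minimal sketch of the unistring approach, assuming the simplified identifier rule from the question. The names IDENT and IdentDemoLexer are made up for illustration and are not part of the Coq lexer; uni.combine joins the per-category range strings so they can be dropped into a character class.

```python
import re

from pygments import unistring as uni
from pygments.lexer import RegexLexer
from pygments.token import Name, Text

# Character-class bodies for the first and the following characters.
ident_start = uni.combine('Lu', 'Ll', 'Lt', 'Lo', 'Lm')
ident_part = uni.combine('Lu', 'Ll', 'Lt', 'Lo', 'Lm', 'Nd', 'Nl', 'No')

# One start character, then zero or more continuation characters.
IDENT = '[%s][%s]*' % (ident_start, ident_part)
ident_re = re.compile(IDENT)


class IdentDemoLexer(RegexLexer):
    """Hypothetical demo lexer: identifiers become Name, the rest Text."""
    name = 'IdentDemo'
    tokens = {
        'root': [
            (IDENT, Name),
            (r'\s+', Text),
            (r'.', Text),
        ],
    }
```

With this, ident_re.fullmatch('été1') matches, while '1x' is rejected because a digit (Nd) is only allowed after the first character.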

    (It may be possible to have a lexer subclass that automatically pre-processes the \p{...} escapes in regexes before passing them on to re…)
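
The pre-processing idea could look roughly like the sketch below. It is not an existing Pygments feature: expand_p_escapes and the helpers are hypothetical names, the category ranges are computed from the standard unicodedata module rather than from pygments.unistring, and the function assumes every \p{...} escape is written inside a [...] character class. It does not handle an escaped backslash in front of p{...}.

```python
import re
import sys
import unicodedata
from functools import lru_cache


def _range(lo, hi):
    # Format one run of code points as "a" or "a-z", escaped for use in [...].
    lo_s, hi_s = re.escape(chr(lo)), re.escape(chr(hi))
    return lo_s if lo == hi else lo_s + '-' + hi_s


@lru_cache(maxsize=None)
def _category_table():
    # Single pass over all code points, grouping consecutive runs by category.
    table = {}
    start, current = 0, unicodedata.category(chr(0))
    for cp in range(1, sys.maxunicode + 2):
        cat = unicodedata.category(chr(cp)) if cp <= sys.maxunicode else None
        if cat != current:
            table.setdefault(current, []).append(_range(start, cp - 1))
            start, current = cp, cat
    return {cat: ''.join(parts) for cat, parts in table.items()}


_P_ESCAPE = re.compile(r'\\p\{(\w+)\}')


def expand_p_escapes(pattern):
    # Replace each \p{Xx} with the character ranges of category Xx.
    def repl(match):
        cat = match.group(1)
        table = _category_table()
        if cat not in table:
            raise ValueError('unknown Unicode category: %r' % cat)
        return table[cat]
    return _P_ESCAPE.sub(repl, pattern)


# The identifier rule from the question, written with \p{...} escapes.
IDENT = expand_p_escapes(
    r'[\p{Lu}\p{Ll}\p{Lt}\p{Lo}\p{Lm}]'
    r'[\p{Lu}\p{Ll}\p{Lt}\p{Lo}\p{Lm}\p{Nd}\p{Nl}\p{No}]*'
)
ident_re = re.compile(IDENT)
```

A lexer subclass could run such a function over its token patterns before they are compiled; how best to hook that into Pygments is left open here.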

