make lexer.words() work with lexer.bygroups()

Issue #1229 resolved
Denis Lohner
created an issue

In rare cases lexer.words(keywords) doesn't wrap the matched words in a capture group, so using it together with lexer.bygroups is not guaranteed to work.

One such case is keywords = ('A', 'B', 'C').

A possible fix would be to surround make_charset(oneletter) with open_paren + ... + close_paren in pygments/regexopt.py at line 57.
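A minimal sketch of the proposed change, using a simplified stand-in for make_charset (the real function lives in pygments/regexopt.py; the line number and surrounding code are taken from this report, not verified against a particular Pygments release):

```python
import re

def make_charset(letters):
    # Simplified stand-in for pygments.regexopt.make_charset:
    # collapse a list of single characters into one [..] character class.
    return '[' + re.escape(''.join(letters)) + ']'

open_paren, close_paren = '(', ')'
oneletter = ['A', 'B', 'C']

# Current behaviour: the charset carries no capture group of its own.
unwrapped = make_charset(oneletter)

# Proposed fix: wrap the charset so bygroups() can address it as a group.
wrapped = open_paren + make_charset(oneletter) + close_paren

print(unwrapped)  # -> [ABC]
print(wrapped)    # -> ([ABC])
```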

Comments (3)

  1. Denis Lohner reporter

    What I mean is that words(('A','B','C'), suffix=r'([0-9]+)').get() results in '[ABC]([0-9]+)' instead of '([ABC])([0-9]+)' as I expected. Thus using it together with bygroups in a token definition such as

    tokens = {'root': [(words(('A', 'B', 'C'), suffix=r'([0-9]+)'), bygroups(Keyword, Comment))]}
    

    fails to relate [ABC] to Keyword and [0-9]+ to Comment, because the pattern contains only one capture group.

    As far as I understand regexopt, this issue only arises when all arguments to words are single characters, so it only occurs in rare or even artificial situations. Nevertheless, it could be a problem if the tokens table is generated automatically.
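    The mismatch can be demonstrated with the standard re module alone, without Pygments; Keyword and Comment refer to the token types in the snippet above:

    ```python
    import re

    # The two patterns from the report: what words(...).get() actually
    # produced versus what bygroups(Keyword, Comment) needs.
    ungrouped = r'[ABC]([0-9]+)'   # charset not wrapped: only one capture group
    grouped = r'([ABC])([0-9]+)'   # expected form: two capture groups

    # With the ungrouped pattern, group 1 holds the digits, so bygroups
    # would hand the digits to Keyword and have nothing left for Comment.
    print(re.match(ungrouped, 'A42').groups())   # -> ('42',)

    # With the grouped pattern, the groups line up with
    # bygroups(Keyword, Comment).
    print(re.match(grouped, 'A42').groups())     # -> ('A', '42')
    ```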
