adding an OpenCOBOL lexer

#72 Merged at 4fb27ac
  1. Brian Tiffin

My regex-fu is fairly weak, but I've been using these changes for the OpenCOBOL FAQ for a few years now. There are bugs (ATTRIBUTE is mysteriously red-boxed, the free-form source multi-line string syntax is busted, and a few others), but I live with them. It highlights the source code well enough for a volunteer document writer. I lazed out and used two lexers: one for fixed-form source and a new duplicate for free-form source.

Thanks for a very nice and handy tool Georg.

Cheers, Brian Tiffin

Comments (7)

    1. Brian Tiffin author

      Thanks for noticing. I'll try and get something suitable posted up.

      But if I can bug you for a second, I have a worry. COBOL allows - in identifiers, so the word boundary \b doesn't cut it.

      As stated, I'll admit to a low level of regex expertise and I'm not sure if there are infinite backtracking issues with the trickery used to replicate word boundary tests.

      Figurative constants:

          (r'(^|(?<=[^0-9a-z_\-]))(ALL\s+)?((ZEROES)|(HIGH-VALUE|LOW-VALUE|QUOTE|SPACE|'
           r'ZERO)(S)?)\s*($|(?=[^0-9a-z_\-]))', Name.Constant),

      Are there string combinations that can cause the above ?<= and ?= subexpressions to spin?

      Is there a more efficient \b test with dash included along with underscore?
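A minimal sketch of one possible answer, using plain Python `re` rather than the actual lexer code. It assumes a COBOL word consists of letters, digits, underscore, and dash. The names `WORD`, `cobol_word`, `FIGURATIVE`, and `find_constants` are hypothetical, and the optional `ALL` prefix from the rule above is left out for brevity. Negative lookarounds of the form `(?<![...])` and `(?![...])` succeed at the start and end of the line without a separate `^` or `$` branch. Each one inspects a single character, so as far as I can tell they do not add any backtracking of their own; runaway backtracking usually comes from nested or overlapping quantifiers instead.

```python
import re

# Assumption: characters that may appear inside a COBOL word.
WORD = r'[0-9A-Za-z_\-]'


def cobol_word(alternatives):
    """Wrap a regex alternation so it only matches as a whole COBOL word.

    The negative lookbehind/lookahead each check one character and also
    succeed at the start/end of the string.
    """
    return r'(?<!%s)(?:%s)(?!%s)' % (WORD, alternatives, WORD)


# Longer alternatives come first so ZEROES is tried before ZERO.
FIGURATIVE = re.compile(
    cobol_word(r'ZEROES|ZEROS|ZERO|HIGH-VALUES?|LOW-VALUES?|QUOTES?|SPACES?'),
    re.IGNORECASE,
)


def find_constants(line):
    """Return the figurative constants found in one line of source."""
    return [m.group(0) for m in FIGURATIVE.finditer(line)]
```

With this pattern, `find_constants("MOVE SPACE-COUNT TO MY-ZERO")` finds nothing, because `SPACE` is followed by a dash and `ZERO` is preceded by one.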

      Thanks Georg and thanks to all the grand folk at team Pocoo.


      1. Georg Brandl repo owner

        I admit that I can't judge ad-hoc if the given regex is safe.

        But shouldn't something like r"\b(?!-)" work too, if the goal is to exclude things that look like keywords but are continued with dash-something?
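To make the suggestion concrete, here is a small sketch in plain Python `re` (not lexer code; `TRAILING_ONLY`, `BOTH_SIDES`, and `matches` are hypothetical names). It compares `\b(?!-)` with a variant that checks both sides of the word. One caveat worth knowing: `\b` also matches between a dash and a letter, so a trailing `(?!-)` alone still lets a keyword match when it follows a dash, as in `MY-ZERO`.

```python
import re

# Trailing check only: rejects ZERO-COUNT but still accepts MY-ZERO,
# because \b matches between '-' and 'Z'.
TRAILING_ONLY = re.compile(r'\bZERO\b(?!-)', re.IGNORECASE)

# Checks both sides: no word character or dash directly before or after.
BOTH_SIDES = re.compile(r'(?<![\w-])ZERO(?![\w-])', re.IGNORECASE)


def matches(pattern, text):
    """Return True if the compiled pattern is found anywhere in text."""
    return pattern.search(text) is not None
```

If keywords can also appear as the tail of a dashed identifier, the two-sided form is the safer choice; otherwise the shorter `\b(?!-)` may be enough.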

            1. Brian Tiffin author

              I just hg pushed a version of the tree with tests/examplefiles/example.cob

              I'll admit to being green as to the ways of bitbucket pygments, and please excuse the delay.