compatibility hooks for Python PLY and/or pyparsing

Issue #81 new
Jason Sachs created an issue

So vera++ works by using a lexer to scan input files for tokens, and it does this very well. I've written a few rules in Python (as well as tweaks to the Tcl rules; I don't really understand much Tcl, so that is just cargo-cult programming, but I can get it to work).

One issue I'm having is that more complex rules are really hard to implement, because they essentially amount to ad-hoc parsing via state machines. That doesn't make sense to me at all, since C/C++ has a well-defined (if complex) syntax. For example, suppose you want to flag all C code where function argument lists are declared or defined with () rather than (void). (The (void) syntax is correct; () tells the compiler that the arguments are unspecified.) This tends to be extremely difficult: how do you distinguish a function declaration from a function definition from a function call, without parsing the whole damn language?

So I'm wondering if there's a way to provide some compatibility hooks for PLY or pyparsing. I don't need vera++ to include a parser; instead I'd like to use PLY (or pyparsing, though PLY is most likely faster) to construct an object model that I can access from Python .py rules, so I can get rid of all the stupid ad-hoc state machines and handle things directly.

I think all I need is the following:

  • a hook so that I can run a Python script once at the beginning of operation (rather than at the beginning of each rule), on each of the input files. (Alternatively, a rule that is guaranteed to run first.)
  • access to a list of all the token names that vera++ provides
  • a place to put an object of my choice so I can get access to it from the rule files

From there I think I could write an adapter in Python to allow PLY to use the vera++ tokens.
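To make the idea concrete, here is a minimal sketch of what such an adapter could look like. The VeraToken shape below is a guess at what vera++ hands back (name, value, line, column); the only thing PLY's yacc parser actually requires is an object with a token() method that returns objects carrying type, value, lineno, and lexpos attributes, and None at end of input:

```python
from collections import namedtuple

# Hypothetical shape of a token as vera++ might expose it to a rule.
VeraToken = namedtuple("VeraToken", "value line column name")

class LexToken(object):
    """Minimal token object with the four attributes PLY's parser reads."""
    def __init__(self, type_, value, lineno, lexpos):
        self.type = type_
        self.value = value
        self.lineno = lineno
        self.lexpos = lexpos
    def __repr__(self):
        return "LexToken(%s,%r,%d,%d)" % (
            self.type, self.value, self.lineno, self.lexpos)

class VeraLexerAdapter(object):
    """Wraps a sequence of vera++ tokens so ply.yacc can consume it.

    ply.yacc only needs an object whose token() method yields the next
    token, or None when the input is exhausted.
    """
    def __init__(self, vera_tokens):
        self._tokens = iter(vera_tokens)
    def token(self):
        for t in self._tokens:
            # vera++ token names are lowercase ("identifier", "leftparen");
            # PLY convention is uppercase token types.
            return LexToken(t.name.upper(), t.value, t.line, t.column)
        return None
```

In a .py rule, one would then build the token list from whatever vera++ provides (presumably the Python equivalent of the Tcl getTokens call, though I'm guessing at that) and hand the adapter to parser.parse(lexer=adapter).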

Is this difficult? I'm not familiar with the vera++ internals (I don't program in C/C++ on a PC, just embedded systems), but it seems like a parser adapter layer would have immense value and enable a whole other category of rules.

Comments (2)

  1. Jason Sachs reporter

    For now, I am neglecting the ugly beast known as the preprocessor. It seems like the only guaranteed way to do robust static analysis in the presence of preprocessor macros is to run a variant of the preprocessor that maintains comments and maintains links to the original code position, then take the output and tokenize it.
