pycparser v1.08

Author: Eli Bendersky

1   Introduction

1.1   What is pycparser?

pycparser is a parser for the C language, written in pure Python. It is a module designed to be easily integrated into applications that need to parse C source code.

1.2   What is it good for?

Anything that needs C code to be parsed. The following are some uses for pycparser, taken from real user reports:

  • C code obfuscator
  • Front-end for various specialized C compilers
  • Static code checker
  • Automatic unit-test discovery
  • Adding specialized extensions to the C language

pycparser is unique in the sense that it's written in pure Python - a very high level language that's easy to experiment with and tweak. To people familiar with Lex and Yacc, pycparser's code will be simple to understand.

1.3   Which version of C does pycparser support?

At the moment, pycparser supports ANSI/ISO C89, the language described by Kernighan and Ritchie in "The C Programming Language, 2nd edition" (K&R2), with only selected extensions from C99. The currently supported C99 features are:

  • Allowing a comma after the last value in an enumeration list

Additionally, since pycparser lets you use your own C preprocessor (cpp), C99 features implemented in the preprocessor (such as variadic macros or // comments) can be supported in a manner transparent to pycparser.

pycparser doesn't support any GCC extensions.

1.4   What grammar does pycparser follow?

pycparser very closely follows the ANSI C grammar provided at the end of K&R2. Listings of this grammar (often in Yacc syntax) can be easily found by a simple web search. Google for ansi c grammar to get started.

1.5   What is an AST?

AST - Abstract Syntax Tree. It is a tree representation of the syntax of source code - a convenient hierarchical data structure that's built from the code and is readily suitable for exploration and manipulation.
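
To make the idea concrete, here is a minimal sketch of an AST for the C expression a + b * 2, built by hand in plain Python. The classes Num, Var and BinOp and the helpers show and count_vars are hypothetical names made up for this illustration; they are not pycparser's node classes, which are described in pycparser/_c_ast.yaml.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Leaf node: an integer constant."""
    value: int


@dataclass
class Var:
    """Leaf node: an identifier."""
    name: str


@dataclass
class BinOp:
    """Inner node: a binary operator with two child subtrees."""
    op: str
    left: "Node"
    right: "Node"


Node = Union[Num, Var, BinOp]


def show(node, indent=0):
    """Return an indented rendering of the tree, one node per line."""
    pad = "  " * indent
    if isinstance(node, BinOp):
        return "\n".join([
            f"{pad}BinOp: {node.op}",
            show(node.left, indent + 1),
            show(node.right, indent + 1),
        ])
    if isinstance(node, Num):
        return f"{pad}Num: {node.value}"
    return f"{pad}Var: {node.name}"


def count_vars(node):
    """Explore the tree: count how many identifiers it contains."""
    if isinstance(node, BinOp):
        return count_vars(node.left) + count_vars(node.right)
    return 1 if isinstance(node, Var) else 0


# a + b * 2: the multiplication binds tighter, so it sits deeper in the tree.
tree = BinOp("+", Var("a"), BinOp("*", Var("b"), Num(2)))
print(show(tree))
```

The nesting of the nodes records operator precedence, which is why code that walks the tree (like count_vars here) never has to look at the original text again.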

1.6   How is pycparser licensed?

LGPL

1.7   Contact details

Drop me an email at eliben@gmail.com with any questions regarding pycparser. To report problems with pycparser or to submit feature requests, the best way is to open an issue on the pycparser page at Google Code.

2   Installing

2.1   Prerequisites

  • pycparser was tested on Python 2.5, 2.6 and 3.1, on both Linux and Windows
  • pycparser uses the PLY module for the actual lexer and parser construction. Install PLY version 3.3 from its website (earlier versions, at least as far back as 2.5, also work).
  • If you want to modify pycparser's code, you'll need to install PyYAML, since it's used by pycparser to store the AST configuration in a YAML file.

2.2   Installation process

Installing pycparser is very simple. Once you download it from its website and unzip the package, you just have to execute the standard python setup.py install. The setup script will then place the pycparser module into the site-packages directory of your Python installation.

It's recommended to run _build_tables.py in the pycparser code directory to make sure the parsing tables of PLY are pre-generated. This can make your code run faster.

3   Using

3.1   Interaction with the C preprocessor

In order to be compilable, C code must be preprocessed by the C preprocessor - cpp. cpp handles preprocessing directives like #include and #define, removes comments, and does other minor tasks that prepare the C code for compilation.

For all but the most trivial snippets of C code, pycparser, like a C compiler, must receive preprocessed C code in order to function correctly. If you import the top-level parse_file function from the pycparser package, it will interact with cpp for you, as long as it's in your PATH, or you provide a path to it.

On the vast majority of Linux systems, cpp is installed and is in the PATH. If you're on Windows and don't have cpp somewhere, you can use the one provided in the utils directory in pycparser's distribution. This cpp executable was compiled from the LCC distribution, and is provided under LCC's license terms.
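
The lookup described above can be sketched roughly as follows: use cpp from the PATH when it is there, otherwise fall back to an explicit path such as the bundled utils/cpp.exe. The helpers find_cpp and parse_c_file are hypothetical names for this illustration, and the use_cpp and cpp_path keyword arguments of parse_file are assumed to match your pycparser version, so check them against the source. shutil.which needs Python 3.3 or later, which is newer than the versions listed under Prerequisites.

```python
import shutil


def find_cpp(fallback, name="cpp"):
    """Return the full path of `name` if it is on the PATH, else `fallback`."""
    return shutil.which(name) or fallback


def parse_c_file(filename, fallback_cpp="utils/cpp.exe"):
    """Preprocess `filename` with cpp and return the AST built by pycparser."""
    # Imported lazily so find_cpp can be used without pycparser installed.
    from pycparser import parse_file

    return parse_file(filename, use_cpp=True, cpp_path=find_cpp(fallback_cpp))


# Typical use (assumes hello.c exists and pycparser is installed):
#     ast = parse_c_file("hello.c")
#     ast.show()
```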

3.2   What about the standard C library headers?

C code almost always includes various header files from the standard C library, like stdio.h. While, with some effort, pycparser can be made to parse the standard headers from any C compiler, it's much simpler to use the provided "fake" standard includes in utils/fake_libc_include. These are standard C header files that contain only the bare necessities to allow valid compilation of the files that use them. As a bonus, since they're minimal, they can significantly improve the performance of parsing C files.

See the using_cpp_libc.py example for more details.
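
In short, the idea is to point cpp at the fake headers with a -I argument so that an #include of stdio.h resolves to the minimal version. The sketch below assumes that parse_file accepts a cpp_args string and that pycparser_root is the directory where you unzipped the distribution; fake_libc_args and parse_with_fake_libc are hypothetical helper names, not part of pycparser.

```python
import os


def fake_libc_args(pycparser_root):
    """Build the cpp argument that adds the fake libc headers to the include path."""
    include_dir = os.path.join(pycparser_root, "utils", "fake_libc_include")
    return "-I" + include_dir


def parse_with_fake_libc(filename, pycparser_root=".", cpp_path="cpp"):
    """Parse `filename`, resolving standard headers to the fake ones."""
    # Imported lazily so fake_libc_args can be used without pycparser installed.
    from pycparser import parse_file

    return parse_file(
        filename,
        use_cpp=True,
        cpp_path=cpp_path,
        cpp_args=fake_libc_args(pycparser_root),
    )


# Typical use (assumes the distribution was unzipped into ./pycparser-1.08):
#     ast = parse_with_fake_libc("prog.c", pycparser_root="pycparser-1.08")
```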

3.3   Basic usage

Take a look at the examples directory of the distribution for a few examples of using pycparser. These should be enough to get you started.

3.4   Advanced usage

The public interface of pycparser is well documented with comments in pycparser/c_parser.py. For a detailed overview of the various AST nodes created by the parser, see pycparser/_c_ast.yaml.

In any case, you can always drop me an email for help.

4   Modifying

There are a few points to keep in mind when modifying pycparser:

  • The code for pycparser's AST nodes is automatically generated from a YAML configuration file - _c_ast.yaml, by _ast_gen.py. If you modify the AST configuration, make sure to re-generate the code.
  • Make sure you understand the optimized mode of pycparser - for that you must read the docstring in the constructor of the CParser class. For development you should create the parser without optimizations, so that it will regenerate the Yacc and Lex tables when you change the grammar.
  • The script _build_tables.py can be helpful - it regenerates all the tables needed by pycparser, and the AST code from YAML.
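
As a rough sketch of the development setup mentioned above, the parser can be created with optimizations switched off. The keyword names lex_optimize and yacc_optimize are an assumption based on the CParser constructor, so confirm them in its docstring before relying on this; make_dev_parser is a hypothetical helper name.

```python
# Assumed keyword names; verify against the CParser constructor docstring.
DEV_PARSER_OPTIONS = {"lex_optimize": False, "yacc_optimize": False}


def make_dev_parser():
    """Create a parser that regenerates its Lex and Yacc tables on each run."""
    # Imported lazily so this module loads even without pycparser installed.
    from pycparser.c_parser import CParser

    return CParser(**DEV_PARSER_OPTIONS)


# Typical use while changing the grammar:
#     parser = make_dev_parser()
#     ast = parser.parse("int x;", "<example>")
#     ast.show()
```

Regenerating the tables makes start-up slower, so this is only meant for development.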

5   Package contents

Once you unzip the pycparser package, you'll see the following files and directories:

README.txt/html:
This README file.
setup.py:
Installation script
examples/:
A directory with some examples of using pycparser
pycparser/:
The pycparser module source code.
tests/:
Unit tests.
utils/cpp.exe:
A Windows executable of the C pre-processor suitable for working with pycparser
utils/fake_libc_include:
Minimal standard C library include files that should allow parsing of any C code.
utils/internal/:
Internal utilities for my own use. You probably don't need them.

6   Contributors

Some people have contributed to pycparser by opening issues on bugs they've found and/or submitting patches. The list of contributors is at this pycparser Wiki page.

7   Changelog

7.1   Version 1.08 (09.10.2010)

  • Bug fixes:
    • Correct handling of do{} ... while statements in some cases
    • Issues 6 & 7: Concatenation of string literals
    • Issue 9: Support for unnamed bitfields in structs

7.2   Version 1.07 (18.05.2010)

  • Python 3.1 compatibility: pycparser was modified to run on Python 3.1 as well as 2.6

7.3   Version 1.06 (10.04.2010)

  • Bug fixes:
    • coord not propagated to FuncCall nodes
    • lexing of the ^= token (XOREQUALS)
    • parsing failed on some abstract declarator rules
  • Linux compatibility: fixed end-of-line and cpp path issues to allow all tests and examples to run on Linux

7.4   Version 1.05 (16.10.2009)

  • Fixed the parse_file auxiliary function to handle multiple arguments to cpp correctly

7.5   Version 1.04 (22.05.2009)

  • Added the fake_libc_include directory to allow parsing of C code that uses standard C library include files without dependency on a real C library.
  • Tested with Python 2.6 and PLY 3.2

7.6   Version 1.03 (31.01.2009)

  • Accept enumeration lists with a comma after the last item (C99 feature).

7.7   Version 1.02 (16.01.2009)

  • Fixed problem of parsing struct/enum/union names that were named similarly to previously defined typedef types.

7.8   Version 1.01 (09.01.2009)

  • Fixed subprocess invocation in the helper function parse_file - now it's more portable

7.9   Version 1.0 (15.11.2008)

  • Initial release
  • Support for ANSI C89