
AST Optimizer

astoptimizer is an optimizer for Python code that works on the Abstract Syntax
Tree (AST, a high-level representation). It does as much work as possible at
compile time.

The compiler is static, not a just-in-time (JIT) compiler, so don't expect
better performance than psyco or PyPy, for example. Optimizations that depend
on the types of function arguments cannot be done. Only optimizations on
immutable types (constants) are done.


Source code hosted at:


 * Call builtin functions if arguments are constants. Examples:

   - ``len("abc")`` => ``3``
   - ``ord("A")`` => ``65``

 * Call methods of builtin types if the object and arguments are constants.

   - ``u"h\\xe9ho".encode("utf-8")`` => ``b"h\\xc3\\xa9ho"``
   - ``"python2.7".startswith("python")`` => ``True``
   - ``(32).bit_length()`` => ``6``
   - ``float.fromhex("0x1.8p+0")`` => ``1.5``

 * Call functions of the math and string modules that have no side
   effects. Examples:

   - ``math.log(32) / math.log(2)`` => ``5.0``
   - ``string.atoi("5")`` => ``5``

 * Format strings for str%args and print(arg1, arg2, ...) if arguments
   are constants and the format string is valid.

   - ``"x=%s" % 5`` => ``"x=5"``
   - ``print(1.5)`` => ``print("1.5")``

 * Simplify expressions. Examples:

   - ``not(x in y)`` => ``x not in y``
   - ``4 and 5 and x and 6`` => ``x and 6``

 * Loop: replace ``range()`` with ``xrange()`` on Python 2, and list with
   tuple.  Examples:

   - ``for x in range(n): ...`` => ``for x in xrange(n): ...``
   - ``for x in [1, 2, 3]: ...`` => ``for x in (1, 2, 3): ...``

 * Evaluate unary and binary operators, subscripts and comparisons if all
   arguments are constants. Examples:

   - ``1 + 2 * 3`` => ``7``
   - ``not True`` => ``False``
   - ``"abc" * 3`` => ``"abcabcabc"``
   - ``"abcdef"[:3]`` => ``"abc"``
   - ``(2, 7, 3)[1]`` => ``7``
   - ``frozenset("ab") | frozenset("bc")`` => ``frozenset("abc")``
   - ``None is None`` => ``True``
   - ``"2" in "python2.7"`` => ``True``
   - ``def f(): return 2 if 4 < 5 else 3`` => ``def f(): return 2``

 * Remove dead code. Examples:

   - ``def f(): return 1; return 2`` => ``def f(): return 1``
   - ``if DEBUG: print("debug")`` => ``pass`` with DEBUG declared as False
   - ``while 0: ...`` => ``pass``
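
The constant folding described in the list above can be illustrated with the
standard ``ast`` module alone. The following is a minimal sketch, not
astoptimizer's implementation: the ``ConstantFolder`` class and ``fold()``
helper are hypothetical names, only ``+``, ``-`` and ``*`` on ``int`` and
``str`` constants are handled, and ``ast.unparse()`` requires Python 3.9 or
newer.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


class ConstantFolder(ast.NodeTransformer):
    """Fold ``<constant> <op> <constant>`` for int and str operands."""

    def visit_BinOp(self, node):
        # Visit children first so that 2 * 3 is folded before the
        # enclosing 1 + (...) is examined.
        self.generic_visit(node)
        func = _BINOPS.get(type(node.op))
        left, right = node.left, node.right
        if (
            func is not None
            and isinstance(left, ast.Constant)
            and isinstance(right, ast.Constant)
            and type(left.value) in (int, str)
            and type(right.value) in (int, str)
        ):
            try:
                value = func(left.value, right.value)
            except TypeError:
                # Invalid operand combination such as "a" + 1:
                # keep the node so the error still happens at run time.
                return node
            return ast.copy_location(ast.Constant(value), node)
        return node


def fold(source):
    """Parse an expression, fold its constants and return the new source."""
    tree = ast.parse(source, mode="eval")
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)


print(fold("1 + 2 * 3"))   # 7
print(fold("x + 2 * 3"))   # x + 6
```

A real optimizer would also need a size limit before folding expressions such
as ``"abc" * 10**9``; this sketch omits it.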

Use astoptimizer in your project

To enable astoptimizer globally on your project, add the following lines at the
very beginning of your application::

    import astoptimizer
    config = astoptimizer.Config('builtin_funcs', 'pythonbin')
    # customize the config here
    astoptimizer.patch_compile(config)

Right now, it only works correctly on Python 3.3.


Example with the high-level function ``optimize_code``::

    from astoptimizer import optimize_code
    code = "print(1+1)"
    code = optimize_code(code)

Example with the low-level function ``optimize_ast``::

    from astoptimizer import Config, parse_ast, optimize_ast, compile_ast
    config = Config('builtin_funcs', 'pythonbin')
    code = "print(1+1)"
    tree = parse_ast(code)
    tree = optimize_ast(tree, config)
    code = compile_ast(tree)
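
The parse, optimize and compile steps shown above have direct counterparts in
the standard library. The sketch below uses ``ast.parse()``, a small
``ast.NodeTransformer`` and the builtin ``compile()`` to apply one of the
listed simplifications, ``not(x in y)`` => ``x not in y``. The
``NotInRewriter`` class is a hypothetical illustration, not part of
astoptimizer, and ``ast.unparse()`` requires Python 3.9 or newer.

```python
import ast


class NotInRewriter(ast.NodeTransformer):
    """Rewrite ``not (a in b)`` as ``a not in b``, ``not (a is b)`` as ``a is not b``."""

    _NEGATED = {ast.In: ast.NotIn, ast.Is: ast.IsNot}

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        operand = node.operand
        if (
            isinstance(node.op, ast.Not)
            and isinstance(operand, ast.Compare)
            and len(operand.ops) == 1  # skip chained comparisons (a in b in c)
            and type(operand.ops[0]) in self._NEGATED
        ):
            # Swap the comparison operator and drop the surrounding "not".
            operand.ops = [self._NEGATED[type(operand.ops[0])]()]
            return ast.copy_location(operand, node)
        return node


source = "result = not (x in y)"
tree = ast.parse(source)                    # 1. parse
tree = NotInRewriter().visit(tree)          # 2. transform
ast.fix_missing_locations(tree)
code = compile(tree, "<example>", "exec")   # 3. compile

namespace = {"x": 1, "y": [2, 3]}
exec(code, namespace)
print(ast.unparse(tree))    # result = x not in y
print(namespace["result"])  # True
```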

See also ```` script.


Unsafe optimizations are disabled by default. Use the Config() class to enable
more optimizations.

Features enabled by default:

 * ``"builtin_types"``: methods of bytes, str, unicode, tuple, frozenset, int
   and float types
 * ``"math"``, ``"string"``: constants and functions without side effects of
   the math / string module

Optional features:

 * ``"builtin_funcs"``: builtin functions like abs(), str(), len(), etc. Examples:

   - ``len("abc")`` => ``3``
   - ``ord("A")`` => ``65``
   - ``str(123)`` => ``"123"``

 * ``"pythonbin"``: Enable this feature if the optimized code will be executed by
   the same Python binary, that is, exactly the same Python version with the same
   build options. Allows optimizing non-BMP unicode strings on Python < 3.3.
   Also enables the ``"platform"`` feature. Examples:

   - ``u"\\U0010ffff"[0]`` => ``u"\\udbff"`` or ``u"\\U0010ffff"`` (depending on
     build options, narrow or wide Unicode)
   - ``sys.version_info.major`` => ``2``
   - ``sys.maxunicode`` => ``0x10ffff``

 * ``"pythonenv"``: Enable this feature if you control the environment
   variables (like ``PYTHONOPTIMIZE``) and Python command line options (like
   ``-Qnew``).  On Python 2, allows optimizing int/int. Also enables the
   ``"platform"`` and ``"pythonbin"`` features. Examples:

   - ``__debug__`` => ``True``
   - ``sys.flags.optimize`` => ``0``

 * ``"platform"``: optimizations specific to a platform. Examples:

   - ``sys.platform`` => ``"linux2"``
   - ``sys.byteorder`` => ``"little"``
   - ``sys.maxint`` => ``2147483647``
   - ``os.linesep`` => ``"\\n"``

 * ``"cpython_tests"``: disable some optimizations to work around issues with
   the CPython test suite. Only use it for tests.
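
To see why these features are opt-in, it can help to look at the values they
would freeze into the optimized code. The sketch below only prints the
corresponding values for the running interpreter using the standard library.
The grouping into three dictionaries mirrors the feature descriptions above
and is an illustration, not astoptimizer's internal data model.

```python
import os
import sys

# Fixed for one interpreter build ("pythonbin").
binary_dependent = {
    "sys.maxunicode": sys.maxunicode,
    "sys.version_info.major": sys.version_info.major,
}

# Fixed for one operating system and CPU ("platform").
platform_dependent = {
    "sys.platform": sys.platform,
    "sys.byteorder": sys.byteorder,
    "os.linesep": os.linesep,
}

# Depend on how the interpreter was started ("pythonenv").
environment_dependent = {
    "__debug__": __debug__,
    "sys.flags.optimize": sys.flags.optimize,
}

for title, values in [
    ("binary", binary_dependent),
    ("platform", platform_dependent),
    ("environment", environment_dependent),
]:
    for name, value in values.items():
        print("%-12s %-24s %r" % (title, name, value))
```

Code in which any of these values has been replaced by a constant is only
valid where the same value would be observed at run time.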

Use ``Config("builtin_funcs", "pythonbin")`` to enable most optimizations.  You
may also enable ``"pythonenv"`` to enable all optimizations, but then the
optimized code will depend on environment variables and Python command line
options.

Advice to help the AST optimizer:

 * Declare your constants using config.add_constant()
 * Declare your pure functions (functions with no side effects) using
 * Don't use "from module import \*". If "import \*" is used, builtin
   functions are no longer optimized, for example.
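
A star import can rebind any name, including builtins, which is why it
defeats this kind of optimization. A project can check for such imports with
the standard ``ast`` module. The ``find_star_imports()`` helper below is a
hypothetical sketch and not part of astoptimizer.

```python
import ast


def find_star_imports(source):
    """Return the module names used in ``from <module> import *`` statements."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ImportFrom) and any(
            alias.name == "*" for alias in node.names
        ):
            # node.module is None for relative imports like "from . import *".
            modules.append(node.module)
    return modules


sample = "from os import *\nimport math\nprint(len('abc'))\n"
print(find_star_imports(sample))  # ['os']
```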


 * Operations on mutable values are not optimized, for example len([1, 2, 3]).
 * Unsafe optimizations are disabled by default. For example, len("\\U0010ffff") is not
   optimized because the result depends on the build options of Python. Enable
   "builtin_funcs" and "pythonenv" features to enable more optimizations.
 * len() is not optimized if the result is bigger than 2^31-1.
   Enable "pythonbin" configuration feature to optimize the call for bigger
 * On Python 2, operators taking a bytes string and a unicode string are not
   optimized if the bytes string has to be decoded from the default encoding or
   if the unicode string has to be encoded to the default encoding. Exception:
   pure ASCII strings are optimized. For example, b"abc" + u"def" is replaced
   with u"abcdef", whereas u"x=%s" % b"\\xe9" is not optimized.
 * On Python 3, comparisons between bytes and Unicode strings are not optimized
   because the comparison may emit a warning or raise a BytesWarning
   exception. Bytes strings are not converted to Unicode strings. For example,
   b"abc" < "abc" and str(b"abc") are not optimized. Converting a bytes string
   to Unicode is never optimized.
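
Several of these restrictions amount to guards that run before a call is
replaced by its result. The sketch below shows what such guards could look
like for ``len()``. It is an illustration under stated assumptions:
``fold_len()``, ``MAX_PORTABLE_LEN`` and ``IMMUTABLE_SIZED`` are hypothetical
names, and the real checks in astoptimizer may differ.

```python
MAX_PORTABLE_LEN = 2**31 - 1  # limit quoted in the list above
IMMUTABLE_SIZED = (str, bytes, tuple, frozenset)


def fold_len(value, max_size=MAX_PORTABLE_LEN):
    """Return len(value) if it looks safe to precompute, else None."""
    if not isinstance(value, IMMUTABLE_SIZED):
        # Mutable or unknown type: the length may change at run time.
        return None
    if isinstance(value, str) and any(ord(ch) > 0xFFFF for ch in value):
        # Non-BMP characters: the length differed between narrow and
        # wide builds of Python < 3.3.
        return None
    size = len(value)
    if size > max_size:
        return None
    return size


print(fold_len("abc"))            # 3
print(fold_len([1, 2, 3]))        # None
print(fold_len("\U0010ffff"))     # None
```

Returning ``None`` here means "leave the call in place", so the original
expression is still evaluated normally at run time.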


Version 0.3.1 (2012-09-12)

 * Disable optimizations on functions and constants if a variable with the same
   name is set. Example: "len=ord; print(len('A'))",
   "sys.version = 'abc'; print(sys.version)".
 * Don't optimize the print(), frozenset() or range() functions if the
   "builtin_funcs" feature is disabled
 * Don't remove code if it contains global or nonlocal.
   Example: "def f(): if 0: global x; x = 2".
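
The last item matters because ``global`` is a declaration for the whole
function body, even when it sits in a branch that never runs. The sketch
below shows the difference using only ``exec()``. The ``run()`` helper and
the two source strings are made up for this illustration.

```python
ORIGINAL = """
def f():
    if 0:
        global x   # never executed, but still makes x global in f()
    x = 2
"""

NAIVELY_PRUNED = """
def f():
    x = 2          # now a plain local variable
"""


def run(source):
    """Define f() from source, call it, and return the module-level x."""
    namespace = {"x": 1}
    exec(source, namespace)
    namespace["f"]()
    return namespace["x"]


print(run(ORIGINAL))        # 2
print(run(NAIVELY_PRUNED))  # 1
```

Removing the dead branch changes which variable ``x = 2`` assigns to, so the
two versions are not equivalent.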

Version 0.3 (2012-09-11)

Major changes:

 * Add astoptimizer.patch_compile(config=None) function to simply hook the
   builtin compile() function.
 * Add "pythonbin" configuration feature.
 * Disable optimizations on builtin functions by default. Add "builtin_funcs"
   feature to the configuration to optimize builtin functions.
 * Remove dead code (optional optimization)
 * It is now possible to define a callback for warnings of the optimizer
 * Drop support for Python 2.5: it is unable to compile an AST tree to bytecode.
   AST objects of Python 2.5 don't accept arguments in constructors.


 * Handle "from math import \*" correctly
 * Don't optimize operations if arguments are bytes and unicode strings.
   Only optimize if string arguments have the same type.
 * Disable optimizations on non-BMP unicode strings by default. Optimizations
   enabled with "pythonbin" feature.

Other changes:

 * More functions, methods and constants:

   - bytes, str, unicode: add more methods.
   - math module: add most remaining functions
   - string module: add some functions and all constants

 * not(a in b) => a not in b, not(a is b) => a is not b
 * a if bool else b
 * for x in range(n) => for x in xrange(n) (only on Python 2)
 * Enable more optimizations if a function is not a generator
 * Add sys.flags.<attr> and sys.version_info.<attr> constants

Version 0.2 (2012-09-02)

Major changes:

 * Check input arguments before calling an operator or a function, instead of
   catching errors.
 * New helper functions optimize_code() and optimize_ast() should be used
   instead of using directly the Optimizer class.
 * Support tuple and frozenset types


 * FIX: add Config.max_size to check len(obj) result
 * FIX: disable non portable optimizations on non-BMP strings
 * Support Python 2.5-3.3
 * Refactor Optimizer: Optimizer.visit() now always visits children before
   calling the optimizer for a node, except for assignments
 * Float and complex numbers are no longer restricted by the integer range of the
 * More builtin functions. Examples: divmod(int, int), float(str), min(tuple),
 * More methods of builtin types. Examples: str.startswith(), str.find(),
   tuple.count(), float.is_integer().
 * math module: add math.ceil(), math.floor() and math.trunc().
 * More module constants. Examples: os.O_RDONLY, errno.EINVAL,
 * More operators: a not in b, a is b, a is not b, +a.
 * Conversion to string: str(), str % args and print(arg1, arg2, ...).
 * Support import aliases. Examples: "import math as M; print(M.floor(1.5))"
   and "from math import floor as F; print(F(1.5))".
 * Experimental support of variables (disabled by default).

Version 0.1 (2012-08-12)

 * First public version (to reserve the name on PyPI!)

See also

 * AST optimizer of PyPy:
 * "A constant folding optimization pass for the AST"
 * "AST Optimization: inlining of function calls"
 * "Build-out an AST optimizer, moving some functionality out of the peephole optimizer"
 * "optimize out local variables at end of function":
 * "Optimisation of if with constant expression":
 * Cython: