extradoc / talk / stanford-ee380-2011 / abstract.txt

---------------------------------
Python in Python: the PyPy system
---------------------------------

PyPy is a complete Python implementation in Python, in the old
tradition of Squeak and Scheme48 --- but there is more to PyPy
than just this.

During this talk I will describe what PyPy is: a mature,
8-year-old project of roughly 200K lines of code and 150K lines
of tests, implementing the full Python language.  I will show our
results: most programs run faster than on CPython (by a factor
between 1.5x and 20x), and large programs use less total memory.

I will then focus on the architecture of PyPy.  On the one hand,
we have written a straightforward interpreter for the Python
language, using a (large) subset of Python called RPython.  On
the other hand, we have a complex translation toolchain which is
able to compile interpreters from RPython to efficient C code.
(We also have experimental backends for producing JVM and .NET
code.)
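
As a rough illustration of what a "straightforward interpreter"
can look like, here is a minimal sketch of a stack-machine
interpreter written in a deliberately restricted style of Python
(fixed opcodes, a list of ints as the stack, no dynamic tricks).
The opcode names and the bytecode format are hypothetical and are
not taken from PyPy.  The sketch runs under ordinary Python; it
has not been checked against the RPython translation toolchain.

```python
# Hypothetical opcodes for a toy stack machine (not PyPy's bytecode).
PUSH = 0   # push the next code word onto the stack
ADD = 1    # pop two values, push their sum
MUL = 2    # pop two values, push their product
HALT = 3   # stop and return the top of the stack


def interpret(code):
    """Run a list of ints as bytecode and return the final result."""
    stack = []
    pc = 0
    while True:
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
            pc += 1
        elif op == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
            pc += 1
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %d" % op)


# (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
result = interpret(program)
print(result)
```

Nothing in this loop says how objects are laid out in memory, how
they are freed, or whether the loop is compiled to C or to
something else; decisions of that kind are the ones left to the
toolchain.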

There are two distinct benefits from keeping the interpreter and
the translation toolchain separate.  On the one hand, we keep our
interpreter simple, and we can easily write interpreters for
other languages.  We have a complete Prolog interpreter and have
at least played with versions for Smalltalk and JavaScript.  On
the other hand, the fact that our source interpreter does not
contain any architectural choices makes for unprecedented
flexibility.  Our toolchain "weaves" various aspects into the
final executable, including the object memory layout, the choice
of garbage collector (GC), the execution model (regular vs.
"Stackless"), the backend (C/JVM/.NET), and even the Just-in-Time
(JIT) compiler.  There are great practical benefits to this.  For
example, CPython is stuck with reference counting as its GC,
while we offer a number of choices.

I will explain how this is done, and describe the JIT compiler
in more detail.  It is a "tracing JIT", as pioneered in recent
years for Java and JavaScript.  However, it is also a "meta JIT":
it works for any language, by tracing at the level of the
language's interpreter.  In other words, from an interpreter for
any language, we quasi-automatically produce a JIT suited to that
language.
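
The basic tracing idea can be sketched with a toy example.  The
code below is an assumption-laden illustration, not PyPy's
mechanism: the instruction set, the HOT_THRESHOLD value and the
run() function are all hypothetical.  It counts backward jumps to
find a hot loop, records the instructions executed during one
iteration, and then replays that recorded trace with a guard on
the loop condition.  A real tracing JIT would emit machine code
for the trace, and the "meta" step (deriving the tracer from the
interpreter's own source) is not reproduced here at all.

```python
HOT_THRESHOLD = 2  # arbitrary illustrative value


def execute(instr, regs):
    """Perform one non-jump instruction on the register dict."""
    name = instr[0]
    if name == "add":
        regs[instr[1]] += regs[instr[2]]
    elif name == "dec":
        regs[instr[1]] -= 1
    else:
        raise ValueError("unknown instruction: %r" % (name,))


def run(program, regs):
    """Interpret program; return how many iterations ran from a trace.

    Toy limitation: only a single simple loop without inner jumps
    is handled correctly.
    """
    pc = 0
    hits = {}         # loop header pc -> backward jumps seen so far
    traces = {}       # loop header pc -> (ops, guard_reg, exit_pc)
    recording = None  # list of recorded ops, or None when not tracing
    header = -1       # pc of the loop currently being recorded
    replayed = 0
    while True:
        if recording is None and pc in traces:
            # Fast path: replay the recorded loop body until the guard fails.
            ops, guard_reg, exit_pc = traces[pc]
            while True:
                for op in ops:
                    execute(op, regs)
                replayed += 1
                if regs[guard_reg] == 0:  # guard failed: leave the trace
                    break
            pc = exit_pc
            continue
        instr = program[pc]
        if instr[0] == "halt":
            return replayed
        if instr[0] == "jump_if_nonzero":
            reg, target = instr[1], instr[2]
            taken = regs[reg] != 0
            if recording is not None and target == header:
                # The loop closed: keep the trace if we are still looping.
                if taken:
                    traces[header] = (recording, reg, pc + 1)
                recording = None
            elif taken and target <= pc:
                # Backward jump: count it, start recording once it is hot.
                hits[target] = hits.get(target, 0) + 1
                if hits[target] >= HOT_THRESHOLD:
                    recording, header = [], target
            pc = target if taken else pc + 1
            continue
        execute(instr, regs)
        if recording is not None:
            recording.append(instr)
        pc += 1


# Sum the integers from i down to 1 into total.
PROGRAM = [
    ("add", "total", "i"),
    ("dec", "i"),
    ("jump_if_nonzero", "i", 0),
    ("halt",),
]
regs = {"total": 0, "i": 10}
replayed = run(PROGRAM, regs)
print(regs["total"], replayed)
```

With these settings the first two iterations are interpreted
normally, the third is interpreted while being recorded, and the
remaining ones run from the recorded trace.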

I will conclude by comparing PyPy to other projects, old and new:
Squeak, CPython, Jython and IronPython, the Jikes RVM, as well as
the various recent tracing JITs such as TraceMonkey.