
Potential project list

This is a list of projects that may interest potential contributors who are seriously interested in the PyPy project. They mostly share a common pattern: they are mid-to-large in size, they are usually well defined as standalone projects, and they are not being actively worked on. For small projects, it is much better to look at the issue tracker, pop up on #pypy on IRC, or write to the mailing list, because small possible projects tend to change very rapidly.

This list is mostly meant to give an overview of potential projects. It is by definition not exhaustive, and we're pleased if people come up with their own improvement ideas. In any case, if you feel like working on some of these projects, or on anything else in PyPy, pop up on IRC or write to us on the mailing list.

Numpy improvements

This is more of a project-container than a single project. Possible ideas:

  • experiment with auto-vectorization using SSE, or implement explicit vectorization of array operations without automatic detection.
  • improve numpy, for example by implementing memory views.
  • interface with Fortran/C libraries.

Improving the jitviewer

Analyzing the performance of applications is always tricky. We have various tools, for example the jitviewer, that help us analyze performance.

The jitviewer shows the code generated by the PyPy JIT in a hierarchical way, as shown by the screenshot below:

  • at the bottom level, it shows the Python source code of the compiled loops
  • for each source code line, it shows the corresponding Python bytecode
  • for each opcode, it shows the corresponding jit operations, which are the ones actually sent to the backend for compiling (such as i15 = i10 < 2000 in the example)

We would like to add one level to this hierarchy, by showing the generated machine code for each jit operation. The necessary information is already in the log file produced by the JIT, so it is "only" a matter of teaching the jitviewer to display it. Ideally, the machine code should be hidden by default and viewable on request.
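
The hierarchy described above, with machine code as an extra and initially hidden level, can be sketched as a small data model. This is only an illustration and not the jitviewer's actual code: the class names, the render function, the show_asm flag, the source line, the opcode name and the placeholder machine code lines are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JitOp:
    text: str                                      # e.g. "i15 = i10 < 2000"
    asm: List[str] = field(default_factory=list)   # machine code lines (hypothetical)


@dataclass
class Opcode:
    name: str
    ops: List[JitOp] = field(default_factory=list)


@dataclass
class SourceLine:
    text: str
    opcodes: List[Opcode] = field(default_factory=list)


def render(lines, show_asm=False):
    """Flatten the hierarchy into indented text; machine code is hidden by default."""
    out = []
    for line in lines:
        out.append(line.text)
        for opcode in line.opcodes:
            out.append("  " + opcode.name)
            for op in opcode.ops:
                out.append("    " + op.text)
                if show_asm:
                    out.extend("      " + a for a in op.asm)
    return out


# Hypothetical sample data; only the jit operation text comes from the document.
example = [
    SourceLine("while i < 2000:", [
        Opcode("COMPARE_OP", [
            JitOp("i15 = i10 < 2000",
                  ["<machine code line 1>", "<machine code line 2>"]),
        ]),
    ]),
]

if __name__ == "__main__":
    print("\n".join(render(example, show_asm=True)))
```

Calling render(example) leaves the machine code out, which mirrors the "hidden by default, viewable on request" behaviour suggested above.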

The jitviewer is a web application based on Flask and Jinja2 (and jQuery on the client). If you have strong web development skills and want to help PyPy, this is an ideal task to get started with, because it does not require deep knowledge of the internals.

Translation Toolchain

  • Incremental or distributed translation.
  • Allow separate compilation of extension modules.

Work on some of the other languages

There are various languages implemented using the RPython translation toolchain. One of the most interesting is the JavaScript implementation, but there are others, such as Scheme or Prolog. An interesting project would be to improve the jittability of those implementations or to experiment with various optimizations.

Various GCs

PyPy has a pluggable garbage collection policy. This means that various garbage collectors can be written for specialized purposes, and experiments can also be done with general-purpose collectors. Examples:

  • An incremental garbage collector with a specified maximum pause time, which is crucial for games
  • A garbage collector that compacts memory better, for mobile devices
  • A concurrent garbage collector (a lot of work)
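
To illustrate the first idea, here is a toy sketch of incremental marking in which each step does a bounded amount of work. It is not PyPy's GC interface: Obj, IncrementalMarker and the budget parameter are hypothetical names, the budget counts scanned objects rather than elapsed time, and the write barrier that a real incremental collector needs when the program mutates objects between steps is left out.

```python
class Obj:
    """A toy heap object holding references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class IncrementalMarker:
    """Marks reachable objects in small steps instead of one long pause."""

    def __init__(self, roots):
        # Objects that were discovered but not scanned yet.
        self.pending = list(roots)

    def step(self, budget):
        """Scan at most `budget` objects; return True once marking is complete."""
        while budget > 0 and self.pending:
            obj = self.pending.pop()
            if obj.marked:
                continue
            obj.marked = True
            self.pending.extend(r for r in obj.refs if not r.marked)
            budget -= 1
        return not self.pending


if __name__ == "__main__":
    a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
    a.refs.append(b)
    b.refs.append(c)
    marker = IncrementalMarker([a])
    while not marker.step(budget=1):
        pass  # the application would run here between GC steps
    print([o.name for o in (a, b, c, d) if o.marked])
```

Bounding the work per step is what bounds the pause; a real collector would pick the budget from a time target and would also have to sweep incrementally.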

Remove the GIL

This is a major task that requires a lot of thought. However, a few subprojects can potentially be specified, unless a better plan is thought out:

  • A thread-aware garbage collector
  • Better RPython primitives for dealing with concurrency
  • JIT passes to remove locks on objects
  • (maybe) implement locking in the Python interpreter
  • alternatively, look at Software Transactional Memory

Introduce new benchmarks

We're usually happy to introduce new benchmarks. Please consult us first, but in general anything that is real-world Python code and is not already represented is welcome. We need at least a standalone script that can run without parameters. Example ideas (benchmarks still need to be extracted from these projects!):

  • hg
  • sympy
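
As a rough sketch of what "a standalone script that can run without parameters" might look like, the example below times a workload several times and prints the results. The workload function is a hypothetical stand-in for real-world code, and the repeat count and output format are assumptions, not a requirement of the PyPy benchmark suite.

```python
import time


def workload(n=200000):
    """Hypothetical stand-in for real-world code (e.g. a sympy or hg operation)."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def run_benchmark(repeats=5):
    """Run the workload `repeats` times and return the wall-clock timings in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # No command-line parameters are needed: defaults are built in.
    for seconds in run_benchmark():
        print("%.6f" % seconds)
```

Running the workload several times matters on a JIT-based interpreter, since the first iterations include warm-up.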

Experiment (again) with an LLVM backend for RPython compilation

We already tried working with LLVM, and at the time LLVM was not mature enough for our needs. It's possible that this has changed, so reviving the LLVM backend (or writing a new one from scratch) for static compilation would be a good project.