Potential project list
======================

This is a list of projects for potential contributors who are seriously
interested in the PyPy project. They mostly share common patterns: they're
mid-to-large in size, they're usually well defined as standalone projects,
and they're not being actively worked on. For small projects that you might
want to work on, it's much better to either look at the `issue tracker`_,
pop up on #pypy on irc.freenode.net or write to the `mailing list`_, simply
because small possible projects tend to change very rapidly.

This list is mostly meant to give an overview of potential projects. It is
by definition not exhaustive, and we're pleased if people come up with their
own improvement ideas. In any case, if you feel like working on some of those
projects, or anything else in PyPy, pop up on IRC or write to us on the
`mailing list`_.

Make bytearray type fast
------------------------

PyPy's bytearray type is very inefficient. It would be an interesting
task to look into possible optimizations for it.

Implement copy-on-write list slicing
------------------------------------

The idea is to have a special implementation of list objects which is used
when doing ``myslice = mylist[a:b]``: the new list is not constructed
immediately, but only when (and if) ``myslice`` or ``mylist`` is mutated.
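
As a rough illustration of the idea, here is an application-level sketch.
The ``CowList`` class and all of its method names are hypothetical; a real
implementation would presumably live inside the interpreter as a list
strategy rather than as a Python class, and this sketch only handles slices
with step 1 lazily.

```python
class CowList:
    """Toy list whose slices share storage until either side is mutated."""

    def __init__(self, items=()):
        self._items = list(items)  # private storage (unused while lazy)
        self._base = None          # list we borrow storage from, if lazy
        self._start = self._stop = 0
        self._views = []           # lazy slices still borrowing our storage

    @property
    def is_lazy(self):
        return self._base is not None

    def _materialize(self):
        # Copy the borrowed range so this list owns its storage.
        if self._base is not None:
            self._items = self._base._items[self._start:self._stop]
            self._base._views.remove(self)
            self._base = None

    def _prepare_write(self):
        # Own our storage, then give every pending slice its own copy.
        self._materialize()
        for view in list(self._views):
            view._materialize()

    def __len__(self):
        if self.is_lazy:
            return self._stop - self._start
        return len(self._items)

    def tolist(self):
        if self.is_lazy:
            return self._base._items[self._start:self._stop]
        return list(self._items)

    def __getitem__(self, index):
        if isinstance(index, slice):
            start, stop, step = index.indices(len(self))
            if step != 1:
                return CowList(self.tolist()[index])  # eager fallback
            self._materialize()  # simplification: a base is never lazy itself
            view = CowList()
            view._base, view._start, view._stop = self, start, max(start, stop)
            self._views.append(view)
            return view
        if self.is_lazy:
            # range() handles negative indices and bounds checking for us.
            return self._base._items[range(self._start, self._stop)[index]]
        return self._items[index]

    def __setitem__(self, index, value):
        self._prepare_write()
        self._items[index] = value

    def append(self, value):
        self._prepare_write()
        self._items.append(value)


mylist = CowList(range(10))
myslice = mylist[2:5]   # no copy yet: myslice borrows mylist's storage
mylist[3] = 99          # first mutation forces myslice to take its copy
```

Reading from ``myslice`` never copies; only the first write to either list
does, and the slice keeps the values it had when it was created.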


Numpy improvements
------------------

NumPy support is progressing rapidly in PyPy, so feel free to come to IRC and
ask for proposed topics. A not necessarily up-to-date `list of topics`_
is also available.

.. _`list of topics`: https://bitbucket.org/pypy/extradoc/src/extradoc/planning/micronumpy.txt

Improving the jitviewer
------------------------

Analyzing the performance of applications is always tricky. We have various
tools that help us analyze performance, for example the `jitviewer`_.

The jitviewer shows the code generated by the PyPy JIT in a hierarchical way,
as shown by the screenshot below:

  - at the bottom level, it shows the Python source code of the compiled loops

  - for each source code line, it shows the corresponding Python bytecode

  - for each opcode, it shows the corresponding jit operations, which are the
    ones actually sent to the backend for compiling (such as ``i15 = i10 <
    2000`` in the example)

.. image:: image/jitviewer.png

The jitviewer is a web application based on flask and jinja2 (and jQuery on
the client): if you have great web developing skills and want to help PyPy,
this is an ideal task to get started, because it does not require any deep
knowledge of the internals.

Optimized Unicode Representation
--------------------------------

CPython 3.3 uses an `optimized unicode representation`_ which switches between
different ways to represent a unicode string, depending on whether the string
fits into ASCII, has only two-byte characters or needs four-byte characters.

The actual details would be rather different in PyPy, but we would like to have
the same optimization implemented.
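
To make the selection rule concrete, here is a small sketch that picks the
narrowest fixed-width storage for a string. The thresholds follow PEP 393
(one byte per character for code points below 256, two below 65536, four
otherwise); the function names ``storage_width`` and ``pack`` are made up
for this example, and a PyPy implementation might choose differently.

```python
def storage_width(text):
    """Return the bytes per character needed to store text at fixed width."""
    highest = max(map(ord, text), default=0)
    if highest < 0x100:
        return 1
    if highest < 0x10000:
        return 2
    return 4


def pack(text):
    """Store every character of text using the narrowest common width."""
    width = storage_width(text)
    data = b"".join(ord(ch).to_bytes(width, "little") for ch in text)
    return width, data


width, data = pack("hello")
print(width, len(data))            # one byte per character
width, data = pack("h\u20acllo")   # the euro sign needs two bytes
print(width, len(data))
```

The trade-off is that one wide character makes the whole string wide, in
exchange for constant-time indexing.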

.. _`optimized unicode representation`: http://www.python.org/dev/peps/pep-0393/

Translation Toolchain
---------------------

* Incremental or distributed translation.

* Allow separate compilation of extension modules.

Various GCs
-----------

PyPy has a pluggable garbage collection policy. This means that various garbage
collectors can be written for specialized purposes, or even various
experiments can be done for the general purpose. Examples:

* An incremental garbage collector that has specified maximal pause times,
  crucial for games

* A garbage collector that compacts memory better for mobile devices

* A concurrent garbage collector (a lot of work)
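
For the first item, the core trick is to split marking into small steps so
that no single pause exceeds a budget. The toy below is plain Python and not
RPython GC code; ``Obj``, ``IncrementalMarker`` and the object-count budget
are assumptions made for illustration (a real collector would more likely
bound time or bytes scanned).

```python
class Obj:
    """A heap object with outgoing references and a mark bit."""

    def __init__(self, name):
        self.name = name
        self.refs = []
        self.marked = False


class IncrementalMarker:
    """Mark reachable objects a few at a time instead of all at once."""

    def __init__(self, roots):
        self.gray = []  # marked objects whose references are not scanned yet
        for root in roots:
            self._shade(root)

    def _shade(self, obj):
        if not obj.marked:
            obj.marked = True
            self.gray.append(obj)

    def step(self, budget):
        """Scan at most `budget` objects; return True once marking is done."""
        while self.gray and budget > 0:
            obj = self.gray.pop()
            for child in obj.refs:
                self._shade(child)
            budget -= 1
        return not self.gray

    def write_barrier(self, parent, child):
        """Store a reference while marking is in progress."""
        parent.refs.append(child)
        if parent.marked:
            # The parent may already have been scanned, so shade the child.
            self._shade(child)


a, b, c, garbage = Obj("a"), Obj("b"), Obj("c"), Obj("garbage")
a.refs.append(b)
b.refs.append(c)
marker = IncrementalMarker([a])
while not marker.step(budget=1):
    pass  # the program would run here between collection steps
```

The write barrier is what keeps the result correct when the program changes
references between two steps.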

STM (Software Transactional Memory)
-----------------------------------

This is work in progress.  Besides the main development path, whose goal is
to make a (relatively fast) version of pypy which includes STM, there are
independent topics that can already be experimented with on the existing,
JIT-less pypy-stm version:
  
* What kind of conflicts do we get in real use cases?  And, sometimes,
  which data structures would be more appropriate?  For example, a dict
  implemented as a hash table will suffer "stm collisions" in all threads
  whenever one thread writes anything to it; but there could be other
  implementations.  Maybe alternate strategies can be implemented at the
  level of the Python interpreter (see list/dict strategies,
  ``pypy/objspace/std/{list,dict}object.py``).

* More generally, there is the idea that we would need some kind of
  "debugger"-like tool to "debug" things that are not bugs, but stm
  conflicts.  What would this tool look like to the end Python
  programmers?  Like a profiler?  Or like a debugger with breakpoints
  on aborted transactions?  It would probably be all app-level, with
  a few hooks e.g. for transaction conflicts.

* Find good ways to have libraries that use threads and atomics internally,
  without exposing threads to the user.  Right now there is a rough draft
  in ``lib_pypy/transaction.py``, but much better is possible.  For example
  we could probably have an iterator-like concept that allows each loop
  iteration to run in parallel.
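
As a sketch of what such a thread-hiding interface could look like from the
user's side, the helper below runs every loop iteration through a pool. The
name ``parallel_map`` is hypothetical and this is not the API of
``lib_pypy/transaction.py``; it uses ordinary threads from the standard
library as a stand-in for transactions.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, iterable, workers=4):
    """Call func on every item, possibly in parallel, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, iterable))


def square(x):
    return x * x


results = parallel_map(square, range(5))
```

The caller writes what looks like a plain loop over ``range(5)`` and never
touches a thread object.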


Introduce new benchmarks
------------------------

We're usually happy to introduce new benchmarks. Please consult us
beforehand, but in general anything that's real-world Python code
and is not already represented is welcome. We need at least a standalone
script that can run without parameters. Example ideas (benchmarks still
need to be extracted from them!):

* `hg`
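
A minimal sketch of the shape such a standalone script could take is shown
below. The ``workload`` function is a placeholder for the real-world code
being measured, and the layout is an assumption; the actual benchmark runner
may expect a different interface.

```python
import time


def workload(n=2000):
    """Placeholder workload: count the primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def main(repeats=5):
    """Run the workload several times and return the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # No command-line parameters are required.
    for run, seconds in enumerate(main()):
        print("run %d: %.6f s" % (run, seconds))
```

Repeating the workload in one process matters for a JIT, since the first
runs include warm-up time.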

Experiment (again) with LLVM backend for RPython compilation
------------------------------------------------------------

We already tried working with LLVM and at the time, LLVM was not mature enough
for our needs. It's possible that this has changed. Reviving the LLVM backend
(or writing a new one from scratch) for static compilation would be a good
project.

(On the other hand, just generating C code and using clang might be enough.
The issue with that is the so-called "asmgcc GC root finder", which has tons
of issues of its own.  In my opinion (arigo), it would definitely be a
better project to try to optimize the alternative, the "shadowstack" GC root
finder, which is nicely portable.  So far it gives a pypy that is around
7% slower.)

Embedding PyPy
--------------

Being able to embed PyPy, say with its own limited C API, would be
useful.  But here is the most interesting variant, straight from
EuroPython live discussion :-)  We can have a generic "libpypy.so" that
can be used as a placeholder dynamic library, and when it gets loaded,
it runs a .py module that installs (via ctypes) the interface it wants
exported.  This would give us a one-size-fits-all generic .so file to be
imported by any application that wants to load .so files :-)
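
The ctypes half of this idea can be tried today: ``ctypes.CFUNCTYPE`` turns
a Python function into a C-callable function pointer. The sketch below shows
only that part. The ``EXPORTS`` table and the ``export`` decorator are
hypothetical stand-ins for whatever mechanism "libpypy.so" would use to
publish symbols, which does not exist yet.

```python
import ctypes

# Hypothetical export table: name -> (C signature, C-callable wrapper).
EXPORTS = {}


def export(name, restype, *argtypes):
    """Register a Python function under a C signature."""
    signature = ctypes.CFUNCTYPE(restype, *argtypes)

    def register(func):
        # Keep the wrapper referenced so the function pointer stays valid.
        EXPORTS[name] = (signature, signature(func))
        return func

    return register


@export("add", ctypes.c_int, ctypes.c_int, ctypes.c_int)
def add(a, b):
    return a + b


def address_of(name):
    """Return the raw address a C caller would jump to."""
    _, wrapper = EXPORTS[name]
    return ctypes.cast(wrapper, ctypes.c_void_p).value


def call_by_address(name, *args):
    """Call the export through its raw address, as C code would."""
    signature, _ = EXPORTS[name]
    return signature(address_of(name))(*args)


result = call_by_address("add", 2, 3)
```

Going through the raw address rather than calling ``add`` directly is the
point: that address is what the host application would receive.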

Optimising cpyext (CPython C-API compatibility layer)
-----------------------------------------------------

A lot of work has gone into PyPy's implementation of CPython's C-API over
the last years to let it reach a practical level of compatibility, so that
C extensions for CPython work on PyPy without major rewrites. However,
there are still many edges and corner cases where it misbehaves, and it has
not received any substantial optimisation so far.

The objective of this project is to fix bugs in cpyext and to optimise
several performance-critical parts of it, such as the reference counting
support and other heavily used C-API functions. The net result would be to
have CPython extensions run much faster on PyPy than they currently do, or
to make them work at all if they currently don't. A part of this work would
be to get cpyext into a shape where it supports running Cython-generated
extensions.

.. _`issue tracker`: http://bugs.pypy.org
.. _`mailing list`: http://mail.python.org/mailman/listinfo/pypy-dev
.. _`jitviewer`: http://bitbucket.org/pypy/jitviewer