Source / tmdonate.html

Full commit
<!DOCTYPE html>
	<title>PyPy :: Call for donations - Transactional Memory / Automatic Mutual Exclusion in PyPy</title>
	<meta http-equiv="content-language" content="en" />
	<meta http-equiv="content-type" content="text/html; charset=utf-8" />
	<meta name="author" content="PyPy Team" />
	<meta name="description" content="PyPy" />
	<meta name="copyright" content="MIT" />
	<meta name="document-rating" content="general" />
	<link rel="stylesheet" type="text/css" media="screen" title="default" href="css/site.css" />
	<link rel="alternate" type="application/rss+xml" title="RSS Feed for PyPy" href="" />
  <link rel="stylesheet" type="text/css" href="css/jquery-ui-1.8.14.custom.css" />
	<script type="text/javascript" src=""></script>
	<script type="text/javascript">try{Typekit.load();}catch(e){}</script>
	<script type="text/javascript" src=""></script>
  <script type="text/javascript" src="js/jquery-ui-1.8.14.custom.min.js"></script>
  <script type="text/javascript" src="js/detect.js"></script>
  <script type="text/javascript" src="js/script2.js?bust=1"></script>
<script type="text/javascript">
	var _gaq = [['_setAccount', 'UA-7778406-3'], ['_trackPageview']];
	if (document.location.protocol !== 'file:') {
		(function() {
			var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
			ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '';
			(document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(ga);
<div id="body-outer"><div id="body-inner"><div id="body" class="clearfix">
<div id="header">
	<div id="menu-follow">
		<div><a href="" title="Follow the conversation on Twitter"><img src="" alt="Follow the conversation on Twitter" width="14px" height="14px" /></a></div>
    <div><a href=""><img src="" width="14px" height="14px" /></a></div>
		<div><a href="" title="Subscribe to the RSS Feed"><img src="" alt="Subscribe to the RSS Feed" width="14px" height="14px" /></a></div>
	<div id="logo"><a href=""><img src="image/pypy-logo.png" alt="PyPy" height="110px" /></a></div>
	<hr class="clear-left" />
	<div id="menu-sub"><a href="index.html">Home</a><span class="menu-sub-sep"> | </span><a href="features.html">Features</a><span class="menu-sub-sep"> | </span><a href="download.html">Download</a><span class="menu-sub-sep"> | </span><a href="compat.html">Compatibility</a><span class="menu-sub-sep"> | </span><a href="performance.html">Performance</a><span class="menu-sub-sep"> | </span><a href="">Dev Documentation</a><span class="menu-sub-sep"> | </span><a href="">Blog</a><span class="menu-sub-sep"> | </span><a href="people.html">People</a><span class="menu-sub-sep"> | </span><a href="contact.html">Contact</a><span class="menu-sub-sep"> | </span><a href="py3donate.html">Py3k donations</a><span class="menu-sub-sep"> | </span><a href="numpydonate.html">NumPy donations</a><span class="menu-sub-sep"> | </span><a href="tmdonate.html">STM/AME donations</a></div>
	<hr class="clear" />
<div id="content">
<div id="main">
<h1 class="title">Call for donations - Transactional Memory / Automatic Mutual Exclusion in PyPy</h1>
<div class="section" id="introduction">
<p>In the presence of today's machines with multiple processors, Python
progress is lagging behind: on any CPU-constrained program, developers
have a difficult choice to make.  They can use in-process solutions that
do not offer multi-CPU usage.  In this respect, the natural choice
nowadays is to use Twisted or other event-based paradigms, or systems
that hide events in the control flow, like Stackless; or alternatively,
they can use the existing <tt class="docutils literal">threading</tt> module, with its associated GIL
and the complexities of real multi-threaded programming (locks,
deadlocks, races, etc.), which make this solution less attractive.  The
big alternative is for them to rely on one of various multi-process
solutions that are outside the scope of the core language; all of them
in some way or another are hacks that require extra knowledge and time
to use and that have an impact on the structure of the whole program.</p>
<p>This proposal is about researching and implementing Transactional Memory
in PyPy.  This is a technique that recently came to the front of the
multi-core scene.  It promises to offer multi-core CPU usage without
requiring to fall back to the multi-process solutions described above,
and also without using the <tt class="docutils literal">threading</tt> module &ndash; just as a small,
local extension of the programming language that would be used only in
the core of the event loops.</p>
<p>(Jump directly to <a class="reference internal" href="#what-python-interface-will-i-use">What Python interface will I use?</a> for practical
<div class="section" id="in-more-details">
<h1>In more details</h1>
<p>This is a call for financial help in researching and implementing a
version of PyPy able to use multiple processors in a single process.
This will give a GIL-less Python, i.e. a Python that runs without the
infamous Global Interpreter Lock.</p>
<p>The basic ideas to do it have been discussed on pypy-dev <a class="reference external" href="">[1]</a> <a class="reference external" href="">[2]</a>
and on a blog post <a class="reference external" href="">[3]</a>.
The goal is to adapt Transactional Memory &ndash; currently only available
as software, but <a class="reference internal" href="#soon-available-as-hardware">soon available as hardware</a> &ndash; to the task of running
sections of Python code in parallel on multiple processors, while giving
the programmer the illusion of serial execution.  It is called below
<p>The main developer will be Armin Rigo.
This is a &ldquo;researchy&rdquo; goal in the sense of us not being quite sure of
the performance of the result.  We currently estimate the one-year
performance goal at 2x-to-5x slower than regular PyPy in fully serial
applications.  We feel confident that it can work, though, in the
following sense: the performance of PyPy-TM running suited applications
should scale linearly or close-to-linearly with the number of processors.
This means that you just need a machine with at least 4 or 8 processors,
which is already common today and will be even more so in one or two
<p>You will find below a sketch of the <a class="reference internal" href="#work-plan">work plan</a>.  If more money than
requested is collected, then the excess will be entered into the general
PyPy pot, used for example to finance sprint travel costs to students.</p>
<p><strong>Note</strong> For donations higher than $1,000, we can arrange for an invoice
and a different payment method to avoid the high Paypal fees.  Please
contact pypy at if you want to know details on how
to donate via other means.</p>
<div class="section" id="what-is-the-global-interpreter-lock">
<h2>What is the Global Interpreter Lock?</h2>
<p>The GIL, or Global Interpreter Lock, is a single lock in both CPython
and (so far) PyPy, that all threads must acquire in order to execute
Python bytecodes.  This means that so far, in Python, even when using
threads we do not gain any benefit in term of multicore performance.</p>
<div class="section" id="what-is-transactional-memory">
<h2>What is Transactional Memory?</h2>
<p><a class="reference external" href="">Transactional Memory</a> &ndash; TM &ndash; is a technique imported from databases: every
time we want to do a change to the processors' main memory, we do it in
a &ldquo;transaction&rdquo;.  Multiple transactions can be executed in parallel by
multiple cores.  When a transaction is complete, we try to commit it.
This might either succeed, or (if another transaction committed
incompatible changes) fail.  If it fails, which is hopefully rare, we
need to restart the transaction from scratch.</p>
<div class="section" id="why-hasn-t-the-idea-been-implemented-for-cpython-already">
<h2>Why hasn't the idea been implemented for CPython already?</h2>
<p>Because of the additional complexity required, and mostly, because of
performance issues.  There have been some experiments already with
CPython on <em>Hardware</em> Transactional Memory:</p>
<ul class="simple">
<li><a class="reference external" href="">Riley and Zilles (2006)</a></li>
<li><a class="reference external" href="">Tabba (2010)</a></li>
<p>The motivation for using PyPy instead of CPython is the extra
flexibility of the general approach, as well as the ability to utilize
the JIT in order to remove part of the overhead that comes with
<em>Software</em> Transactional Memory.</p>
<div class="section" id="a-gil-less-python-is-impossible">
<h2>A GIL-less Python is impossible.</h2>
<p>This is a classic criticism of research-oriented projects.  We believe
that the <a class="reference internal" href="#work-plan">work plan</a> below can make a serious impact on considering
possible a GIL-less Python.  We believe we can do it, but at the
very least, even if this work generates a negative result, the negative
result will document the challenges faced should someone else want to
reattempt the idea in the future.</p>
<p>Nevertheless other projects, such as Psyco (also by Armin) and PyPy
itself, were called impossible before they were attempted, and have
hitherto been very successful.</p>
<div class="section" id="why-do-it-with-pypy-instead-of-cpython">
<h2>Why do it with PyPy instead of CPython?</h2>
<p>Because PyPy is designed to be open to this kind of research.  This will
require no work in the Python interpreter part of PyPy, and instead we
can focus on e.g. the concurrent garbage collection and the JIT issues.
<a class="reference external" href="">Riley and Zilles</a> have also experimented with Hardware Transactional
Memory using PyPy.  By contrast, for example, CPython is stuck with
reference counting, which is known to be an issue for Transactional
Memory (<a class="reference external" href="">Tabba</a> proposes to give up and use <a class="reference external" href="">Boehm</a>, which is a bad
idea in our experience, particularly because of scalability issues).</p>
<div class="section" id="what-python-interface-will-i-use">
<h2>What Python interface will I use?</h2>
<p>Previous attempts on Hardware
Transactional Memory focused on parallelizing existing programs written
using the <tt class="docutils literal">thread</tt> or <tt class="docutils literal">threading</tt> modules.  However, as argued
<a class="reference external" href="">here</a>, this may not be the most practical way to achieve real
multithreading; it seems that better alternatives would offer good
scalability too.  Notably, TM could benefit any event-based system that
is written to dispatch events serially (Twisted-based, most GUI toolkit,
Stackless, gevent, and so on).  The events would internally be processed
in parallel, while maintaining the illusion of serial execution, with
all the corresponding benefits of safety.  This should be possible with minimal
changes to the event dispatchers. This approach has been described by the
<a class="reference external" href="">Automatic Mutual Exclusion</a> work at Microsoft Research, but not been
implemented anywhere (to the best of our knowledge).</p>
<p>Note that, yes, this gives you both sides of the coin: you keep using
your non-thread-based program (without worrying about locks and their
drawbacks like deadlocks, races, and friends), <em>and</em> your programs
benefit from all your cores.</p>
<p>In more details, a low-level built-in module will provide the basics to
start transactions in parallel; but this module will be only used
internally in a tweaked version of, say, a Twisted reactor.  Using this
reactor will be enough for your existing Twisted-based programs to
actually run on multiple cores.  You &ndash; as a developer of the
Twisted-based program &ndash; have only to care about improving the
parallelizability of your program (e.g. by splitting time-consuming
transactions into several parts; the exact rules will be published in
detail once they are known).  But the point is that your program is
always correct.</p>
<p>See some <a class="reference external" href="">concrete example</a> of the API.</p>
<div class="section" id="speed">
<p>We estimate the one-year performance target to be 2x-to-5x slower than
the performance of the regular PyPy in fully serial applications.  (Of
course, the regular PyPy will not disappear; for the foreseeable future,
PyPy-TM will be an alternative executable.)</p>
<p>The performance of PyPy-TM running suited applications should scale
linearly or close-to-linearly with the number of processor.  This means
that in order to see the benefits, you just need a machine with at least
4 or 8 processors &ndash; which is already common today and will be even
more so in one or two years.</p>
<div class="section" id="hardware-transactional-memory">
<span id="soon-available-as-hardware"></span><h2>Hardware Transactional Memory</h2>
<p>The performance of PyPy-TM running on Hardware Transactional Memory is
unknown so far, but should ideally be close to the performance of a
regular PyPy.</p>
<p>In more details: This proposal is for work based entirely on <em>Software</em>
Transactional Memory.  However, in the future the ideas and most of the
code should map directly to Hardware Transactional Memory (HTM).  We
expect HTM to reduce a lot the cost of some of the issues that we face,
but not completely remove it.  For example, <a class="reference external" href="">AMD's old proposal</a> was
that there would be two ways to emit memory-referencing
instructions: one that goes via the HTM mechanisms, and one (the regular
one) which doesn't.  Choosing the best one on a
case-by-case basis in the JIT makes a difference in
performance (although of course not as great as with Software
Transactional Memory).</p>
<p><a class="reference external" href="">Intel's current proposal</a> on <a class="reference external" href="">Haswell</a> processors does not have this
distinction, which means transactions are likely to quickly overflow the
internal buffers.  As a result, the first generation HTM-capable
processors may not be suited for the approach described here.  But
this depends on details like the capacity of the hardware buffers that
are still secret at this point.</p>
<p>(Note also that HTM does not solve some of the issues for implementing
Transactional Memory in CPython, notably the issue with reference
counting.  We will have to wait for a real CPython experiment before
we can settle this question.  Also, this would &ldquo;just&rdquo; remove the GIL
but not offer a multi-core version of non-thread-based programs.)</p>
<div class="section" id="alternatives">
<p>PyPy-TM will be slower than judicious usage of existing alternatives,
based on multiple processes that communicate with each other in one way
or another.  The counter-argument is that TM is not only a cleaner
solution: there are cases in which it is not doable to organize (or
retrofit) an existing program into the particular format needed for the
alternatives.  In particular, small quickly-written programs don't need
the additional baggage of cross-process communication, and large
programs can sometimes be almost impossible to turn into multi-process
versions.  By contrast, we believe that TM can fit naturally into most
programs, because it only requires local changes to some dispatcher; the
rest of the program should work without changes.</p>
<div class="section" id="more-readings">
<h2>More readings</h2>
<ul class="simple">
<li><a class="reference external" href="">Original blog post</a></li>
<li>pypy-dev mails <a class="reference external" href="">[1]</a> <a class="reference external" href="">[2]</a></li>
<li><a class="reference external" href="">The most recent blog post</a></li>
<div class="section" id="work-plan">
<h1>Work plan</h1>
<p>This is an very rough estimate of the amount of work it would take to
complete the steps for an experienced developer who is already familiar
with the PyPy codebase.  As this is a research proposal, we cannot
guarantee the time estimates here, but we do agree to report regularly to
the community, so our progress can be followed publicly.</p>
<p>Paid work will be at $60/hour, but at least one developer who will work on
the project &ndash; Armin Rigo &ndash; has committed to 2 hours
of volunteer work per paid hour (so
the total amount of money that we ask is divided by three).  A 5% general
donation will go to the <a class="reference external" href="">Software Freedom Conservancy</a> itself, the
non-profit organization of which the PyPy project is a member and which
manages all the issues related to donations, payments, and tax-exempt
<li><p class="first"><strong>STM Library</strong>:</p>
<p>This part covers adapting an existing STM library for PyPy.  It is
already mostly done (based on <a class="reference external" href="">rstm</a>), but additional tweaks may be
<li><p class="first"><strong>Basic tweaks of the translation process</strong>:</p>
<p>This part covers tweaks needed during the translation process in
order to generate an STM-aware version of the RPython programs,
including PyPy itself.  It is partly done, but not finished.
Estimate: 1 month.</p>
<li><p class="first"><strong>Garbage collection</strong>:</p>
<p>We need a different garbage collector that is able to cope at least
with concurrent allocations.  From there, improving the situation is
its own open-ended subproject: we can add for example various kinds of
parallel collection, synchronized or unsynchronized root tracing,
etc.  Estimate for the basic part: 2 months.  Estimate for the rest:
4 extra months.</p>
<li><p class="first"><strong>User interface</strong>:</p>
<p>This is the problem of designing and implementing some interface or
interfaces for the Python programmer.  We put it in its own category
because of its &ldquo;end-user&rdquo; importance.  Estimate: 2 months.</p>
<li><p class="first"><strong>JIT integration</strong>:</p>
<p>The above would give us a (probably very slow) version of PyPy-TM.
This final part is to integrate it with the JIT compiler generator.
The main issue we foresee is integration with the new GC, detecting
with object flags or JIT optimizations which objects need
transactional memory status or not.  We think that with enough such
optimizations we can seriously lower the overhead of PyPy-TM, maybe
down to 2x slower than a regular PyPy or better.  Estimate: unknown;
at least 4 months.</p>
<li><p class="first"><strong>Longer term</strong>:</p>
<p>In the longer term, we might need to refine the TM processing done
above, for example to better support I/O (e.g. we can queue the writes
done to a log file) or to add some special fine-grained support
(e.g. two transactions that each do <tt class="docutils literal">samelist.append()</tt> do not need
to conflict in the simple case).  This part is not included
in the estimates.</p>
<p>Note: by default, any I/O can be done, but turns the transaction
&ldquo;inevitable&rdquo;.  An inevitable transaction must not abort, so it must be
the next one to commit.  This introduces delays at the end of the other
CPUs' transactions.</p>
<p>Total: 5 months for the initial version; at least 8 additional months
for the fast version.  We will go with a total estimate of 15 months,
corresponding to USD$151200.  The amount sought by this fundraising
campaign,  considering the 2 volunteer hours per paid hour is thus USD$50400.</p>
<div class="section" id="benefits-of-this-work-to-the-python-community-and-the-general-public">
<h1>Benefits of This Work to the Python Community and the General Public</h1>
<p>Python has become one of the most popular dynamic programming languages in
the world.  Web developers, educators, and scientific programmers alike
all value Python because Python code is often more readable and because
Python often increases programmer productivity.</p>
<p>Traditionally, languages like Python ran more slowly than static, compiled
languages; Python developers chose to sacrifice execution speed for ease
of programming.  The PyPy project created a substantially improved Python
language implementation, including a fast Just-in-time (JIT) compiler.
The increased execution speed that PyPy provides has attracted many users,
who now find their Python code runs up to four times faster under PyPy
than under the reference implementation written in C.</p>
<p>However, in the presence of today's machines with multiple processors,
Python progress lags behind.  The issue has been described in the
introduction: developers that really need to use multiple CPUs are
constrained to select and use one of the multi-process solutions that
are all in some way or another hacks requiring extra knowledge and
efforts to use.  The focus of the work described in this proposal is to
offer an alternative in the core of the Python language &ndash; an
alternative that can naturally integrate with the rest of the program.
This alternative will be implemented in PyPy.</p>
<p>PyPy's developers make all PyPy software available to the public without
charge, under PyPy's Open Source copyright license, the permissive MIT
License.  PyPy's license assures that PyPy is equally available to
everyone freely on terms that allow both non-commercial and commercial
activity.  This license allows for academics, for-profit software
developers, volunteers and enthusiasts alike to collaborate together to
make a better Python implementation for everyone.</p>
<p>PyPy-TM will be available under the same license.  Being licensed freely
to the general public means that opportunities to use, improve and learn
about how Transactional Memory works itself will be generally available
to everyone.</p>
<div id="sidebar">