Commits

Brian Kearns committed e6b55fa Merge

merge heads

  • Parent commits eefd067, 036ae5f

Files changed (20)

File lib_pypy/_pypy_interact.py

 import sys
 import os
 
+irc_header = "And now for something completely different"
+
 
 def interactive_console(mainmodule=None, quiet=False):
     # set sys.{ps1,ps2} just before invoking the interactive interpreter. This
     if not quiet:
         try:
             from _pypy_irc_topic import some_topic
-            text = "And now for something completely different: ``%s''" % (
-                some_topic(),)
+            text = "%s: ``%s''" % (irc_header, some_topic())
             while len(text) >= 80:
                 i = text[:80].rfind(' ')
                 print(text[:i])

File lib_pypy/_tkinter/tklib.py

     incdirs = []
     linklibs = ['tcl85', 'tk85']
     libdirs = []
+elif sys.platform == 'darwin':
+    incdirs = ['/System/Library/Frameworks/Tk.framework/Versions/Current/Headers/']
+    linklibs = ['tcl', 'tk']
+    libdirs = []
 else:
     incdirs=['/usr/include/tcl']
     linklibs=['tcl', 'tk']

File pypy/doc/cpython_differences.rst

   type and vice versa. For builtin types, a dictionary will be returned that
   cannot be changed (but still looks and behaves like a normal dictionary).
 
+* PyPy prints a random line from past #pypy IRC topics at startup in
+  interactive mode.  In a released version, this behaviour is suppressed, but
+  setting the environment variable PYPY_IRC_TOPIC will bring it back.  Note
+  that downstream package providers have been known to disable this feature
+  entirely; a sketch of the check involved is given below.
+
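+  A hypothetical sketch of that check, mirroring the test added in this
+  commit (the real startup logic may differ)::
+
+      import os
+      from pypy.module.sys.version import PYPY_VERSION
+
+      is_release = PYPY_VERSION[3] == "final"
+      # show the banner on non-release builds or when explicitly requested
+      show_topic = (not is_release) or "PYPY_IRC_TOPIC" in os.environ
+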
 .. include:: _ref.txt

File pypy/doc/release-2.3.0.rst

 =======================================
-PyPy 2.3 - Easier Than Ever
+PyPy 2.3 - Terrestrial Arthropod Trap
 =======================================
 
 We're pleased to announce PyPy 2.3, which targets version 2.7.6 of the Python

File pypy/doc/stm.rst

-======================
-Transactional Memory
-======================
+
+=============================
+Software Transactional Memory
+=============================
 
 .. contents::
 
 
 This page is about ``pypy-stm``, a special in-development version of
 PyPy which can run multiple independent CPU-hungry threads in the same
-process in parallel.  It is side-stepping what is known in the Python
-world as the "global interpreter lock (GIL)" problem.
+process in parallel.  It is a solution to what is known in the Python
+world as the "global interpreter lock (GIL)" problem --- it is an
+implementation of Python without the GIL.
 
-"STM" stands for Software Transactional Memory, the technique used
+"STM" stands for Software `Transactional Memory`_, the technique used
 internally.  This page describes ``pypy-stm`` from the perspective of a
 user, describes work in progress, and finally gives references to more
 implementation details.
 
-This work was done mostly by Remi Meier and Armin Rigo.  Thanks to all
-donors for crowd-funding the work so far!  Please have a look at the
-`2nd call for donation`_.
+This work was done by Remi Meier and Armin Rigo.  Thanks to all donors
+for crowd-funding the work so far!  Please have a look at the `2nd call
+for donation`_.
 
+.. _`Transactional Memory`: http://en.wikipedia.org/wiki/Transactional_memory
 .. _`2nd call for donation`: http://pypy.org/tmdonate2.html
 
 
 Introduction
 ============
 
-``pypy-stm`` is a variant of the regular PyPy interpreter.  With caveats
-listed below, it should be in theory within 25%-50% slower than a
+``pypy-stm`` is a variant of the regular PyPy interpreter.  With caveats_
+listed below, it should in theory be within 20%-50% slower than a
 regular PyPy, comparing the JIT version in both cases.  It is called
 STM for Software Transactional Memory, which is the internal technique
 used (see `Reference to implementation details`_).
 
-What you get in exchange for this slow-down is that ``pypy-stm`` runs
-any multithreaded Python program on multiple CPUs at once.  Programs
-running two threads or more in parallel should ideally run faster than
-in a regular PyPy, either now or soon as issues are fixed.  In one way,
-that's all there is to it: this is a GIL-less Python, feel free to
-`download and try it`__.  However, the deeper idea behind the
-``pypy-stm`` project is to improve what is so far the state-of-the-art
-for using multiple CPUs, which for cases where separate processes don't
-work is done by writing explicitly multi-threaded programs.  Instead,
-``pypy-stm`` is pushing forward an approach to *hide* the threads, as
-described below in `atomic sections`_.
+The benefit is that the resulting ``pypy-stm`` can execute multiple
+threads of Python code in parallel.  Programs running two threads or
+more in parallel should ideally run faster than in a regular PyPy
+(either now, or soon, as bugs are fixed).
 
+* ``pypy-stm`` is fully compatible with a GIL-based PyPy; you can use
+  it as a drop-in replacement and multithreaded programs will run on
+  multiple cores.
 
-.. __:
+* ``pypy-stm`` does not impose any special API on the user, but it
+  provides a new pure Python module called `transactional_memory`_ with
+  features to inspect the state or debug conflicts_ that prevent
+  parallelization.  This module can also be imported on top of a non-STM
+  PyPy or CPython.
 
-Current status
-==============
+* Building on top of the way the GIL is removed, we will talk
+  about `Atomic sections, Transactions, etc.: a better way to write
+  parallel programs`_.
+
+
+Getting Started
+===============
 
 **pypy-stm requires 64-bit Linux for now.**
 
 Development is done in the branch `stmgc-c7`_.  If you are only
-interested in trying it out, you can download a Ubuntu 12.04 binary
-here__ (``pypy-2.2.x-stm*.tar.bz2``; this version is a release mode,
-but not stripped of debug symbols).  The current version supports four
-"segments", which means that it will run up to four threads in parallel,
-in other words it is running a thread pool up to 4 threads emulating normal
-threads.
+interested in trying it out, you can download an Ubuntu binary here__
+(``pypy-2.3.x-stm*.tar.bz2``, Ubuntu 12.04-14.04; these builds are in
+release mode, but not stripped of debug symbols).  The current version
+supports four "segments", which means that it will run up to four
+threads in parallel.
 
 To build a version from sources, you first need to compile a custom
-version of clang; we recommend downloading `llvm and clang like
-described here`__, but at revision 201645 (use ``svn co -r 201645 ...``
+version of clang(!); we recommend downloading `llvm and clang like
+described here`__, but at revision 201645 (use ``svn co -r 201645 <path>``
 for all checkouts).  Then apply all the patches in `this directory`__:
-they are fixes for the very extensive usage that pypy-stm does of a
-clang-only feature (without them, you get crashes of clang).  Then get
+they are fixes for a clang-only feature that hasn't been used so heavily
+in the past (without the patches, you get crashes of clang).  Then get
 the branch `stmgc-c7`_ of PyPy and run::
 
    rpython/bin/rpython -Ojit --stm pypy/goal/targetpypystandalone.py
 .. __: https://bitbucket.org/pypy/stmgc/src/default/c7/llvmfix/
 
 
-Caveats:
+.. _caveats:
 
-* So far, small examples work fine, but there are still a number of
-  bugs.  We're busy fixing them.
+Current status
+--------------
+
+* So far, small examples work fine, but there are still a few bugs.
+  We're busy fixing them as we find them; feel free to `report bugs`_.
 
 * Currently limited to 1.5 GB of RAM (this is just a parameter in
-  `core.h`__).  Memory overflows are not detected correctly, so may
-  cause segmentation faults.
+  `core.h`__).  Memory overflows are not correctly handled; they cause
+  segfaults.
 
-* The JIT warm-up time is abysmal (as opposed to the regular PyPy's,
-  which is "only" bad).  Moreover, you should run it with a command like
-  ``pypy-stm --jit trace_limit=60000 args...``; the default value of
-  6000 for ``trace_limit`` is currently too low (6000 should become
-  reasonable again as we improve).  Also, in order to produce machine
-  code, the JIT needs to enter a special single-threaded mode for now.
-  This all means that you *will* get very bad performance results if
-  your program doesn't run for *many* seconds for now.
+* The JIT warm-up time improved recently but is still bad.  In order to
+  produce machine code, the JIT needs to enter a special single-threaded
+  mode for now.  This means that you will get bad performance results if
+  your program doesn't run for several seconds, where *several* can mean
+  *many.*  When trying benchmarks, be sure to check that you have
+  reached the warmed state, i.e. the performance is not improving any
+  more.  This should be clear from the fact that as long as it's
+  producing more machine code, ``pypy-stm`` will run on a single core.
 
 * The GC is new; although clearly inspired by PyPy's regular GC, it
   misses a number of optimizations for now.  Programs allocating large
 * The STM system is based on very efficient read/write barriers, which
   are mostly done (their placement could be improved a bit in
   JIT-generated machine code).  But the overall bookkeeping logic could
-  see more improvements (see Statistics_ below).
-
-* You can use `atomic sections`_, but the most visible missing thing is
-  that you don't get reports about the "conflicts" you get.  This would
-  be the first thing that you need in order to start using atomic
-  sections more extensively.  Also, for now: for better results, try to
-  explicitly force a transaction break just before (and possibly after)
-  each large atomic section, with ``time.sleep(0)``.
+  see more improvements (see `Low-level statistics`_ below).
 
 * Forking the process is slow because the complete memory needs to be
-  copied manually right now.
+  copied manually.  A warning is printed to this effect.
 
-* Very long-running processes should eventually crash on an assertion
-  error because of a non-implemented overflow of an internal 29-bit
-  number, but this requires at the very least ten hours --- more
-  probably, several days or more.
+* Very long-running processes (on the order of days) will eventually
+  crash on an assertion error because of a non-implemented overflow of
+  an internal 29-bit number.
 
 .. _`report bugs`: https://bugs.pypy.org/
 .. __: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/src_stm/stm/core.h
 
 
 
-Statistics
+User Guide
 ==========
+  
 
-When a non-main thread finishes, you get statistics printed to stderr,
-looking like that::
+Drop-in replacement
+-------------------
 
-      thread 0x7f73377fe600:
-          outside transaction          42182  0.506 s
-          run current                  85466  0.000 s
-          run committed                34262  3.178 s
-          run aborted write write       6982  0.083 s
-          run aborted write read         550  0.005 s
-          run aborted inevitable         388  0.010 s
-          run aborted other                0  0.000 s
-          wait free segment                0  0.000 s
-          wait write read                 78  0.027 s
-          wait inevitable                887  0.490 s
-          wait other                       0  0.000 s
-          bookkeeping                  51418  0.606 s
-          minor gc                    162970  1.135 s
-          major gc                         1  0.019 s
-          sync pause                   59173  1.738 s
-          spin loop                   129512  0.094 s
+Multithreaded, CPU-intensive Python programs should work unchanged on
+``pypy-stm``.  They will run using multiple CPU cores in parallel.
 
-The first number is a counter; the second number gives the associated
-time (the amount of real time that the thread was in this state; the sum
-of all the times should be equal to the total time between the thread's
-start and the thread's end).  The most important points are "run
-committed", which gives the amount of useful work, and "outside
-transaction", which should give the time spent e.g. in library calls
-(right now it seems to be a bit larger than that; to investigate).
-Everything else is overhead of various forms.  (Short-, medium- and
-long-term future work involves reducing this overhead :-)
+The existing semantics of the GIL (Global Interpreter Lock) are
+unchanged: although running on multiple cores in parallel, ``pypy-stm``
+gives the illusion that threads are run serially, with switches only
+occurring between bytecodes, not in the middle of them.  Programs can
+rely on this: using ``shared_list.append()/pop()`` or
+``shared_dict.setdefault()`` as synchronization mechanisms continues to
+work as expected.
 
-These statistics are not printed out for the main thread, for now.
+This works by internally considering the points where a standard PyPy or
+CPython would release the GIL, and replacing them with the boundaries of
+"transaction".  Like their database equivalent, multiple transactions
+can execute in parallel, but will commit in some serial order.  They
+appear to behave as if they were completely run in this serialization
+order.
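+
+For instance, the following sketch (the names are made up) behaves
+exactly as it would with the GIL: no appends are lost and no explicit
+locking is needed, although, as described in conflicts_ below, appending
+to a single shared list still limits the actual parallelism::
+
+    import threading
+
+    lst = []
+
+    def worker():
+        for i in range(10000):
+            lst.append(i)      # atomic at bytecode level, as usual
+
+    threads = [threading.Thread(target=worker) for _ in range(2)]
+    for t in threads:
+        t.start()
+    for t in threads:
+        t.join()
+    assert len(lst) == 20000   # no lost updates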
 
 
 Atomic sections
-===============
+---------------
 
-While one of the goal of pypy-stm is to give a GIL-free but otherwise
-unmodified Python, the other goal is to push for a better way to use
-multithreading.  For this, you (as the Python programmer) get an API
-in the ``__pypy__.thread`` submodule:
+PyPy supports *atomic sections,* which are blocks of code that you want
+to execute without "releasing the GIL".  *This is experimental and may
+be removed in the future.*  In STM terms, this means blocks of code that
+are executed while guaranteeing that the transaction is not interrupted
+in the middle.
 
-* ``__pypy__.thread.atomic``: a context manager (i.e. you use it in
-  a ``with __pypy__.thread.atomic:`` statement).  It runs the whole
-  block of code without breaking the current transaction --- from
-  the point of view of a regular CPython/PyPy, this is equivalent to
-  saying that the GIL will not be released at all between the start and
-  the end of this block of code.
+Here is a usage example::
 
-The obvious usage is to use atomic blocks in the same way as one would
-use locks: to protect changes to some shared data, you do them in a
-``with atomic`` block, just like you would otherwise do them in a ``with
-mylock`` block after ``mylock = thread.allocate_lock()``.  This allows
-you not to care about acquiring the correct locks in the correct order;
-it is equivalent to having only one global lock.  This is how
-transactional memory is `generally described`__: as a way to efficiently
-execute such atomic blocks, running them in parallel while giving the
-illusion that they run in some serial order.
+    with __pypy__.thread.atomic:
+        assert len(lst1) == 10
+        x = lst1.pop(0)
+        lst1.append(x)
 
-.. __: http://en.wikipedia.org/wiki/Transactional_memory
+In this (bad) example, we are sure that the item popped off one end of
+the list is appended again at the other end atomically.  It means that
+another thread can run ``len(lst1)`` or ``x in lst1`` without any
+particular synchronization, and always see the same results,
+respectively ``10`` and ``True``.  It will never see the intermediate
+state where ``lst1`` only contains 9 elements.  Atomic sections are
+similar to re-entrant locks (they can be nested), but additionally they
+protect against the concurrent execution of *any* code instead of just
+code that happens to be protected by the same lock in other threads.
 
-However, the less obvious intended usage of atomic sections is as a
-wide-ranging replacement of explicit threads.  You can turn a program
-that is not multi-threaded at all into a program that uses threads
-internally, together with large atomic sections to keep the behavior
-unchanged.  This capability can be hidden in a library or in the
-framework you use; the end user's code does not need to be explicitly
-aware of using threads.  For a simple example of this, see
-`transaction.py`_ in ``lib_pypy``.  The idea is that if you have a
-program where the function ``f(key, value)`` runs on every item of some
-big dictionary, you can replace the loop with::
+Note that the notion of atomic sections is very strong. If you write
+code like this::
+
+    with __pypy__.thread.atomic:
+        time.sleep(10)
+
+then, if you think about it as if we had a GIL, you are executing a
+10-second-long atomic transaction without releasing the GIL at all.
+This prevents all other threads from progressing at all.  While it is
+not strictly true in ``pypy-stm``, the exact rules for when other
+threads can progress or not are rather complicated; you have to consider
+it likely that such a piece of code will eventually block all other
+threads anyway.
+
+Note that if you want to experiment with ``atomic``, you may have to
+manually add a transaction break just before the atomic block.  This is
+because the boundaries of the block are not guaranteed to be the
+boundaries of the transaction: the latter is at least as big as the
+block, but maybe bigger.  Therefore, if you run a big atomic block, it
+is a good idea to break the transaction just before.  This can be done
+e.g. by the hack of calling ``time.sleep(0)``.  (This may be fixed at
+some point.)
+
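+For example (a sketch; ``work()`` stands for your own code)::
+
+    import time
+    import __pypy__.thread
+
+    time.sleep(0)                  # hack: force a transaction break here
+    with __pypy__.thread.atomic:
+        work()                     # the big atomic block
+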
+There are also issues with the interaction of locks and atomic blocks.
+This can be seen if you write to files (which have locks), including
+with a ``print`` to standard output.  If one thread tries to acquire a
+lock while running in an atomic block, and another thread has got the
+same lock, then the former may fail with a ``thread.error``.  The reason
+is that "waiting" for some condition to become true --while running in
+an atomic block-- does not really make sense.  For now you can work
+around it by making sure that, say, all your prints are either in an
+``atomic`` block or none of them are.  (This kind of issue is
+theoretically hard to solve.)
+
+
+Locks
+-----
+
+**Not Implemented Yet**
+
+The thread module's locks have their basic semantics unchanged.  However,
+using them (e.g. in ``with my_lock:`` blocks) starts an alternative
+running mode, called `Software lock elision`_.  This means that PyPy
+will try to make sure that the transaction extends until the point where
+the lock is released, and if it succeeds, then the acquiring and
+releasing of the lock will be "elided".  This means that in this case,
+the whole transaction will technically not cause any write into the lock
+object --- it was unacquired before, and is still unacquired after the
+transaction.
+
+This is especially useful if two threads run ``with my_lock:`` blocks
+with the same lock.  If they each run a transaction that is long enough
+to contain the whole block, then all writes into the lock will be elided
+and the two transactions will not conflict with each other.  As usual,
+they will be serialized in some order: one of the two will appear to run
+before the other.  Simply, each of them executes an "acquire" followed
+by a "release" in the same transaction.  As explained above, the lock
+state goes from "unacquired" to "unacquired" and can thus be left
+unchanged.
+
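+To illustrate the intended usage (again, not implemented yet; the names
+below are made up)::
+
+    import thread
+
+    my_lock = thread.allocate_lock()
+
+    def worker(my_data):
+        with my_lock:          # acquire/release would be elided if the
+            my_data.update()   # transaction covers the whole block
+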
+This approach can gracefully fail: unlike atomic sections, there is no
+guarantee that the transaction runs until the end of the block.  If you
+perform any input/output while you hold the lock, the transaction will
+end as usual just before the input/output operation.  If this occurs,
+then the lock elision mode is cancelled and the lock's "acquired" state
+is really written.
+
+Even if the lock is really acquired already, a transaction doesn't have
+to wait for it to become free again.  It can enter elision mode anyway
+and tentatively execute the content of the block.  It is only at the end,
+when trying to commit, that the thread will pause.  As soon as the real
+value stored in the lock is switched back to "unacquired", it can then
+proceed and attempt to commit its already-executed transaction (which
+can fail, abort, and restart from scratch, as usual).
+
+Note that this is all *not implemented yet,* but we expect it to work
+even if you acquire and release several locks.  The elision-mode
+transaction will extend until the first lock you acquired is released,
+or until the code performs an input/output or a wait operation (for
+example, waiting for another lock that is currently not free).  In the
+common case of acquiring several locks in nested order, they will all be
+elided by the same transaction.
+
+.. _`software lock elision`: https://www.repository.cam.ac.uk/handle/1810/239410
+
+
+Atomic sections, Transactions, etc.: a better way to write parallel programs
+----------------------------------------------------------------------------
+
+(This section is based on locks as we plan to implement them, but also
+works with the existing atomic sections.)
+
+In the cases where elision works, the block of code can run in parallel
+with other blocks of code *even if they are protected by the same lock.*
+You still get the illusion that the blocks are run sequentially.  This
+works even for multiple threads that each run a series of such blocks
+and nothing else, protected by one single global lock.  This is
+basically the Python application-level equivalent of what was done with
+the interpreter in ``pypy-stm``: while you think you are writing
+thread-unfriendly code because of this global lock, actually the
+underlying system is able to make it run on multiple cores anyway.
+
+This capability can be hidden in a library or in the framework you use;
+the end user's code does not need to be explicitly aware of using
+threads.  For a simple example of this, there is `transaction.py`_ in
+``lib_pypy``.  The idea is that you write, or already have, some program
+where the function ``f(key, value)`` runs on every item of some big
+dictionary, say::
+
+    for key, value in bigdict.items():
+        f(key, value)
+
+Then you simply replace the loop with::
 
     for key, value in bigdict.items():
         transaction.add(f, key, value)
     transaction.run()
 
 This code runs the various calls to ``f(key, value)`` using a thread
-pool, but every single call is done in an atomic section.  The end
-result is that the behavior should be exactly equivalent: you don't get
-any extra multithreading issue.
+pool, but every single call is executed under the protection of a unique
+lock.  The end result is that the behavior is exactly equivalent --- in
+fact it makes little sense to do it in this way on a non-STM PyPy or on
+CPython.  But on ``pypy-stm``, the various locked calls to ``f(key,
+value)`` can tentatively be executed in parallel, even if the observable
+result is as if they were executed in some serial order.
 
 This approach hides the notion of threads from the end programmer,
 including all the hard multithreading-related issues.  This is not the
 only requires that the end programmer identifies where this parallelism
 is likely to be found, and communicates it to the system, using for
 example the ``transaction.add()`` scheme.
-
+    
 .. _`transaction.py`: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/lib_pypy/transaction.py
 .. _OpenMP: http://en.wikipedia.org/wiki/OpenMP
 
-==================
 
-Other APIs in pypy-stm:
+.. _`transactional_memory`:
 
-* ``__pypy__.thread.getsegmentlimit()``: return the number of "segments"
-  in this pypy-stm.  This is the limit above which more threads will not
-  be able to execute on more cores.  (Right now it is limited to 4 due
-  to inter-segment overhead, but should be increased in the future.  It
+API of transactional_memory
+---------------------------
+
+The new pure Python module ``transactional_memory`` runs on both CPython
+and PyPy, both with and without STM.  It contains:
+
+* ``getsegmentlimit()``: return the number of "segments" in
+  this pypy-stm.  This is the limit above which more threads will not be
+  able to execute on more cores.  (Right now it is limited to 4 due to
+  inter-segment overhead, but should be increased in the future.  It
   should also be settable, and the default value should depend on the
-  number of actual CPUs.)
+  number of actual CPUs.)  If STM is not available, this returns 1.
 
-* ``__pypy__.thread.exclusive_atomic``: same as ``atomic``, but
-  raises an exception if you attempt to nest it inside another
-  ``atomic``.
+* ``print_abort_info(minimum_time=0.0)``: debugging help.  Each thread
+  remembers the longest abort or pause it did because of cross-thread
+  contention_.  This function prints it to ``stderr`` if the time lost
+  is greater than ``minimum_time`` seconds.  The record is then
+  cleared, to make it ready for new events.  This function returns
+  ``True`` if it printed a report, and ``False`` otherwise.
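+
+A usage sketch combining the two functions above::
+
+    import transactional_memory as tm
+
+    print(tm.getsegmentlimit())   # e.g. 4 on pypy-stm, 1 without STM
+    # ... run the multithreaded workload ...
+    tm.print_abort_info(0.1)      # report aborts/pauses longer than 0.1s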
 
-* ``__pypy__.thread.signals_enabled``: a context manager that runs
-  its block with signals enabled.  By default, signals are only
-  enabled in the main thread; a non-main thread will not receive
-  signals (this is like CPython).  Enabling signals in non-main threads
-  is useful for libraries where threads are hidden and the end user is
-  not expecting his code to run elsewhere than in the main thread.
 
-Note that all of this API is (or will be) implemented in a regular PyPy
-too: for example, ``with atomic`` will simply mean "don't release the
-GIL" and ``getsegmentlimit()`` will return 1.
+API of __pypy__.thread
+----------------------
 
-==================
+The ``__pypy__.thread`` submodule is a built-in module of PyPy that
+contains a few internal built-in functions used by the
+``transactional_memory`` module, plus the following:
+    
+* ``__pypy__.thread.atomic``: a context manager to run a block in
+  fully atomic mode, without "releasing the GIL".  (May be eventually
+  removed?)
+
+* ``__pypy__.thread.signals_enabled``: a context manager that runs its
+  block with signals enabled.  By default, signals are only enabled in
+  the main thread; a non-main thread will not receive signals (this is
+  like CPython).  Enabling signals in non-main threads is useful for
+  libraries where threads are hidden and the end user is not expecting
+  his code to run elsewhere than in the main thread.
+
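+For example (a sketch; ``main_loop()`` stands for real code)::
+
+    import __pypy__.thread
+
+    def worker():
+        with __pypy__.thread.signals_enabled:
+            main_loop()        # this non-main thread now receives signals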
+
+.. _contention:
+
+Conflicts
+---------
+
+Based on Software Transactional Memory, the ``pypy-stm`` solution is
+prone to "conflicts".  To repeat the basic idea, threads execute their code
+speculatively, and at known points (e.g. between bytecodes) they
+coordinate with each other to agree on which order their respective
+actions should be "committed", i.e. become globally visible.  Each
+stretch of time between two commit points is called a transaction.
+
+A conflict occurs when there is no consistent ordering.  The classical
+example is if two threads both tried to change the value of the same
+global variable.  In that case, only one of them can be allowed to
+proceed, and the other one must be either paused or aborted (restarting
+the transaction).  If this occurs too often, parallelization fails.
+
+How much actual parallelization a multithreaded program can see is a bit
+subtle.  Basically, a program not using ``__pypy__.thread.atomic`` or
+eliding locks, or doing so for very short amounts of time, will
+parallelize almost freely (as long as it's not some artificial example
+where, say, all threads try to increase the same global counter and do
+nothing else).
+
+However, if the program requires longer transactions, the rules become
+less obvious.  The exact details may vary from version to
+version, too, until they are a bit more stabilized.  Here is an
+overview.
+
+Parallelization works as long as two principles are respected.  The
+first one is that the transactions must not *conflict* with each other.
+The most obvious sources of conflicts are threads that all increment a
+global shared counter, or that all store the result of their
+computations into the same list --- or, more subtly, that all ``pop()``
+the work to do from the same list, because that is also a mutation of
+the list.  (It is expected that some STM-aware library will eventually
+be designed to help with conflict problems, like an STM-aware queue.)
+
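+For example, this common work-splitting pattern conflicts on every item,
+because each ``pop()`` mutates the shared list (a sketch; ``todo`` and
+``process()`` are placeholders)::
+
+    def worker():
+        while True:
+            try:
+                item = todo.pop()   # every pop() mutates the shared list
+            except IndexError:
+                break               # list drained
+            process(item)           # transactions conflict on the pop()
+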
+A conflict occurs as follows: when a transaction commits (i.e. finishes
+successfully) it may cause other transactions that are still in progress
+to abort and retry.  This is a waste of CPU time, but even in the worst
+case scenario it is not worse than a GIL, because at least one
+transaction succeeds (so we get at worst N-1 CPUs doing useless jobs and
+1 CPU doing a job that commits successfully).
+
+Conflicts do occur, of course, and it is pointless to try to avoid them
+all.  For example they can be abundant during some warm-up phase.  What
+is important is to keep them rare enough in total.
+
+Another issue is that of avoiding long-running so-called "inevitable"
+transactions ("inevitable" is taken in the sense of "which cannot be
+avoided", i.e. transactions which cannot abort any more).  Transactions
+like that should only occur if you use ``__pypy__.thread.atomic``,
+generally because of I/O in atomic blocks.  They work, but the
+transaction is turned inevitable before the I/O is performed.  For all
+the remaining execution time of the atomic block, they will impede
+parallel work.  It is best to organize the code so that such operations
+are done completely outside ``__pypy__.thread.atomic``.
+
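+For example, prefer this shape (a sketch; ``compute()`` is a
+placeholder)::
+
+    with __pypy__.thread.atomic:
+        result = compute()     # pure computation: may abort and retry
+    print(result)              # the I/O happens outside the atomic block
+
+over calling ``print`` inside the block, which would make the rest of
+the transaction inevitable.
+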
+(This is related to the fact that blocking I/O operations are
+discouraged with Twisted, and if you really need them, you should do
+them on their own separate thread.)
+
+In case of lock elision, we don't get long-running inevitable
+transactions, but a different problem can occur: doing I/O cancels lock
+elision, and the lock turns into a real lock, preventing other threads
+from committing if they also need this lock.  (More about it when lock
+elision is implemented and tested.)
+
+
+
+Implementation
+==============
+
+XXX this section mostly empty for now
+
+
+Low-level statistics
+--------------------
+
+When a non-main thread finishes, you get low-level statistics printed to
+stderr, looking like this::
+
+      thread 0x7f73377fe600:
+          outside transaction          42182    0.506 s
+          run current                  85466    0.000 s
+          run committed                34262    3.178 s
+          run aborted write write       6982    0.083 s
+          run aborted write read         550    0.005 s
+          run aborted inevitable         388    0.010 s
+          run aborted other                0    0.000 s
+          wait free segment                0    0.000 s
+          wait write read                 78    0.027 s
+          wait inevitable                887    0.490 s
+          wait other                       0    0.000 s
+          sync commit soon                 1    0.000 s
+          bookkeeping                  51418    0.606 s
+          minor gc                    162970    1.135 s
+          major gc                         1    0.019 s
+          sync pause                   59173    1.738 s
+          longest recorded marker            0.000826 s
+          "File "x.py", line 5, in f"
+
+On each line, the first number is a counter, and the second number gives
+the associated time --- the amount of real time that the thread was in
+this state.  The sum of all the times should be equal to the total time
+between the thread's start and the thread's end.  The most important
+points are "run committed", which gives the amount of useful work, and
+"outside transaction", which should give the time spent e.g. in library
+calls (right now it seems to be larger than that; to investigate).  The
+various "run aborted" and "wait" entries are time lost due to
+conflicts_.  Everything else is overhead of various forms.  (Short-,
+medium- and long-term future work involves reducing this overhead :-)
+
+The last two lines are special; they are an internal marker read by
+``transactional_memory.print_abort_info()``.
+
+These statistics are not printed out for the main thread, for now.
 
 
 Reference to implementation details
-===================================
+-----------------------------------
 
 The core of the implementation is in a separate C library called stmgc_,
 in the c7_ subdirectory.  Please see the `README.txt`_ for more
 .. __: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/translator/stm/src_stm/stmgcintf.c
 .. __: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/jit/backend/llsupport/stmrewrite.py
 .. __: https://bitbucket.org/pypy/pypy/raw/stmgc-c7/rpython/jit/backend/x86/assembler.py
+
+
+
+See also
+========
+
+https://bitbucket.org/pypy/pypy/raw/default/pypy/doc/project-ideas.rst
+(the section about STM).
+
+
+.. include:: _ref.txt

File pypy/doc/whatsnew-2.3.0.rst

 
 .. branch: fix-tpname
 Changes hacks surrounding W_TypeObject.name to match CPython's tp_name
+
+.. branch: tkinter_osx_packaging
+OS/X specific header path

File pypy/doc/whatsnew-head.rst

 =======================
 
 .. this is a revision shortly after release-2.3.x
-.. startrev: ec864bd08d50
+.. startrev: b2cc67adbaad

File pypy/interpreter/test/test_app_main.py

 from rpython.tool.udir import udir
 from contextlib import contextmanager
 from pypy.conftest import pypydir
+from pypy.module.sys.version import PYPY_VERSION
+from lib_pypy._pypy_interact import irc_header
+
+is_release = PYPY_VERSION[3] == "final"
+
 
 banner = sys.version.splitlines()[0]
 
         child = self.spawn([])
         child.expect('Python ')   # banner
         child.expect('>>> ')      # prompt
+        if is_release:
+            assert irc_header not in child.before
+        else:
+            assert irc_header in child.before
         child.sendline('[6*7]')
         child.expect(re.escape('[42]'))
         child.sendline('def f(x):')

File pypy/module/fcntl/interp_fcntl.py

 fcntl_int = external('fcntl', [rffi.INT, rffi.INT, rffi.INT], rffi.INT)
 fcntl_str = external('fcntl', [rffi.INT, rffi.INT, rffi.CCHARP], rffi.INT)
 fcntl_flock = external('fcntl', [rffi.INT, rffi.INT, _flock], rffi.INT)
-ioctl_int = external('ioctl', [rffi.INT, rffi.INT, rffi.INT], rffi.INT)
-ioctl_str = external('ioctl', [rffi.INT, rffi.INT, rffi.CCHARP], rffi.INT)
+ioctl_int = external('ioctl', [rffi.INT, rffi.UINT, rffi.INT], rffi.INT)
+ioctl_str = external('ioctl', [rffi.INT, rffi.UINT, rffi.CCHARP], rffi.INT)
 
 has_flock = cConfig.has_flock
 if has_flock:

File pypy/module/fcntl/test/test_fcntl.py

             os.unlink(i)
 
 class AppTestFcntl:
-    spaceconfig = dict(usemodules=('fcntl', 'array', 'struct', 'termios', 'select', 'rctime'))
+    spaceconfig = dict(usemodules=('fcntl', 'array', 'struct', 'termios',
+                                   'select', 'rctime'))
+
     def setup_class(cls):
         tmpprefix = str(udir.ensure('test_fcntl', dir=1).join('tmp_'))
         cls.w_tmp = cls.space.wrap(tmpprefix)
             os.close(mfd)
             os.close(sfd)
 
+    def test_ioctl_signed_unsigned_code_param(self):
+        import fcntl
+        import os
+        import pty
+        import struct
+        import termios
+
+        mfd, sfd = pty.openpty()
+        try:
+            if termios.TIOCSWINSZ < 0:
+                set_winsz_opcode_maybe_neg = termios.TIOCSWINSZ
+                set_winsz_opcode_pos = termios.TIOCSWINSZ & 0xffffffffL
+            else:
+                set_winsz_opcode_pos = termios.TIOCSWINSZ
+                set_winsz_opcode_maybe_neg, = struct.unpack("i",
+                        struct.pack("I", termios.TIOCSWINSZ))
+
+            our_winsz = struct.pack("HHHH", 80, 25, 0, 0)
+            # test both with a positive and potentially negative ioctl code
+            new_winsz = fcntl.ioctl(mfd, set_winsz_opcode_pos, our_winsz)
+            new_winsz = fcntl.ioctl(mfd, set_winsz_opcode_maybe_neg, our_winsz)
+        finally:
+            os.close(mfd)
+            os.close(sfd)
+
     def test_large_flag(self):
         import sys
         if any(plat in sys.platform

File pypy/module/operator/test/test_operator.py

         import operator
         assert operator.index(42) == 42
         assert operator.__index__(42) == 42
-        raises(TypeError, operator.index, "abc")
+        exc = raises(TypeError, operator.index, "abc")
+        assert str(exc.value) == "'str' object cannot be interpreted as an index"

File pypy/objspace/descroperation.py

     l = ["space.isinstance_w(w_result, %s)" % x
                 for x in checkerspec]
     checker = " or ".join(l)
+    if targetname == 'index':
+        msg = "'%%T' object cannot be interpreted as an index"
+    else:
+        msg = "unsupported operand type for %(targetname)s(): '%%T'"
+    msg = msg % locals()
     source = """if 1:
         def %(targetname)s(space, w_obj):
             w_impl = space.lookup(w_obj, %(specialname)r)
             if w_impl is None:
                 raise oefmt(space.w_TypeError,
-                            "unsupported operand type for %(targetname)s(): "
-                            "'%%T'", w_obj)
+                            %(msg)r,
+                            w_obj)
             w_result = space.get_and_call_function(w_impl, w_obj)
 
             if %(checker)s:

File pypy/tool/release/force-builds.py

     'own-linux-x86-32',
     'own-linux-x86-64',
     'own-linux-armhf',
+    'own-win-x86-32',
 #    'own-macosx-x86-32',
 #    'pypy-c-app-level-linux-x86-32',
 #    'pypy-c-app-level-linux-x86-64',
 #    'pypy-c-stackless-app-level-linux-x86-32',
-    'pypy-c-app-level-win-x86-32',
+#    'pypy-c-app-level-win-x86-32',
     'pypy-c-jit-linux-x86-32',
     'pypy-c-jit-linux-x86-64',
     'pypy-c-jit-macosx-x86-64',

File rpython/config/translationoption.py

 
 if sys.platform.startswith("linux"):
     DEFL_ROOTFINDER_WITHJIT = "asmgcc"
-    ROOTFINDERS = ["n/a", "shadowstack", "asmgcc"]
-elif compiler.name == 'msvc':    
-    DEFL_ROOTFINDER_WITHJIT = "shadowstack"
-    ROOTFINDERS = ["n/a", "shadowstack"]
 else:
     DEFL_ROOTFINDER_WITHJIT = "shadowstack"
-    ROOTFINDERS = ["n/a", "shadowstack", "asmgcc"]
 
 IS_64_BITS = sys.maxint > 2147483647
 
                default=IS_64_BITS, cmdline="--gcremovetypeptr"),
     ChoiceOption("gcrootfinder",
                  "Strategy for finding GC Roots (framework GCs only)",
-                 ROOTFINDERS,
+                 ["n/a", "shadowstack", "asmgcc"],
                  "shadowstack",
                  cmdline="--gcrootfinder",
                  requires={
     # if we have specified strange inconsistent settings.
     config.translation.gc = config.translation.gc
 
-    # disallow asmgcc on OS/X
+    # disallow asmgcc on OS/X and on Win32
     if config.translation.gcrootfinder == "asmgcc":
-        assert sys.platform != "darwin"
+        assert sys.platform != "darwin", "'asmgcc' not supported on OS/X"
+        assert sys.platform != "win32",  "'asmgcc' not supported on Win32"
 
 # ----------------------------------------------------------------
 

File rpython/rlib/runicode.py

File contents unchanged.

File rpython/rlib/test/test_runicode.py

File contents unchanged.

File rpython/translator/c/gcc/trackgcroot.py

 
         # trim: instructions with no framesize are removed from self.insns,
         # and from the 'previous_insns' lists
-        assert hasattr(self.insns[0], 'framesize')
-        old = self.insns[1:]
-        del self.insns[1:]
-        for insn in old:
+        if 0:    # <- XXX disabled because it seems bogus, investigate more
+          assert hasattr(self.insns[0], 'framesize')
+          old = self.insns[1:]
+          del self.insns[1:]
+          for insn in old:
             if hasattr(insn, 'framesize'):
                 self.insns.append(insn)
                 insn.previous_insns = [previnsn for previnsn in insn.previous_insns

File rpython/translator/c/genc.py

File contents unchanged.

File rpython/translator/c/src/asm.c

 #  include "src/asm_ppc.c"
 #endif
 
-#if defined(MS_WINDOWS) && defined(_MSC_VER)
+#if defined(_MSC_VER)
 #  include "src/asm_msvc.c"
 #endif

File rpython/translator/c/src/asm_msvc.c

 #ifdef PYPY_X86_CHECK_SSE2
 #include <intrin.h>
+#include <stdio.h>
 void pypy_x86_check_sse2(void)
 {
     int features;