pypy / pypy / doc / garbage_collection.rst

Full commit
David Malcolm 1e46012 

Armin Rigo 08e8a5d 

David Malcolm 1e46012 

Armin Rigo 08e8a5d 

David Malcolm 1e46012 

Carl Friedrich B… 07acca3 
David Malcolm 1e46012 

Carl Friedrich B… 07acca3 
David Malcolm 1e46012 

Carl Friedrich B… 07acca3 
David Malcolm 1e46012 

Carl Friedrich B… 07acca3 
David Malcolm 1e46012 

Carl Friedrich B… 07acca3 
David Malcolm 1e46012 
Armin Rigo 08e8a5d 

Stefano Rivera 565fba5 

Armin Rigo eaba7ad 
lac 05b60c2 
Armin Rigo eaba7ad 

Carl Friedrich B… e88a711 
Armin Rigo eaba7ad 

Carl Friedrich B… 541308f 
Garbage Collection in PyPy

.. contents::


The overview and description of our garbage collection strategy and
framework can be found in the `EU-report on this topic`_.  Please refer
to that file for an old, but still more or less accurate, description.
The present document describes the specific garbage collectors that we
wrote in our framework.

.. _`EU-report on this topic`:

Garbage collectors currently written for the GC framework

Reminder: to select which GC you want to include in a translated
RPython program, use the ``--gc=NAME`` option of ````.
For more details, see the `overview of command line options for

The following overview is written in chronological order, so the "best"
GC (which is the default when translating) is the last one below.

.. _`overview of command line options for translation`: config/commandline.html#translation

Mark and Sweep

Classical Mark and Sweep collector.  Also contains a lot of experimental
and half-unmaintained features.  See `pypy/rpython/memory/gc/`_.

Semispace copying collector

Two arenas of equal size, with only one arena in use and getting filled
with new objects.  When the arena is full, the live objects are copied
into the other arena using Cheney's algorithm.  The old arena is then
cleared.  See `pypy/rpython/memory/gc/`_.

On Unix the clearing is done by reading ``/dev/zero`` into the arena,
which is extremely memory efficient at least on Linux: it lets the
kernel free the RAM that the old arena used and replace it all with
allocated-on-demand memory.

The size of each semispace starts at 8MB but grows as needed when the
amount of objects alive grows.

Generational GC

This is a two-generations GC.  See `pypy/rpython/memory/gc/`_.

It is implemented as a subclass of the Semispace copying collector.  It
adds a nursery, which is a chunk of the current semispace.  Its size is
computed to be half the size of the CPU Level 2 cache.  Allocations fill
the nursery, and when it is full, it is collected and the objects still
alive are moved to the rest of the current semispace.

The idea is that it is very common for objects to die soon after they
are created.  Generational GCs help a lot in this case, particularly if
the amount of live objects really manipulated by the program fits in the
Level 2 cache.  Moreover, the semispaces fill up much more slowly,
making full collections less frequent.

Hybrid GC

This is a three-generations GC.

It is implemented as a subclass of the Generational GC.  The Hybrid GC
can handle both objects that are inside and objects that are outside the
semispaces ("external").  The external objects are not moving and
collected in a mark-and-sweep fashion.  Large objects are allocated as
external objects to avoid costly moves.  Small objects that survive for
a long enough time (several semispace collections) are also made
external so that they stop moving.

This is coupled with a segregation of the objects in three generations.
Each generation is collected much less often than the previous one.  The
division of the generations is slightly more complicated than just
nursery / semispace / external; see the diagram at the start of the
source code, in `pypy/rpython/memory/gc/`_.

Mark & Compact GC

Inspired, at least partially, by Squeak's garbage collector, this is a
single-arena GC in which collection compacts the objects in-place.  The
main point of this GC is to save as much memory as possible (to be not
worse than the Semispace), but without the peaks of double memory usage
during collection.

Unlike the Semispace GC, collection requires a number of passes over the
data.  This makes collection quite slower.  Future improvements could be
to add a nursery to Mark & Compact in order to mitigate this issue.

During a collection, we reuse the space in-place if it is still large
enough.  If not, we need to allocate a new, larger space, and move the
objects there; however, this move is done chunk by chunk, and chunks are
cleared (i.e. returned to the OS) as soon as they have been moved away.
This means that (from the point of view of the OS) a collection will
never cause an important temporary growth of total memory usage.

More precisely, a collection is triggered when the space contains more
than N*M bytes, where N is the number of bytes alive after the previous
collection and M is a constant factor, by default 1.5.  This guarantees
that the total memory usage of the program never exceeds 1.5 times the
total size of its live objects.

The objects themselves are quite compact: they are allocated next to
each other in the heap, separated by a GC header of only one word (4
bytes on 32-bit platforms) and possibly followed by up to 3 bytes of
padding for non-word-sized objects (e.g. strings).  There is a small
extra memory usage during collection: an array containing 2 bytes per
surviving object is needed to make a backup of (half of) the surviving
objects' header, in order to let the collector store temporary relation
information in the regular headers.

More details are available as comments at the start of the source
in `pypy/rpython/memory/gc/`_.

Minimark GC

This is a simplification and rewrite of the ideas from the Hybrid GC.
It uses a nursery for the young objects, and mark-and-sweep for the old
objects.  This is a moving GC, but objects may only move once (from
the nursery to the old stage).

The main difference with the Hybrid GC is that the mark-and-sweep
objects (the "old stage") are directly handled by the GC's custom
allocator, instead of being handled by malloc() calls.  The gain is that
it is then possible, during a major collection, to walk through all old
generation objects without needing to store a list of pointers to them.
So as a first approximation, when compared to the Hybrid GC, the
Minimark GC saves one word of memory per old object.

There are :ref:`a number of environment variables
<minimark-environment-variables>` that can be tweaked to influence the
GC.  (Their default value should be ok for most usages.)

In more detail:

- The small newly malloced objects are allocated in the nursery (case 1).
  All objects living in the nursery are "young".

- The big objects are always handled directly by the system malloc().
  But the big newly malloced objects are still "young" when they are
  allocated (case 2), even though they don't live in the nursery.

- When the nursery is full, we do a minor collection, i.e. we find
  which "young" objects are still alive (from cases 1 and 2).  The
  "young" flag is then removed.  The surviving case 1 objects are moved
  to the old stage. The dying case 2 objects are immediately freed.

- The old stage is an area of memory containing old (small) objects.  It
  is handled by `pypy/rpython/memory/gc/`_.  It is organized
  as "arenas" of 256KB or 512KB, subdivided into "pages" of 4KB or 8KB.
  Each page can either be free, or contain small objects of all the same
  size.  Furthermore at any point in time each object location can be
  either allocated or freed.  The basic design comes from ``obmalloc.c``
  from CPython (which itself comes from the same source as the Linux
  system malloc()).

- New objects are added to the old stage at every minor collection.
  Immediately after a minor collection, when we reach some threshold, we
  trigger a major collection.  This is the mark-and-sweep step.  It walks
  over *all* objects (mark), and then frees some fraction of them (sweep).
  This means that the only time when we want to free objects is while
  walking over all of them; we never ask to free an object given just its
  address.  This allows some simplifications and memory savings when
  compared to ``obmalloc.c``.

- As with all generational collectors, this GC needs a write barrier to
  record which old objects have a reference to young objects.

- Additionally, we found out that it is useful to handle the case of
  big arrays specially: when we allocate a big array (with the system
  malloc()), we reserve a small number of bytes before.  When the array
  grows old, we use the extra bytes as a set of bits.  Each bit
  represents 128 entries in the array.  Whenever the write barrier is
  called to record a reference from the Nth entry of the array to some
  young object, we set the bit number ``(N/128)`` to 1.  This can
  considerably speed up minor collections, because we then only have to
  scan 128 entries of the array instead of all of them.

- As usual, we need special care about weak references, and objects with
  finalizers.  Weak references are allocated in the nursery, and if they
  survive they move to the old stage, as usual for all objects; the
  difference is that the reference they contain must either follow the
  object, or be set to NULL if the object dies.  And the objects with
  finalizers, considered rare enough, are immediately allocated old to
  simplify the design.  In particular their ``__del__`` method can only
  be called just after a major collection.

- The objects move once only, so we can use a trick to implement id()
  and hash().  If the object is not in the nursery, it won't move any
  more, so its id() and hash() are the object's address, cast to an
  integer.  If the object is in the nursery, and we ask for its id()
  or its hash(), then we pre-reserve a location in the old stage, and
  return the address of that location.  If the object survives the
  next minor collection, we move it there, and so its id() and hash()
  are preserved.  If the object dies then the pre-reserved location
  becomes free garbage, to be collected at the next major collection.

.. include:: _ref.txt