Armin Rigo avatar Armin Rigo committed f8dc49d

Reorganize this document, making it clear that all GCs apart from
minimark have been removed.

Comments (0)

Files changed (1)

pypy/doc/garbage_collection.rst

 .. contents::
 
 
-Introduction
-============
-
-The overview and description of our garbage collection strategy and
-framework can be found in the `EU-report on this topic`_.  Please refer
-to that file for an old, but still more or less accurate, description.
-The present document describes the specific garbage collectors that we
-wrote in our framework.
-
-.. _`EU-report on this topic`: http://codespeak.net/pypy/extradoc/eu-report/D07.1_Massive_Parallelism_and_Translation_Aspects-2007-02-28.pdf
-
-
 Garbage collectors currently written for the GC framework
 =========================================================
 
 For more details, see the `overview of command line options for
 translation`_.
 
-The following overview is written in chronological order, so the "best"
-GC (which is the default when translating) is the last one below.
-
 .. _`overview of command line options for translation`: config/commandline.html#translation
 
-Mark and Sweep
---------------
-
-Classical Mark and Sweep collector.  Also contained a lot of experimental
-and half-unmaintained features.  Was removed.
-
-Semispace copying collector
----------------------------
-
-Two arenas of equal size, with only one arena in use and getting filled
-with new objects.  When the arena is full, the live objects are copied
-into the other arena using Cheney's algorithm.  The old arena is then
-cleared.  See `rpython/memory/gc/semispace.py`_.
-
-On Unix the clearing is done by reading ``/dev/zero`` into the arena,
-which is extremely memory efficient at least on Linux: it lets the
-kernel free the RAM that the old arena used and replace it all with
-allocated-on-demand memory.
-
-The size of each semispace starts at 8MB but grows as needed when the
-amount of objects alive grows.
-
-Generational GC
----------------
-
-This is a two-generations GC.  See `rpython/memory/gc/generation.py`_.
-
-It is implemented as a subclass of the Semispace copying collector.  It
-adds a nursery, which is a chunk of the current semispace.  Its size is
-computed to be half the size of the CPU Level 2 cache.  Allocations fill
-the nursery, and when it is full, it is collected and the objects still
-alive are moved to the rest of the current semispace.
-
-The idea is that it is very common for objects to die soon after they
-are created.  Generational GCs help a lot in this case, particularly if
-the amount of live objects really manipulated by the program fits in the
-Level 2 cache.  Moreover, the semispaces fill up much more slowly,
-making full collections less frequent.
-
-Hybrid GC
----------
-
-This is a three-generations GC.
-
-It is implemented as a subclass of the Generational GC.  The Hybrid GC
-can handle both objects that are inside and objects that are outside the
-semispaces ("external").  The external objects are not moving and
-collected in a mark-and-sweep fashion.  Large objects are allocated as
-external objects to avoid costly moves.  Small objects that survive for
-a long enough time (several semispace collections) are also made
-external so that they stop moving.
-
-This is coupled with a segregation of the objects in three generations.
-Each generation is collected much less often than the previous one.  The
-division of the generations is slightly more complicated than just
-nursery / semispace / external; see the diagram at the start of the
-source code, in `rpython/memory/gc/hybrid.py`_.
-
-Mark & Compact GC
------------------
-
-Killed in trunk.  The following documentation is for historical purposes
-only.
-
-Inspired, at least partially, by Squeak's garbage collector, this is a
-single-arena GC in which collection compacts the objects in-place.  The
-main point of this GC is to save as much memory as possible (to be not
-worse than the Semispace), but without the peaks of double memory usage
-during collection.
-
-Unlike the Semispace GC, collection requires a number of passes over the
-data.  This makes collection quite slower.  Future improvements could be
-to add a nursery to Mark & Compact in order to mitigate this issue.
-
-During a collection, we reuse the space in-place if it is still large
-enough.  If not, we need to allocate a new, larger space, and move the
-objects there; however, this move is done chunk by chunk, and chunks are
-cleared (i.e. returned to the OS) as soon as they have been moved away.
-This means that (from the point of view of the OS) a collection will
-never cause an important temporary growth of total memory usage.
-
-More precisely, a collection is triggered when the space contains more
-than N*M bytes, where N is the number of bytes alive after the previous
-collection and M is a constant factor, by default 1.5.  This guarantees
-that the total memory usage of the program never exceeds 1.5 times the
-total size of its live objects.
-
-The objects themselves are quite compact: they are allocated next to
-each other in the heap, separated by a GC header of only one word (4
-bytes on 32-bit platforms) and possibly followed by up to 3 bytes of
-padding for non-word-sized objects (e.g. strings).  There is a small
-extra memory usage during collection: an array containing 2 bytes per
-surviving object is needed to make a backup of (half of) the surviving
-objects' header, in order to let the collector store temporary relation
-information in the regular headers.
 
 Minimark GC
 -----------
 
-This is a simplification and rewrite of the ideas from the Hybrid GC.
+This is a simplification and rewrite of the ideas from the old Hybrid GC.
 It uses a nursery for the young objects, and mark-and-sweep for the old
 objects.  This is a moving GC, but objects may only move once (from
 the nursery to the old stage).
 
-The main difference with the Hybrid GC is that the mark-and-sweep
-objects (the "old stage") are directly handled by the GC's custom
-allocator, instead of being handled by malloc() calls.  The gain is that
+The non-nursery but small objects (the "old stage") are directly handled
+by the GC's custom allocator; there are not handled by malloc() calls.
+The gain is that
 it is then possible, during a major collection, to walk through all old
 generation objects without needing to store a list of pointers to them.
-So as a first approximation, when compared to the Hybrid GC, the
-Minimark GC saves one word of memory per old object.
+(So as a first approximation, when compared to the Hybrid GC, the
+Minimark GC saves one word of memory per old object.)
 
 There are :ref:`a number of environment variables
 <minimark-environment-variables>` that can be tweaked to influence the
 calling `rgc.progress_through_finalizer_queue()`.  (This safe point is
 in the interpreter main loop between two bytecodes, in PyPy.)
 
+
+Older GCs of historical interest
+================================
+
+The overview and description of our garbage collection strategy and
+framework can be found in the `EU-report on this topic`_.  Please refer
+to that file for a description.  It is only of historical interest
+because all corresponding GCs have been outdated and removed in the
+current PyPy repository.
+
+.. _`EU-report on this topic`: http://codespeak.net/pypy/extradoc/eu-report/D07.1_Massive_Parallelism_and_Translation_Aspects-2007-02-28.pdf
+
+
+Mark and Sweep
+--------------
+
+Classical Mark and Sweep collector.  Also contained a lot of experimental
+and half-unmaintained features.
+
+Semispace copying collector
+---------------------------
+
+Two arenas of equal size, with only one arena in use and getting filled
+with new objects.  When the arena is full, the live objects are copied
+into the other arena using Cheney's algorithm.  The old arena is then
+cleared.  Used to be in ``rpython/memory/gc/semispace.py``.
+
+On Unix the clearing is done by reading ``/dev/zero`` into the arena,
+which is extremely memory efficient at least on Linux: it lets the
+kernel free the RAM that the old arena used and replace it all with
+allocated-on-demand memory.
+
+The size of each semispace starts at 8MB but grows as needed when the
+amount of objects alive grows.
+
+Generational GC
+---------------
+
+This is a two-generations GC.  Used to be in
+``rpython/memory/gc/generation.py``.
+
+It is implemented as a subclass of the Semispace copying collector.  It
+adds a nursery, which is a chunk of the current semispace.  Its size is
+computed to be half the size of the CPU Level 2 cache.  Allocations fill
+the nursery, and when it is full, it is collected and the objects still
+alive are moved to the rest of the current semispace.
+
+The idea is that it is very common for objects to die soon after they
+are created.  Generational GCs help a lot in this case, particularly if
+the amount of live objects really manipulated by the program fits in the
+Level 2 cache.  Moreover, the semispaces fill up much more slowly,
+making full collections less frequent.
+
+Hybrid GC
+---------
+
+This is a three-generations GC.
+
+It is implemented as a subclass of the Generational GC.  The Hybrid GC
+can handle both objects that are inside and objects that are outside the
+semispaces ("external").  The external objects are not moving and
+collected in a mark-and-sweep fashion.  Large objects are allocated as
+external objects to avoid costly moves.  Small objects that survive for
+a long enough time (several semispace collections) are also made
+external so that they stop moving.
+
+This is coupled with a segregation of the objects in three generations.
+Each generation is collected much less often than the previous one.  The
+division of the generations is slightly more complicated than just
+nursery / semispace / external; see the diagram at the start of the
+source code, which used to be in ``rpython/memory/gc/hybrid.py``.
+
+Mark & Compact GC
+-----------------
+
+Inspired, at least partially, by Squeak's garbage collector, this is a
+single-arena GC in which collection compacts the objects in-place.  The
+main point of this GC is to save as much memory as possible (to be not
+worse than the Semispace), but without the peaks of double memory usage
+during collection.
+
+Unlike the Semispace GC, collection requires a number of passes over the
+data.  This makes collection quite slower.  Future improvements could be
+to add a nursery to Mark & Compact in order to mitigate this issue.
+
+During a collection, we reuse the space in-place if it is still large
+enough.  If not, we need to allocate a new, larger space, and move the
+objects there; however, this move is done chunk by chunk, and chunks are
+cleared (i.e. returned to the OS) as soon as they have been moved away.
+This means that (from the point of view of the OS) a collection will
+never cause an important temporary growth of total memory usage.
+
+More precisely, a collection is triggered when the space contains more
+than N*M bytes, where N is the number of bytes alive after the previous
+collection and M is a constant factor, by default 1.5.  This guarantees
+that the total memory usage of the program never exceeds 1.5 times the
+total size of its live objects.
+
+The objects themselves are quite compact: they are allocated next to
+each other in the heap, separated by a GC header of only one word (4
+bytes on 32-bit platforms) and possibly followed by up to 3 bytes of
+padding for non-word-sized objects (e.g. strings).  There is a small
+extra memory usage during collection: an array containing 2 bytes per
+surviving object is needed to make a backup of (half of) the surviving
+objects' header, in order to let the collector store temporary relation
+information in the regular headers.
+
+
 .. include:: _ref.txt
Tip: Filter by directory path e.g. /media app.js to search for public/media/app.js.
Tip: Use camelCasing e.g. ProjME to search for ProjectModifiedEvent.java.
Tip: Filter by extension type e.g. /repo .js to search for all .js files in the /repo directory.
Tip: Separate your search with spaces e.g. /ssh pom.xml to search for src/ssh/pom.xml.
Tip: Use ↑ and ↓ arrow keys to navigate and return to view the file.
Tip: You can also navigate files with Ctrl+j (next) and Ctrl+k (previous) and view the file with Ctrl+o.
Tip: You can also navigate files with Alt+j (next) and Alt+k (previous) and view the file with Alt+o.