Diff of planning/gc.txt

-Card marking GC for PyPy
-========================
-
-
-UPDATE: partially done in the minimark GC: large arrays are
-allocated with a few extra bytes containing card marker bits.
-
-
-
-With a generational GC one needs to keep track of references from the old
-generation to the young one using a write barrier.  Currently this is
-implemented with a list of old objects that contain pointers to young objects.
-Then, when the nursery is collected, each remembered old object is scanned for
-any pointers it contains to young objects, and those are promoted out of the
-nursery.  This scheme breaks down for exceptionally large objects that contain
-pointers from the old generation to the young one:
-
-    def main():
-        objs = []
-        for i in xrange(10000000):
-            objs.append(i)
-            # A bunch of stuff here.
-
-If that loop does enough work on each iteration to force a garbage collection,
-then every time through the loop ``objs`` will contain a pointer to a young
-object (``i``) and will be scanned, in its entirety, for young pointers.  This
-makes the loop take quadratic time, where it ought to be linear.  The
-solution to this is a strategy named card marking.
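
For concreteness, here is a toy Python model of the remembered-list scheme
described above; all names are illustrative, not the actual minimark/hybrid
GC interfaces::

    class Obj(object):
        def __init__(self, nfields, young=False):
            self.fields = [None] * nfields
            self.young = young          # still in the nursery?
            self.remembered = False     # already on the remembered list?

    remembered = []    # old objects that may contain pointers into the nursery

    def write_barrier(old_obj, index, value):
        # Called on every pointer store into an old object.
        old_obj.fields[index] = value
        if isinstance(value, Obj) and value.young and not old_obj.remembered:
            old_obj.remembered = True
            remembered.append(old_obj)

    def collect_nursery():
        # Each remembered object is scanned *in full*: a steadily growing
        # list is rescanned from scratch at every minor collection, which
        # is where the quadratic cost comes from.
        for obj in remembered:
            obj.remembered = False
            for field in obj.fields:
                if isinstance(field, Obj) and field.young:
                    field.young = False    # "promote" out of the nursery
        del remembered[:]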
-
-In a card marking generational collector, instead of storing a list of old
-objects, the heap is partitioned into sections called cards (generally these
-are smaller than a page of memory), and the write barrier marks the card for
-the region of memory that contains the updated pointer.  Then, during
-collection, instead of scanning whole objects for pointers, only the regions
-of memory identified by marked cards are scanned.  This bounds the amount of
-space that must be scanned by the size of a card, rather than by the size of
-an object (which is variable, and can grow arbitrarily large).
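
A minimal sketch of the card-marking variant, again with made-up names and a
single flat card table; note that minimark instead keeps a few card-marker
bits next to each large array, as mentioned in the UPDATE above::

    HEAP_SIZE = 1 << 20      # toy 1 MiB heap
    CARD_SIZE = 128          # bytes per card; smaller than a page
    card_table = bytearray(HEAP_SIZE // CARD_SIZE)

    def write_barrier(store_addr, value_is_young):
        # Constant time and independent of object size: just mark the
        # card containing the updated slot.
        if value_is_young:
            card_table[store_addr // CARD_SIZE] = 1

    def collect_nursery(scan_range):
        # Only marked cards are scanned, so the work per collection is
        # bounded by (number of dirty cards) * CARD_SIZE, not by the
        # size of any single object.
        for i in range(len(card_table)):
            if card_table[i]:
                scan_range(i * CARD_SIZE, (i + 1) * CARD_SIZE)
                card_table[i] = 0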
-
-In principle this sounds nice, but there are a few issues:
-
- * We have pre-built constants (PBCs), which don't live in our contiguous heap.
-   Since the write barrier is supposed to be very cheap, we don't want it to
-   have to special-case PBCs.  There are two proposed solutions:
-   * Copy all PBCs to the heap on startup.  This requires double the memory
-     for PBCs.  (According to fijal, this is what the JVM does.)
-   * Don't pre-build PBCs; instead, allocate them fresh on startup.  The OO
-     type backends do this, and it results in very slow startup time (5-6
-     seconds), though there are likely other factors involved.
- * Currently the hybrid GC allocates large objects outside the heap, which
-   causes the same problem as PBCs.
-   * The JVM apparently also allocates large objects externally, and pays a
-     similar price for it.
+The GC should support "lightweight destructors" that do nothing but a
+raw_free().  This would also help because we could then account for the
+size of the raw memory attached to objects in this way.
+See e.g. http://codespeak.net/pipermail/pypy-dev/2010q4/006648.html
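
An RPython-flavoured sketch of such an object (the ``RawBuffer`` class is
hypothetical, though ``lltype.malloc``/``lltype.free`` with ``flavor='raw'``
and ``rgc.add_memory_pressure()`` are real RPython interfaces)::

    from rpython.rlib import rgc
    from rpython.rtyper.lltypesystem import lltype, rffi

    class RawBuffer(object):
        def __init__(self, size):
            self.ptr = lltype.malloc(rffi.CCHARP.TO, size, flavor='raw')
            # Let the GC account for the raw memory attached to this
            # object when deciding when to collect.
            rgc.add_memory_pressure(size)

        def __del__(self):
            # A "lightweight destructor": nothing but a raw_free(), so
            # the GC could run it directly during collection instead of
            # routing the object through the full finalizer machinery.
            lltype.free(self.ptr, flavor='raw')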