Source

extradoc / talk / ep2008 / gc.txt

Full commit

Intro
=====

::

    From arigo to fijal: should we discuss this a bit?  I still don't
    know exactly what this talk is trying to achieve but I would
    personally imagine that to give an intro to GC techniques we need
    pictures with objects as boxes and references as arrows.  Also I
    think we should go for "CPython" slide, a "Jython / IronPython"
    slide, possibly a "IronPython on Mono" slide to introduce Boehm, and
    finally several PyPy slides starting from the simple mark-and-sweep
    and progressing to moving GCs with some more motivation, finally
    reaching generational moving GCs.

    I could imagine preparing this talk in a completely different
    way than with slides with text -- mostly as images.  Some of
    the text you've written below would go attached to the
    corresponding images.

    Still I'm not sure how, or if, we can fill 25 minutes.
    Maybe it's ok to go for a 15-20 minutes talk and give
    people time for questions and a larger break.


* Memory management model

* GC is *not* only the part that takes
  care of circular references

What is refcounting?
====================

* Store a field on each object which is a number
  of references

* Each time you play with it increase refcount

* When refcount goes to 0, delete object

What is a generational moving GC?
=================================

XXX explain, probably few slides, quick:

* Different generations of objects

* Collection runs from roots to all objects

* Young objects are kept in nursery

* Older ones are moved away

* Write barriers, old objects referencing young ones

Difference in performance
=========================

* no mutable fields on objects in generational
 (cache friendly)

* stuff moves around sometimes

* assuming certain usecases (not many old objects referencing
  young ones)

Allocation cost
================

* refcounting/C malloc - expensive

* generational gc - cheap

Allocating large amount of objects
==================================

* CPython - O(n^2) where n is number of objects
  allocated (XXX this is a temporary problem more than
  a deep issue, it might be fixed soon in CPython)

* Any reasonable generational GC should do better

Collection costs
================

* Full collection - costly (always the case with cpython)
  (XXX wrong, CPython uses 3 generations in its cycle finder)

* Nursery collection - quick

Finalizers
==========

* Semantic differences

* __del__ called immediately considered to be
  implementation-dependent

Finalizers - resurrection
=========================

* what happens if your __del__ puts reference back
  to itself?

* in CPython it can be called more than once

* in other implementations, only once

Finalizers - cycles with __del__
================================

* in CPython, they never get collected

* interesting example - a module global with __del__
  (XXX not sure I follow what you mean here.  Python
  was designed so that most objects don't have a
  reference to the module they come from, but just
  a module name, precisely to avoid cycles)

* in PyPy, cycle gets broken in random order

* no hacks like globals clearing

When to call __del__?
=====================

* example open('x', 'w').write('stuff')

* on refcounting, flushes file immediately

* on any other gc, it might be deferred for
  a while

Calling C-level code
====================

* Problems with moving

* Problems with write barriers

Strange operation costs
=======================

* id

XXX what else?