Commits

Maciej Fijalkowski committed ebef81c Merge

merge

Comments (0)

Files changed (1)

talk/stm2012/stmimpl.rst

 STM implementation model
 ========================
 
+
 Overview
---------
+============================================================
 
 Objects are either global (visible to everybody, and read-only), or
 they are local (visible only to the current thread).
 compare-and-exchange operation.
 
 
+
+The STM implementation
+============================================================
+
+
 Object header
 -------------
 
 - h_global (boolean)
 - h_possibly_outdated (boolean)
 - h_written (boolean)
+- h_local_copy (boolean)
 - h_revision (unsigned integer)
 
 The h_revision is an unsigned "revision number" that can also
   version in any transaction.
 
 - ``h_written`` is set on local objects that have been written to.
+  It is false on global objects.
+
+- ``h_local_copy`` is set on local objects that are a copy of a global
+  object.  It is undefined on global objects.
 
 - ``h_revision`` on local objects points to the global object that they
-  come from, if any; otherwise it is 1.
+  come from, if any (i.e. if ``h_local_copy``); otherwise it is
+  undefined (and can be used for other purposes, e.g. by the GC).
 
 - ``h_revision`` on global objects depends on whether the object is the
   head of the chained list of revisions or not.  If it is, then
 the transaction it has so far.  The following data is transaction-local:
 
 - start_time
+- is_inevitable
 - global_to_local
 - list_of_read_objects
 - recent_reads_cache
 the state at time ``start_time``.  The global "time" is a single global
 number that is atomically incremented whenever a transaction commits.
 
+``is_inevitable`` is a flag described later.
+
 ``global_to_local`` is a dictionary-like mapping of global objects to
 their corresponding local objects.
 
 
 - All barriers ensure that ``global_to_local`` satisfies the following
   property for any local object ``L``: either ``L`` was created by
-  this transaction (``L->h_revision == 1``) or else satisfies
+  this transaction (``L->h_revision`` is undefined) or else satisfies
   ``global_to_local[L->h_revision] == L``.
 
 
         W->h_global = False
         W->h_possibly_outdated = False
         W->h_written = True
-        W->h_revision = 0
+        W->h_local_copy = False
+        #W->h_revision can be left uninitialized
         return W
 
 
         while not (v := R->h_revision) & 1:# "is a pointer", i.e.
             R = v                          #   "has a more recent revision"
         if v > start_time:                 # object too recent?
+            if V >= LOCKED:                # object actually locked?
+                goto retry                 # spin-loop to start of func
             ValidateDuringTransaction()    # try to move start_time forward
-            return LatestGlobalRevision(R) # restart searching from R
+            goto retry                     # restart searching from R
         PossiblyUpdateChain(G, R, ...)     # see below
         return R
 
 
 
 ``L = Localize(R)`` is an operation that takes a read-ready pointer to a
-global object and returns a corresponding pointer to a local object::
+*global* object and returns a corresponding pointer to a local object::
 
     def Localize(R):
+        assert R->h_global
         if R in global_to_local:
             return global_to_local[R]
         L = malloc(sizeof R)
         L->h_global = False
         L->h_possibly_outdated = False
         L->h_written = False
+        L->h_local_copy = True
         L->h_revision = R          # back-reference to the original
         L->objectbody... = R->objectbody...
         global_to_local[R] = L
         return L
 
+    def LocalizeReadReady(R):
+        if R->h_global:
+            L = Localize(R)
+        else:
+            L = R
+        return L
+
 
 ``W = WriteBarrier(P)`` and ``W = WriteBarrierFromReadReady(R)`` are
 two versions of the write barrier::
 to the latest version::
 
     def PossiblyUpdateChain(G, R, R_Container, FieldName):
-        if R != G:
+        if R != G and Rarely():
             # compress the chain
             while G->h_revision != R:
                 G_next = G->h_revision
 the modified values.  It works because the original and each modified
 value are all interchangeable as far as correctness goes.
 
+``Rarely`` uses a thread-local counter to return True only rarely.  We
+do the above update only rarely, rather than always, although it would
+naively seem that doing the update always is a good idea.  The problem
+is that it generates a lot of write traffic to global data that is
+potentially shared between CPUs.  We will need more measurements, but it
+seems that doing it too often causes CPUs to stall.  It is probable that
+updates done by one CPU are sent to other CPUs at high cost, even though
+these updates are not so important in this particular case (i.e. the
+program would work fine if the other CPUs didn't see such updates at all
+and instead repeated the same update logic locally).
+
 
 Validation
 ------------------------------------
 
 ``ValidateDuringTransaction`` is called during a transaction to update
 ``start_time``.  It makes sure that none of the read objects have been
-modified since ``start_time``::
+modified since ``start_time``.  If one of these objects is modified by
+another commit in parallel, then we want this transaction to eventually
+fail.  More precisely, it will fail the next time one of the
+``ValidateDuring*`` functions is called.
+
+Note a subtle point: if an object is currently locked, we have to wait
+until it gets unlocked, because it might turn out to point to a more
+recent version that is still older than the current global time.
+
+Here is ``ValidateDuringTransaction``::
 
     def ValidateDuringTransaction():
-        start_time = global_cur_time    # copy from the global time
+        start_time = GetGlobalCurTime() # copy from the global time
         for R in list_of_read_objects:
-            if not (R->h_revision & 1): # "is a pointer", i.e.
+            v = R->h_revision
+            if not (v & 1):             # "is a pointer", i.e.
                 AbortTransaction()      #   "has a more recent revision"
-
-If such an object is modified by another commit, then this transaction
-will eventually fail --- hopefully, the next time
-``ValidateDuringTransaction`` is called.
+            if v >= LOCKED:             # locked
+                spin loop retry         # jump back to the "v = ..." line
 
 The last detection for inconsistency is during commit, when
 ``ValidateDuringCommit`` is called.  It is a slightly more complex
 version than ``ValidateDuringTransaction`` because it has to handle
-"locks" correctly::
+"locks" correctly.  It also returns a True/False result instead of
+aborting::
 
     def ValidateDuringCommit():
         for R in list_of_read_objects:
             v = R->h_revision
             if not (v & 1):            # "is a pointer", i.e.
-                AbortTransaction()     #   "has a more recent revision"
+                return False           #   "has a more recent revision"
             if v >= LOCKED:            # locked
                 if v != my_lock:       # and not by me
-                    spin loop retry OR # jump back to the "v = ..." line
-                    AbortTransaction() # ...or just abort
-
-The choice of waiting or aborting when encountering a read of a locked
-object needs to be done carefully to avoid deadlocks.  Always aborting
-would be correct, but a bit too restrictive.  Always entering a spin
-loop could lead to deadlocks with two transactions that each locked
-objects from the other's ``list_of_read_objects``.  So for the purposes
-of this explanation we will always assume that it aborts.
+                    return False
+        return True
 
 
 Local garbage collection
 
 Hand-wavy pseudo-code::
 
-    def TransactionEnd():
+    def FinishTransaction():
         FindRootsForLocalCollect()
         PerformLocalCollect()
-        TransactionCommit()          # see below
+        CommitTransaction()          # see below
 
     def FindRootsForLocalCollect():
         for (R, L) in global_to_local:
         collect from the roots...
         for all reached local object,
             change h_global False->True
-            and h_written True->False
+            change h_written True->False
+            if not h_local_copy:
+                h_revision = 1
+
+Note that non-written local objects are just shadow copies of existing
+global objects.  For the sequel we just replace them with the original
+global objects again.  This is done by tweaking the local objects'
+header.
+
+Note also that ``h_revision`` is free to be (ab)used on newly allocated
+objects (the GC of PyPy does this), but it should be set to 1 just
+before calling ``CommitTransaction``.
 
 
 Committing
 In pseudo-code::
 
     def CommitTransaction():
+        # (see below for the full version with inevitable transactions)
         AcquireLocks()
         cur_time = global_cur_time
         while not CMPXCHG(&global_cur_time, cur_time, cur_time + 2):
             cur_time = global_cur_time    # try again
-        ValidateDuringCommit()
+        if cur_time != start_time:
+            if not ValidateDuringCommit():   # only call it if needed
+                AbortTransaction()           # last abort point
         UpdateChainHeads(cur_time)
 
 Note the general style of usage of CMPXCHG: we first read normally the
 ``h_revision`` field; it does not involve OS-specific thread locks::
 
     def AcquireLocks():
-        for (R, L, 0) in gcroots:
+        for (R, L, 0) in gcroots SORTED BY R:
             v = R->h_revision
             if not (v & 1):         # "is a pointer", i.e.
                 AbortTransaction()  #   "has a more recent revision"
             if v >= LOCKED:         # already locked by someone else
-                spin loop retry OR  # jump back to the "v = ..." line
-                AbortTransaction()
+                spin loop retry     # jump back to the "v = ..." line
             if not CMPXCHG(&R->h_revision, v, my_lock):
                 spin loop retry     # jump back to the "v = ..." line
             save v into the third item in gcroots, replacing the 0
 
-(Note that for non-written local objects, we skip this locking entirely;
-instead, we turn the object into a "global but outdated" object, keeping
-the same ``h_revision`` but with a different meaning.)
-
 We use CMPXCHG to store the lock.  This is required, because we must not
 conflict with another CPU that would try to write its own lock in the
-same field --- in that case, only one CPU can succeed.  The order of
-enumeration of ``global_to_local`` must be the same one --- for example,
-following the numeric order of ``R``.  This is needed to avoid
-deadlocks.  Alternatively we could consider this case rare, and abort
-instead of waiting.
+same field --- in that case, only one CPU can succeed.
+
+Acquiring multiple locks comes with the question of how to avoid
+deadlocks.  In this case, it is prevented by ordering the lock
+acquisitions in the numeric order of the R pointers.  This should be
+enough to prevent deadlocks even if two threads have several objects in
+common in their gcroots.
 
 The lock's value ``my_lock`` is, precisely, a very large odd number, at
-least LOCKED (which should be some value like 0xFFFF0000).  As we can
-check, this is enough to cause ``LatestGlobalRevision`` to spin loop,
-calling ``ValidateDuringTransaction`` over and over again, until the
+least LOCKED (which should be some value like 0xFFFF0000).
+Such a value causes ``LatestGlobalRevision`` to spin loop until the
 lock is released (i.e.  another value is written in ``h_revision``).
 
 
 done by writing back the original timestamps in the ``h_revision``
 fields::
 
-    def AbortTransaction():
+    def CancelLocks():
         for (R, L, v) in gcroots:
             if v != 0:
                 R->h_revision = v
+                reset the entry in gcroots to v=0
+
+    def AbortTransaction():
+        CancelLocks()
         # call longjmp(), which is the function from C
         # going back to the transaction start
         longjmp()
 ``LatestGlobalRevision``, in the loop.  It was omitted here because this
 is always a no-op (i.e. the CPUs always provide this effect for us), not
 only on x86 but on all modern CPUs.
+
+
+Inevitable transactions
+------------------------------------
+
+A transaction is "inevitable" when it cannot abort any more.  It occurs
+typically when the transaction tries to do I/O or a similar effect that
+we cannot roll back.  Such effects are O.K., but they mean that we have
+to guarantee the transaction's eventual successful commit.
+
+The main restriction is that there can be only one inevitable
+transaction at a time.  Right now the model doesn't allow any other
+transaction to start or commit when there is an inevitable transaction;
+this restriction could be lifted with additional work.
+
+For now, the hint that the system has currently got an inevitable
+transaction running is given by the value stored in ``global_cur_time``:
+the largest positive number (equal to the ``INEVITABLE`` constant).
+
+``BecomeInevitable`` is called from the middle of a transaction to
+(attempt to) make the current transaction inevitable::
+
+    def BecomeInevitable():
+        inevitable_mutex.acquire()
+        cur_time = global_cur_time
+        while not CMPXCHG(&global_cur_time, cur_time, INEVITABLE):
+            cur_time = global_cur_time    # try again
+        if start_time != cur_time:
+            start_time = cur_time
+            if not ValidateDuringCommit():
+                global_cur_time = cur_time     # must restore the value
+                inevitable_mutex.release()
+                AbortTransaction()
+        is_inevitable = True
+
+We use a normal OS mutex to allow other threads to really sleep instead
+of spin-looping until the inevitable transaction finishes.  So the
+function ``GetGlobalCurTime`` is defined to return ``global_cur_time``
+after waiting for other inevitable transaction to finish::
+    
+    def GetGlobalCurTime():
+        assert not is_inevitable    # must not be myself inevitable
+        t = global_cur_time
+        if t == INEVITABLE:         # there is another inevitable tr.?
+            inevitable_mutex.acquire()   # wait
+            inevitable_mutex.release()
+            return GetGlobalCurTime()    # retry
+        return t
+
+Then we extend ``CommitTransaction`` for inevitable support::
+
+    def CommitTransaction():
+        AcquireLocks()
+        if is_inevitable:
+            cur_time = start_time
+            if not CMPXCHG(&global_cur_time, INEVITABLE, cur_time + 2):
+                unreachable: no other thread changed global_cur_time
+            inevitable_mutex.release()
+        else:
+            cur_time = GetGlobalCurTimeInCommit()
+            while not CMPXCHG(&global_cur_time, cur_time, cur_time + 2):
+                cur_time = GetGlobalCurTimeInCommit()  # try again
+            if cur_time != start_time:
+                if not ValidateDuringCommit():   # only call it if needed
+                    AbortTransaction()           # last abort point
+        UpdateChainHeads(cur_time)
+
+    def GetGlobalCurTimeInCommit():
+        t = global_cur_time
+        if t == INEVITABLE:
+            CancelLocks()
+            inevitable_mutex.acquire()   # wait until released
+            inevitable_mutex.release()
+            AcquireLocks()
+            return GetGlobalCurTimeInCommit()
+        return t
+
+
+
+Barrier placement in the source code
+============================================================
+
+
+Overview
+-----------
+
+Placing the read/write barriers in the source code is not necessarily
+straightforward, because there are a lot of object states to choose
+from.  The barriers described above are just the most common cases.
+
+We classify here the object categories more precisely.  A pointer to an
+object in the category ``R`` might actually point to one that is in the
+more precise category ``L`` or ``W``.  Conversely, a pointer to an
+object in the category ``L`` is also always in the categories ``R`` or
+``O``.  This can be seen more generally in the implication
+relationships::
+
+     W => L => R => O => P       G => P    (I)
+
+A letter X is called *more general than* a letter Y if ``Y => X``, and
+*more precise than* a letter Y if ``X => Y``.
+
+Barriers are used to make an object's category more precise.  Here are
+all 12 interesting conversions, with the five functions from the section
+`Read/write barriers design`_ (abbreviated as DRB, RRB, LRR, WrB and
+WFR) as well as seven more potential conversions (written ``*``) that
+could be implemented efficiently with slight variations:
+
+    +--------+-----------------------------------+
+    |        |                From               |
+    +--------+-----+-----+-----+-----+-----+-----+
+    |   To   |  P  |  G  |  O  |  R  |  L  |  W  |
+    +========+=====+=====+=====+=====+=====+=====+
+    |     R  | DRB |``*``| RRB |                 |
+    +--------+-----+-----+-----+-----+-----------+
+    |     L  |``*``|``*``|``*``| LRR |           |
+    +--------+-----+-----+-----+-----+-----+-----+
+    |     W  | WrB |``*``|``*``| WFR |``*``|     |
+    +--------+-----+-----+-----+-----+-----+-----+
+
+In the sequel we will refer to each of the 12 variations as *X2Y*
+for X in ``P, G, O, R, L`` and Y in ``R, L, W``.
+
+
+Constraints
+-----------
+
+The source code's pointer variables are each assigned one letter
+from ``P, G, O, R, L, W`` such that:
+
+* A variable is only passed into another variable with either the same
+  or a more general letter.  This holds for intra- as well as
+  inter-procedural definitions of "being passed" (i.e. also for
+  arguments and return value).
+
+* Read/write barriers can be inserted at any point, returning a variable
+  of a more precise letter.
+
+* Any read must be done on an object in category ``R, L, W``.  Any write
+  must be done on an object in category ``W``.  Moreover an object must
+  only be in category ``W`` if we can prove that a write necessarily
+  occurs on the object.
+
+* The ``L2W`` barrier is very cheap.  It is also the only barrier which
+  doesn't need to return a potentially different pointer.  However,
+  converting objects to the ``L`` category in first place (rather than
+  ``R``) has a cost.  It should be done only for the objects on which we
+  are *likely* to perform a write.
+
+* An object in the ``R`` category falls back automatically to the ``O``
+  category if we perform an operation (like a call to an unrelated
+  function) that might potentially cause it to be written to.
+
+* If we do a call that might cause the current transaction to end and
+  the next one to start, then all live variables fall back to the ``P``
+  category.
+
+* The ``G`` category is only used by prebuilt constants.  In all
+  other cases we don't know that a pointer is definitely not a local
+  pointer.  The ``NULL`` constant is in all categories; ``G`` and ``L``
+  have only ``NULL`` in common.
+
+* In general, it is useful to minimize the number of executed barriers,
+  and have the cheapest barriers possible.  If, for example, we have a
+  control flow graph with two paths that reach (unconditionally) the
+  same write location, but on one path the object is a ``R`` (because we
+  just read something out of it) and on the other path the object is a
+  ``G`` (because it is a global on which we did not perform any read),
+  then we should insert the ``R2W`` barrier at the end of the first path
+  and the ``G2W`` barrier at the end of the second path, rather than the
+  ``P2W`` barrier only once after the control flow merges.
+
+Pseudo-code for some of the remaining barriers::
+
+    def G2R(G):
+        assert G->h_global
+        return P2R(G)        # the fast-path never works
+
+    def G2W(G):
+        assert G->h_global
+        assert not G->h_written
+        if G->h_possibly_outdated:
+            R = LatestGlobalRevision(G)
+        else:
+            R = G
+        W = Localize(R)
+        W->h_written = True
+        R->h_possibly_outdated = True
+        return W
+
+    def L2W(L):
+        if L->h_written:    # fast-path
+            return L
+        L->h_written = True
+        L->h_revision->h_possibly_outdated = True
+        return L
+
+Pointer equality: a comparison ``P1 == P2`` needs special care, because
+there are several physical pointers corresponding logically to the same
+object.  If ``P1`` or ``P2`` is the constant ``NULL`` then no special
+treatment is needed.  Likewise if ``P1`` and ``P2`` are both known to be
+local.  Otherwise, we need in general the following code (which could be
+specialized as well if needed)::
+
+    def PtrEq(P1, P2):
+        return GlobalizeForComparison(P1) == GlobalizeForComparison(P2)
+
+    def GlobalizeForComparison(P):
+        if P == NULL:
+            return NULL
+        elif P->h_global:
+            return LatestGlobalRevision(P)
+        elif P->h_local_copy:
+            return P->h_revision  # return the original global obj
+        else:
+            return P