# Software Transactional Memory "for real"

## Introduction

• This talk is about programming multi- or many-core machines

• Armin Rigo
• "Language implementation guy"

|pause|

• PyPy project
• Python in Python
• includes a Just-in-Time Compiler "Generator" for Python and any other dynamic language

## Motivation

• A single-core program is getting exponentially slower than a multi-core one

|pause|

• Using several processes exchanging data
• works fine in some cases
• but becomes a large mess in others

|pause|

• this talk!

## Synchronization with locks

|pause|

• How do you know if you missed a place?
• hard to catch by writing tests
• instead you get obscure rare run-time crashes

|pause|

• Issues when scaling to a large program
• order of acquisition

## Synchronization with TM

• TM = Transactional Memory

|pause|

----------------   --------------------
Locks              Transactional Memory
----------------   --------------------

mylock.acquire();    atomic {
x = list1.pop();       x = list1.pop();
list2.append(x);       list2.append(x);
mylock.release();    }


• Locks
• TM

## Locks versus TM

• Locks
• TM in case of conflict

## Synchronization with TM

• "Optimistic" approach:
• no lock to protect shared data in memory
• instead, track all memory accesses
• detect actual conflicts
• if conflict, restart the whole "transaction"

|pause|

• Easier to use
• no need to name locks
• "composability"

## HTM versus STM

• HTM = Hardware Transactional Memory
• Intel Haswell CPU, 2013
• and others

|pause|

• STM = Software Transactional Memory
• various approaches
• large overhead (2x-10x), but getting faster
• experimental in PyPy: read/write barriers, as with GC

## The catch

|pause|

• You Still Have To Use Threads

|pause|

• Threads are hard to debug, non-reproductible

|pause|

• TM does not solve this problem:
• How do you know if you missed a place to put atomic around?
• hard to catch by writing tests
• instead you get obscure rare run-time crashes

|pause|

• What if we put atomic everywhere?

## Analogy with Garbage Collection

• Explicit Memory Management:
• messy, hard to debug rare leaks or corruptions

|pause|

• Automatic GC solves it
• common languages either have a GC or not
• if they have a GC, it controls almost all objects
• not just a small part of them

## Proposed solution

• Put atomic everywhere...
• in other words, Run Everything with TM

## Proposed solution

• Really needs TM. With locks, you'd get this:
• With TM you can get this:

## In a few words

• Longer transactions
• Corresponding to larger parts of the program
• The underlying multi-threaded model becomes implicit

## Typical example

• You want to run f1() and f2() and f3()

|pause|

• Assume they are "mostly independent"
• i.e. we expect that we can run them in parallel
• but we cannot prove it, we just hope that in the common case we can

|pause|

• In case of conflicts, we don't want random behavior
• i.e. we don't want thread-like non-determinism and crashes

## Pooling and atomic statements

• Solution: use a library that creates a pool of threads
• Each thread picks a function from the list and runs it with atomic

## Results

• The behavior is "as if" we had run f1(), f2() and f3() sequentially
• The programmer chooses if he wants this fixed order, or if any order is fine
• Threads are hidden from the programmer

## More generally

• This was an example only
• TM gives various new ways to hide threads under a nice interface

## Not the Ultimate Solution

• Much easier for the programmer to get reproducible results
• But maybe too many conflicts

|pause|

• "The right side" of the problem