Chapter 19: Concurrency - Chapter Draft
Supporting concurrency is increasingly important. In the past, mainstream concurrent programming generally meant ensuring that the code interacting with relatively slow network, disk, database, and other I/O resources did not unduly slow things down. Exploiting parallelism was typically only seen in such domains as scientific computing with the apps running on supercomputers.
But there are new factors at work now. The semiconductor industry continues to work feverishly to uphold Moore's Law of exponential increase in chip density. Chip designers used to apply this bounty to speeding up an individual CPU. But for a variety of reasons, this old approach no longer works as well. So now chip designers are cramming chips with more CPUs and hardware threads. Speeding up execution means harnessing the parallelism of the hardware. And it is now our job as software developers to do that work.
The Java platform can help out here. The Java platform is arguably the most robust environment for running concurrent code today, and this functionality can be readily be used from Jython. The problem remains that writing concurrent code is still not easy. This difficulty is especially true with respect to a concurrency model based on threads, which is what today's hardware natively exposes.
This means we have to be concerned with thread safety, which arises as an issue because of the existence of mutable objects that are shared between threads. (Mutable state might be avoidable in functional programming, but it would be hard to avoid in any but the most trivial Python code.) And if you attempt to solve concurrency issues through synchronization, you run into other problems. Besides the potential performance hit, there are opportunities for deadlock and livelock.
Implementations of the JVM, like HotSpot, can often avoid the overhead of synchronization. We will discuss what's necessary for this scenario to happen later in this chapter.
Given all of these issues, it has been argued that threading is just too difficult to get right. We're not convinced by that argument. Useful concurrent systems have been written on the JVM, and that includes apps written in Jython. Key success factors in writing such code include:
- Keep the concurrency simple.
- Use tasks, which can be mapped to a thread pool.
- Use immutable objects where possible.
- Avoid unnecessary sharing of mutable objects.
- Minimize sharing of mutable objects. Queues and related objects -- like synchronization barriers -- provide a structured mechanism to hand over objects between threads. This can enable a design where an object is visible to only one thread when its state changes.
- Code defensively. Make it possible to cancel or interrupt tasks. Use timeouts.
Java or Python APIs?
One issue that you will have to consider in writing concurrent code is how much to make your implementation dependent on the Java platform. Here are our recommendations:
- If you are porting an existing Python code base that uses concurrency, you can just use the standard Python threading module. Such code can still interoperate with Java, because Jython threads are always mapped to Java threads. (If you are coming from Java, you will recognize this API, since it is substantially based on Java's.)
- Jython implements dict and set by using Java's ConcurrentHashMap. This means you can just use these standard Python types, and still get high performance concurrency. (They are also atomic like in CPython, as we will describe.)
- You can also any use of the collections from java.util.concurrent. So if it fits your app's needs, you may want to consider using such collections as CopyOnWriteArrayList and ConcurrentSkipListMap (new in Java 6). The Google Collections Library is another good choice that works well with Jython.
- Use higher-level primitives from Java instead of creating your own. This is particular true of the executor services for running and managing tasks against thread pools. So for example, avoid using threading.Timer, because you can use timed execution services in its place. But still use threading.Condition and threading.Lock. In particular, these constructs have been optimized to work in the context of a with-statement, as we will discuss.
In practice, using Java's support for higher level primitives should not impact the portability of your code so much. Using tasks in particular tends to keep all of this well-isolated. And such thread safety considerations as thread confinement and safe publication remain the same.
Lastly, remember you can always mix and match.
Working with Threads
Creating threads is easy, perhaps too easy. This example downloads a web page concurrently:
.. literalinclude:: src/chapter19/test_thread_creation.py
Be careful not to inadvertently invoke the function; target takes a reference to the function object (typically a name if a normal function). Calling the function instead creates an amusing bug where your target function runs now, so everything looks fine at first. But no concurrency is happening, because the function call is actually being run by the invoking thread, not this new thread.
The target function can be a regular function, or an object that is callable (implements __call__). This later case can make it harder to see that the target is a function object!
To wait for a thread to complete, call join on it. This enables working with the concurrent result. The only problem is getting the result. As we will see, publishing results into variables is safe in Jython, but it's not the nicest way either.
The threading.local class enables a letting each thread have its own instances of some objects in an otherwise shared environment. Its usage is deceptively simple. Simply create an instance of threading.local, or a subclass, and assign it to a variable or other name. This variable could be global, or part of some other namespace. So far, this is just like working with any other object in Python.
Threads then can share the variable, but with a twist: each thread will see a different, thread-specific version of the object. This object can have arbitrary attributes added to it, each of which will not be visible to other threads.
Other options include subclassing threading.local. As usual, this allows you to define defaults and specify a more nuanced properties model. But one unique, and potentially useful, aspect is that any attributes specified in __slots__ will be shared across threads.
However, there's a big problem when working with thread locals. Usually they don't make sense because threads are not the right scope. An object or a function is, especially through a closure. If you are using thread locals, you are implicitly adopting a model where threads are partitioning the work. But then you are binding the given piece of work to a thread. This makes using a thread pool problematic, because you have to clean up after the thread.
Having said they, thread locals might be useful in certain cases. One common scenario is one where your code is being called by a component that you didn't write. You may need to access a thread-local singleton. And of course, if you are using code whose architecture mandates thread locals, it's just something you will have to work with.
But often this is unnecessary. Your code may be different, but Python gives you good tools to avoid action at a distance. You can use closures, decorators, even sometimes selectively monkey patching modules. Take advantage of the fact that Python is a dynamic language, with strong support for metaprogramming. And remember that the Jython implementation makes these techniques accessible when working with even recalcitrant Java code.
In the end, thread locals are an interesting aside. They do not work well in a task-oriented model, because you don't want to associate context with a worker thread that will be assigned to arbitrary tasks. Without a lot of care, this can make for a confused mess.
No Global Interpreter Lock
Jython lacks the global interpreter lock (GIL), which is an implementation detail of CPython. For CPython, the GIL means that only one thread at a time can run Python code. This restriction also applies to much of the supporting runtime as well as extension modules that do not release the GIL. (Unfortunately development efforts to remove the GIL in CPython have so far only had the effect of slowing down Python execution significantly.)
The impact of the GIL on CPython programming is that threads are not as useful as they are in Jython. Concurrency will only be seen in interacting with I/O as well as scenarios where computation is performed by an extension module on data structures managed outside of CPython's runtime. Instead, developers typically will use a process-oriented model to evade the restrictiveness of the GIL.
Again, Jython does not have the straightjacket of the GIL. This is because all Python threads are mapped to Java threads and use standard Java garbage collection support (the main reason for the GIL in CPython is because of the reference counting GC system). The important ramification here is that you can use threads for compute-intensive tasks that are written in Python.
Module Import Lock
Python does, however, define a module import lock, which is implemented by Jython. This lock is acquired whenever an import of any name is made. This is true whether the import goes through the import statement, the equivalent __import__ builtin, or related code. It's important to note that even if the corresponding module has already been imported, the module import lock will still be acquired, if only briefly.
So don't write code like this in a hot loop, especially in threaded code:
def slow_things_way_down(): from foo import bar, baz ...
It may still make sense to defer your imports. Such deferral can decrease the start time of your app. Just keep in mind that thread(s) performing such imports will be forced to run single threaded because of this lock. So it might make sense for your code to perform deferred imports in a background thread:
.. literalinclude:: src/chapter19/background_import.py
So as you can see, you need to do at least two imports of the a given module; one in the background thread; the other in the actual place(s) where the module's namespace is being used.
Here's why we need the module import lock. Upon the first import, the import procedure runs the (implicit) top-level function of the module. Even though many modules are often declarative in nature, in Python all definitions are done at runtime. Such definitions potentially include further imports (recursive imports). And the top-level function can certainly perform much more complex tasks. The module import lock simplifies this setup so that it's safely published. We will discuss this concept further later in this chapter.
Note that in the current implementation, the module import lock is global for the entire Jython runtime. This may change in the future.
Working with Tasks
It's usually best to avoid managing the lifecycle of threads directly. Instead, the task model often provides a better abstraction.
Tasks describe the asynchronous computation to be performed. Although there are other options, the object you submit to be executed should implement Java's Callable interface (a call method without arguments), as this best maps into working with a Python method or function. Tasks move through the states of being created, submitted (to an executor), started, and completed. Tasks can also be cancelled or interrupted.
Executors run tasks using a set of threads. This might be one thread, a thread pool, or as many threads as necessary to run all currently submitted tasks concurrently. The specific choice comprises the executor policy. But generally you want to use a thread pool so as to control the degree of concurrency.
Futures allow code to access the result of a computation -- or an exception, if thrown -- in a task only at the point when it's needed. Up until that point, the using code can run concurrently with that task. If it's not ready, a wait-on dependency is introduced.
We are going to look at how we can use this functionality by using the example of downloading web pages. We will wrap this up so it's easy to work with, tracking the state of the download, as well as any timing information:
.. literalinclude:: src/chapter19/downloader.py
In Jython any other task could be done in this fashion, whether it is a database query or a computationally intensive task written in Python. It just needs to support the Callable interface.
Next, we need to create the futures. Upon completion of a future, either the result is returned, or an exception is thrown into the caller. This exception will be one of:
- ExecutionException. Your code can retrieve the underlying exception with the cause attribute.
(This pushing of the exception into the asynchronous caller is thus similar to how a coroutine works when send is called on it.)
Now we have what we need to multiplex the downloads of several web pages over a thread pool:
.. literalinclude:: src/chapter19/test_futures.py
Up until the get method on the returned future, the caller run concurrently with this task. The get call then introduces a wait-on dependency on the task's completion. (So this is like calling join on the supporting thread.)
Shutting down a thread pool should be as simple as calling the shutdown method on the pool. However, you may need to take in account this shutdown can happen during extraordinary times in your code. Here's the Jython version of a robust shutdown function, shutdown_and_await_termination, as provided in the standard Java docs:
.. literalinclude:: src/chapter19/shutdown.py
The CompletionService interface provides a nice abstraction to working with futures. The scenario is that instead of waiting for all the futures to complete, as our code did with invokeAll, or otherwise polling them, the completion service will push futures as they are completed onto a synchronized queue. This queue can then be consumed, by consumers running in one or more threads:
.. literalinclude:: src/chapter19/test_completion.py
This setup enables a natural flow. Although it may be tempting to then schedule everything through the completion service's queue, there are limits. For example, if you're writing a scalable web spider, you would want to externalize this work queue. But for simple manangement, it would certainly suffice.
Thread safety addresses such questions as:
- Can the (unintended) interaction of two or more threads corrupt a mutable object? This is especially dangerous for a collection like a list or a dictionary, because such corruption could potentially render the underlying data structure unusable or even produce infinite loops when traversing it.
- Can an update get lost? Perhaps the canonical example is incrementing a counter. In this case, there can be a data race with another thread in the time between retrieving the current value, and then updating with the incremented value.
Jython ensures that its underlying mutable collection types -- dict, list, and set -- cannot be corrupted. But updates still might get lost in a data race.
However, other Java collection objects that your code might use would typically not have such no-corruption guarantees. If you need to use LinkedHashMap, so as to support an ordered dictionary, you will need to consider thread safety if it will be both shared and mutated.
Here's a simple test harness we will use in our examples. ThreadSafetyTestCase subclasses unittest.TestCase, adding a new method assertContended:
.. literalinclude:: src/chapter19/threadsafety.py
This new method runs a target function and asserts that all threads properly terminate. Then the testing code needs to check for any other invariants.
For example, we use this idea in Jython to test that certain operations on the list type are atomic. The idea is to to apply a sequence of operations that perform an operation, then reverse it. One step forward, one step back. The net result should be right where you started, an empty list, which is what the test code asserts:
.. literalinclude:: src/chapter19/test_list.py
Of course these concerns do not apply at all to immutable objects. Commonly used objects like strings, numbers, datetimes, tuples, and frozen sets are immutable. And you can create your own immutable objects too.
There are a number of other strategies in solving thread safety issues. We will look at them as follows:
- Thread Confinement
- Safe Publication
We use synchronization to control the entry of threads into code blocks corresponding to synchronized resources. Through this control we can prevent data races, assuming a correct synchronization protocol. (This can be a big assumption!)
A threading.Lock ensures entry by only one thread. (In Jython, but unlike CPython, such locks are always reentrant; there's no distinction between threading.Lock and threading.RLock.) Other threads have to wait until that thread exits the lock. Such explicit locks are the simplest and perhaps most portable synchronization to perform.
You should generally manage the entry and exit of such locks through a with-statement; failing that, you must use a try-finally to ensure that the lock is always released when exiting a block of code.
Here's some example code using the with-statement. The code allocates a lock, then shares it amongst some tasks:
.. literalinclude:: src/chapter19/test_lock.py :pyobject: LockTestCase.test_with_lock
Alternatively, you can do this with try-finally:
.. literalinclude:: src/chapter19/test_lock.py :pyobject: LockTestCase.test_try_finally_lock
But don't do this. It's actually slower than the with-statement. And using the with-statement version also results in more idiomatic Python code.
Another possibility is to use the synchronize module, which is specific to Jython. This module provides a``make_synchronized`` decorator function, which wraps any callable in Jython in a synchronized block:
.. literalinclude:: src/chapter19/test_synchronized.py
In this case, you don't need to explicitly release anything. Even in the the case of an exception, the synchronization lock is always released upon exit from the function. Again, this version is also slower than the with-statement form, and it doesn't use explicit locks.
Jython's current runtime (as of 2.5.1) can execute the with-statement form more efficiently through both runtime support and how this statement is compiled. The reason is that most JVMs can perform analysis on a chunk of code (the compilation unit, including any inlining) to avoid synchronization overhead, so long as two conditions are met. First, the chunk contains both the lock and unlock. And second, the chunk is not too long for the JVM to perform its analysis. The with-statement's semantics make it relatively easy for us to do that when working with built-in types like threading.Lock, while avoiding the overhead of Java runtime reflection.
In the future, support of the new invokedynamic bytecode should collapse these performance differences.
The threading module offers portablity, but it's also minimalist. You may want to use the synchronizers in Java.util.concurrent, instead of their wrapped versions in threading. In particular, this approach is necessary if you want to wait on a lock with a timeout. And you may want to use factories like Collections.synchronizedMap, when applicable, to ensure the underlying Java object has the desired synchronization.
But use synchronizaton carefully. This code will always eventually deadlock:
.. literalinclude:: src/chapter19/deadlock.py
Deadlock results from a cycle of any length of wait-on dependencies. For example, Alice is waiting on Bob, but Bob is waiting on Alice. Without a timeout or other change in strategy -- Alice just gets tired of waiting on Bob! -- this deadlock will not be broken.
Avoiding deadlocks can be done by never acquiring locks such that a cycle like that can be created. If we rewrote the example so that locks are acquired in the same order (Bob always allows Alice to go first), there would be no deadlocks. However, this ordering is not always so easy to do. Often, a more robust strategy is to allow for timeouts.
Other Synchronization Objects
The Queue module implements a first-in, first-out synchronized queue. (Synchronized queues are also called blocking queues, and that's how they are described in java.util.concurrent.) Such queues represent a thread-safe way to send objects from one or more producing threads to one or more consuming threads.
Often, you will define a poision object to shut down the queue. This will allow any consuming, but waiting threads to immediately shut down. Or just use Java's support for executors to get an off-the-shelf solution.
If you need to implement another policy, such as last-in, first-out or based on a priority, you can use the comparable synchronized queues in java.util.concurrent as appropriate. (Note these have since been implemented in Python 2.6, so they will be made available when Jython 2.6 is eventually released.)
Condition objects allow for one thread to notify another thread that's waiting on a condition to wake up; notifyAll is used to wake up all such threads. Along with Queue, this is probably the most versatile of the synchronizing objects for real usage.
Condition objects are always associated with a Lock. Your code needs to bracket waiting and notifying the condition by acquiring the corresponding lock, then finally (as always!) releasing it. As usual, this is easiest done in the context of the with-statement:
.. literalinclude:: src/chapter19/condition.py
For example, here's how we actually implement a Queue in the standard library of Jython (just modified here to use the with-statement). We can't use a standard Java blocking queue, because the requirement of being able to join on the queue when there's no more work to be performed requires a third condition variable:
.. literalinclude:: src/chapter19/Queue.py
There are other mechanisms to synchronize, including exchangers, barriers, latches, etc. You can use semaphores to describe scenarios where it's possible for multiple threads to enter. Or use locks that are set up to distinguish reads from writes. There are many possibilities for the Java platform. In our experience, Jython should be able to work with any of them.
An atomic operation is inherently thread safe. Data races and object corruption do not occur. And it's not possible for other threads to see an inconsistent view.
Atomic operations are therefore simpler to use than synchronization. In addition, atomic operations will often use underlying support in the CPU, such as a compare-and-swap instruction. Or they may use locking too. The important thing to know is that the lock is not directly visible. Also, if synchronization is used, it's not possible to expand the scope of the synchronization. In particular, callbacks and iteration are not feasible.
Python guarantees the atomicity of certain operations, although at best it's only informally documented. Fredrik Lundh's article on "Thread Synchronization Methods in Python" summarizes the mailing list dicussions and the state of the CPython implementation. Quoting his article, the following are atomic operations for Python code:
- Reading or replacing a single instance attribute
- Reading or replacing a single global variable
- Fetching an item from a list
- Modifying a list in place (e.g. adding an item using append)
- Fetching an item from a dictionary
- Modifying a dictionary in place (e.g. adding an item, or calling the clear method)
Although unstated, this also applies to equivalent ops on the builtin set type.
For CPython, this atomicity emerges from combining its Global Interpreter Lock (GIL), the Python bytecode virtual machine execution loop, and the fact that types like dict and list are implemented natively in C and do not release the GIL.
Despite the fact that this is in some sense accidentally emergent, it is a useful simplification for the developer. And it's what existing Python code expects. So this is what we have implemented in Jython.
In particular, because dict is a ConcurrentHashMap, we also expose the following methods to atomically update dictionaries:
* ``setifabsent`` * ``update``
It's important to note that iterations are not atomic, even on a ConcurrentHashMap.
Atomic operations are useful, but they are pretty limited too. Often, you still need to use synchronization to prevent data races. And this has to be done with care to avoid deadlocks and starvation.
Thread confinement is often the best solution to resolve most of the problems seen in working with mutable objects. In practice, you probably don't need to share a large percentage of the mutable objects used in your code. Very simply put, if you don't share, then thread safety issues go away.
Not all problems can be reduced to using thread confinement. There are likely some shared objects in your system, but in practice most can be eliminated. And often the shared state is someone else's problem:
- Intermediate objects don't require sharing. For example, if you are building up a buffer that is only pointed to by a local variable, you don't need to synchronize. It's an easy prescription to follow, so long as you are not trying to keep around these intermediate objects to avoid allocation overhead. Don't do that.
- Producer-consumer. Construct an object in one thread, then hand it off to another thread. You just need to use an appropriate synchronizer object, such as a Queue.
- Application containers. The typical database-driven web applications makes for the classic case. For example, if you are using modjy, then the database connection pools and thread pools are the responsibility of the servlet container. And they are not directly observable. (But don't do things like share database connections across threads.) Caches and databases then are where you will see shared state.
- Actors. The actor model is another good example. Send and receive messages to an actor (effectively an independent thread) and let it manipulate any objects it owns on your behalf. Effectively this reduces the problem to sharing one mutable object, the message queue. The message queue can then ensure any accesses are appropriately serialized, so there are no thread safety issues.
Unfortunately thread confinement is not without issues in Jython. For example, if you use StringIO, you have to pay the cost that this class uses list, which is synchronized. Although it's possible to further optimize the Jython implementation of the Python standard library, if a section of code is hot enough, you may want to consider rewriting that in Java to ensure no additional synchronization overhead.
Lastly, thread confinement is not perfect in Python, because of the possibility of introspecting on frame objects. This means your code can see local variables in other threads, and the objects they point to. But this is really more of an issue for how optimizable Jython is when run on the JVM. It won't cause thread safety issues if you don't exploit this loophole. We will discuss this more in the section on the Python Memory Model.
Python Memory Model
Reasoning about concurrency in Python is easier than in Java. This is because the memory model is not as surprising to our conventional reasoning about how programs operate. However, this also means that Python code sacrifices significant performance to keep it simpler.
Here's why. In order to maximize Java performance, it's allowed for a CPU to arbitrarily re-order the operations performed by Java code, subject to the constraints imposed by happens-before and synchronizes-with relationships. (The published Java memory model goes into more details on these constraints.)
Although such reordering is not visible within a given thread, the problem is that it's visible to other threads. Of course, this visibility only applies to changes made to non-local objects; thread confinement still applies.
In particular, this means you cannot rely on the apparent sequential ordering of Java code when looking at two or more threads.
Python is different. The fundamental thing to know about Python, and what we have implemented in Jython, is that setting any attribute in Python is a volatile write; and getting any attribute is a volatile read. This is because Python attributes are stored in dictionaries, and in Jython, this follows the semantics of the backing ConcurrentHashMap. So get and set are volatile.
So this means that Python code has sequential consistency. Execution follows the ordering of statements in the code. There are no surprises here.
And this means that safe publication is pretty much trivial in Python, when compared to Java. Safe publication means the thread safe association of an object with a name. Because this is always a memory-fenced operation in Python, your code simply needs to ensure that the object itself is built in a thread-safe fashion; then publish it all at once by setting the appropriate variable to this object.
If you need to create module-level objects -- singletons -- then you should do this in the top-level script of the module so that the module import lock is in effect.
Long-threading threads should provide some opportunity for cancellation. The typicl pattern is something like this:
class DoSomething(Runnable): def __init__(self): cancelled = False def run(self): while not self.cancelled: do_stuff()
Remember, Python variables are always volatile, unlike Java. There are no problems with using a cancelled flag like this.
Thread interruption allows for even more responsive cancellation. In particular, if a a thread is waiting on most any synchronizers, such as a condition variable or on file I/O, this action will cause the waited-on method to exit with an InterruptedException. (Unfortunately lock acquistion, except under certain cases such as using lockInterruptibly on the underlying Java lock, is not interruptible.)
Although Python's threading module does not itself support interruption, it is available through the standard Java thread API. First, let's import this class. We will rename it to JThread so it doesn't conflict with Python's version:
from java.lang import Thread as JThread
As we have seen, you can use Java threads as if they are Python threads. So logically you should be able to do the converse: use Python threads as if they are Jave threads. Therefore it would be nice to make calls like JThread.interrupt(obj).
Incidentally, this formulation, instead of obj.interrupt(), looks like a static method on a class, as long as we pass in the object as the first argument. This adaptation is a good use of Python's explicit self.
But there's a problem here. As of the latest released version (Jython 2.5.1), we forgot to include an appropriate __tojava__ method on the Thread class! So this looks like you can't do this trick after all.
Or can you? What if you didn't have to wait until we fix this bug? You could explore the source code -- or look at the class with dir. One possibility would be to use the nominally private _thread attribute on the Thread object. After all _thread is the attribute for the underlying Java thread. Yes, this is an implementation detail, but it's probably fine to use. It's not so likely to change.
But we can do even better. We can monkey patch the Thread class such that it has an appropriate __tojava__ method, but only if it doesn't exist. So this patching is likely to work with a future version of Jython because we are going to fix this missing method before we even consider changing its implementation and removing _thread.
So here's how we can monkey patch, following a recipe of Guido van Rossum:
.. literalinclude:: src/chapter19/monkeypatch.py
This monkeypatch_method decorator allows us to add a method to a class after the fact. (This is what Ruby developers call opening a class.) Use this power with care. But again, you shouldn't worry too much when you keep such fixes to a minimum, especially when it's essentially a bug fix like this one. In our case, we will use a variant, the monkeypatch_method_if_not_set decorator, to ensure we only patch if it has not been fixed by a later version.
Putting it all together, we have this code:
.. literalinclude:: src/chapter19/interrupt.py
(It does rely on the use of threading.Condition to have something to wait on. We will talk about condition variables later.)
Lastly, you could simply access interruption through the cancel method provided by a Future. No need to monkey patch!
Jython can fully take advantage of the underlying Java platform's support for concurrency. You can also use the standard Python threading constructs, which in most cases just wrap the corresponding Java functionality. The standard mutable Python collection types have been implemented in Jython with concurrency in mind. And Python's sequential consistency removes some potential bugs.
But concurrent programming is still not easy to get right, either in Python or in Java. You should consider higher-level concurrency primitives, such as tasks. And you should be disciplined in how your code shares mutable state.