Asynchrony and Threads

We've been planning on exposing async ops with two APIs: futures and promises. For example:

// future API
upcxx::future<> put(global_ptr<T>, T*);

// promise API
void put(global_ptr<T>, T*, upcxx::promise<> *done);

I think we're going to need a third flavor that does continuation-passing style (the callback is sent to a persona):

// CPS API
template<typename Lambda>
void put(global_ptr<T>, T*, upcxx::persona *listener, Lambda lam);

If you read my text in the progress section of the spec, I explain that a thread persona is the receiver of notifications from upcxx. When a thread advances upcxx::progress, it does so on behalf of a persona, and so the thread will only execute callbacks directed at its active persona(s). Please be sure to read this before continuing here. https://bitbucket.org/berkeleylab/upcxx-spec/src/ff3f941/Progress.tex

Though it might seem like personas are endpoints, this is not my intention. Personas are not (yet) addressable outside of the rank and are therefore not targets of RPCs. The callbacks on personas are locally generated to handle the completion of locally initiated operations.

One justification for personas is their implementation: internally they are completion queues specialized to carry only callbacks. Thread-to-thread queues can be implemented very efficiently, even atomic-free. I have a prototype that I'll be running on KNL soon, fingers crossed. Results were already substantial on Haswell.
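To make the "queue that carries only callbacks" idea concrete, here is a rough sketch of a single-producer, single-consumer ring of callbacks. It is not the prototype mentioned above: the name spsc_callback_queue and its methods are made up for illustration, and it relies on plain atomic loads and stores (no locks, no read-modify-write instructions) rather than being atomic-free.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>

// Hypothetical single-producer/single-consumer ring that carries callbacks.
// One thread may call try_push(), one other thread may call drain().
// The ring holds at most Capacity - 1 callbacks at a time.
template <std::size_t Capacity>
class spsc_callback_queue {
public:
    // Producer side. Returns false if the ring is full.
    bool try_push(std::function<void()> cb) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        std::size_t next = (t + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        slots_[t] = std::move(cb);
        tail_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Runs every callback visible now; returns how many ran.
    int drain() {
        int ran = 0;
        std::size_t h = head_.load(std::memory_order_relaxed);
        while (h != tail_.load(std::memory_order_acquire)) {
            std::function<void()> cb = std::move(slots_[h]);
            slots_[h] = nullptr;
            h = (h + 1) % Capacity;
            head_.store(h, std::memory_order_release);  // free the slot
            cb();
            ++ran;
        }
        return ran;
    }

private:
    std::array<std::function<void()>, Capacity> slots_;
    std::atomic<std::size_t> head_{0};  // next slot to consume
    std::atomic<std::size_t> tail_{0};  // next slot to fill
};

// Single-threaded demo: queue two callbacks, then run both in one drain().
inline int spsc_demo() {
    spsc_callback_queue<4> q;
    int hits = 0;
    if (!q.try_push([&hits] { hits += 1; })) return -1;
    if (!q.try_push([&hits] { hits += 10; })) return -1;
    int ran = q.drain();
    assert(ran == 2);
    (void)ran;
    return hits;
}
```

A persona built this way would presumably need one such ring per producing thread, since this sketch only tolerates a single producer.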

The other justification is usability. I think they fit an important use-case very well. Consider the user having some sort of tasking scheduler that is juggling comm events from upcxx as well as local compute completion from worker threads. I see two major configurations:

Dedicated scheduler thread. It is always subscribed to the scheduler's persona so all upcxx events go there. The thread gets at least a dedicated hyperthread, possibly the whole core.
Lock-protected scheduler. All cores are workers but also contend to become the scheduler between tasks. Subscription to the scheduler persona floats around, protected by the lock.

In either case, we want to enable comm to be initiated from within worker tasks without serialization (for injection), but have its completion event reaped by the scheduler. Futures and promises both fail at making this easy.

First futures, since these are the worst. Futures are created and returned by upcxx calls. Since those calls happen during the task, the scheduler has no knowledge of the future's existence. So for the scheduler to be able to poll or attach continuations to the future, the worker must put it in the scheduler's state somehow. This means the worker has to grab the scheduler lock just to put a future in some container.
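Roughly, the handoff looks like the sketch below. Everything in it is a stand-in: std::future plays the role of upcxx::future<>, fake_put() completes immediately, and scheduler_state is a made-up container. The point is only where the lock ends up.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical scheduler state; std::future stands in for upcxx::future<>.
struct scheduler_state {
    std::mutex lock;                         // serializes access to `pending`
    std::vector<std::future<void>> pending;  // futures the scheduler will poll
};

// Stand-in for a future-returning put(): completes immediately.
inline std::future<void> fake_put() {
    std::promise<void> p;
    p.set_value();
    return p.get_future();
}

// Worker task: injection itself needs no lock, but publishing the future does.
inline void worker_task(scheduler_state &s) {
    std::future<void> f = fake_put();
    std::lock_guard<std::mutex> guard(s.lock);  // the serialization point
    s.pending.push_back(std::move(f));
}

// Runs `nworkers` tasks on separate threads, then reaps as the scheduler.
inline std::size_t future_handoff_demo(int nworkers) {
    scheduler_state s;
    std::vector<std::thread> workers;
    for (int i = 0; i < nworkers; ++i)
        workers.emplace_back(worker_task, std::ref(s));
    for (auto &t : workers) t.join();

    std::lock_guard<std::mutex> guard(s.lock);
    std::size_t reaped = 0;
    for (auto &f : s.pending) {
        f.get();
        ++reaped;
    }
    return reaped;
}
```

Every worker passes through s.lock once per operation, which is exactly the contention we wanted to keep out of the task body.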

Promises might be a little better, since it would be possible for the user to enumerate and instantiate all needed promises for the phase ahead of time. Then the worker thread would look up the right promise to use when initiating comm. The scheduler would be listening to the promises' associated futures from the get-go.
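A sketch of that arrangement, again with stand-ins: std::promise and std::future replace the upcxx types, and phase_table, fake_put() and the op_id indexing are invented for illustration. Note that std::promise is safe to fulfil from another thread, which hides the thread-safety problem the real types would have.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical per-phase table, built before any worker task runs.
struct phase_table {
    std::vector<std::promise<void>> done;    // looked up by workers
    std::vector<std::future<void>> watched;  // held by the scheduler

    explicit phase_table(std::size_t nops) : done(nops) {
        for (auto &p : done) watched.push_back(p.get_future());
    }
};

// Stand-in for the promise-flavored put(): completes immediately.
inline void fake_put(std::promise<void> *done) { done->set_value(); }

// Worker side: pick the pre-made promise for this op; no scheduler lock.
inline void worker_task(phase_table &t, std::size_t op_id) {
    fake_put(&t.done[op_id]);
}

// Scheduler side: count the ops whose futures are already satisfied.
inline std::size_t count_ready(phase_table &t) {
    std::size_t n = 0;
    for (auto &f : t.watched) {
        if (f.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
            ++n;
    }
    return n;
}

// Demo: three ops are planned for the phase, two of them get initiated.
inline std::size_t promise_table_demo() {
    phase_table t(3);
    worker_task(t, 0);
    worker_task(t, 2);
    return count_ready(t);  // ops 0 and 2 are done, op 1 is not
}
```

This only works when the full set of operations for the phase is known in advance, which is the "enumerate ahead of time" assumption above.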

Both of these would benefit from our future/promise APIs also taking, as an additional argument, the persona responsible for signalling those futures and promises. Right now a worker thread's ops would implicitly target the "default persona", which would be something unique to that OS thread. This could have worse attentiveness, since events would be resolved by the threads that initiated them (instead of from one master list), so discovery would only happen at the frequency of task latency instead of the lower latency of some thread entering the scheduler somewhere.

But even if we add a persona* argument to the future/promise API so that signals can be directed to the scheduler, there's still the nasty issue of futures and promises not being thread-safe. This means the user will have to be mighty careful not to accidentally hang on to a future reference concurrently while transferring it from the worker to the scheduler. With discipline this can be done, but yuck! For this reason, I am strongly opposed to adding a persona* argument to future/promise API calls.

If you've been able to follow me, and I understand if not, then you would agree that this is all pretty damning. I do not think futures/promises fit the asymmetric case of comm being initiated and resolved on different threads. Luckily, passing continuations does this just fine. With a void put(..., persona*, lambda), the worker can initiate comm and enlist a callback to be executed by the scheduler when the op completes. That callback can carry whatever application metadata is needed to identify the utility of that comm. For instance, "Box 7's north faces have been delivered". This callback can just assume it's executing in the scheduler context, hence no locking. This is clean and high-performance (assuming the underlying queue is).
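Here is a toy version of that flow, to show the shape of user code. None of it is upcxx: toy_persona is a mutex-guarded stand-in for the real persona queue, toy_put() "completes" instantly, and the box bookkeeping is invented.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for upcxx::persona: a mutex-guarded callback queue.
struct toy_persona {
    std::mutex mu;
    std::deque<std::function<void()>> inbox;

    // Any thread may hand a callback to this persona.
    void enqueue(std::function<void()> cb) {
        std::lock_guard<std::mutex> g(mu);
        inbox.push_back(std::move(cb));
    }

    // Run everything queued so far on the calling thread.
    void progress() {
        std::deque<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> g(mu);
            ready.swap(inbox);
        }
        for (auto &cb : ready) cb();
    }
};

// Stand-in for the CPS put(): the op "completes" at once and the callback
// is handed to the listener instead of running on the initiating thread.
template <typename Lambda>
void toy_put(toy_persona *listener, Lambda lam) {
    listener->enqueue(std::move(lam));
}

// Workers initiate one op per box; the caller then acts as the scheduler.
inline std::vector<std::string> cps_demo(int nboxes) {
    toy_persona scheduler;
    std::vector<std::string> delivered;  // only the scheduler touches this

    std::vector<std::thread> workers;
    for (int box = 0; box < nboxes; ++box) {
        workers.emplace_back([&scheduler, &delivered, box] {
            // The callback carries the metadata (which box) with it.
            toy_put(&scheduler, [&delivered, box] {
                delivered.push_back("box " + std::to_string(box) +
                                    " north faces delivered");
            });
        });
    }
    for (auto &t : workers) t.join();

    scheduler.progress();  // callbacks run here, with no user-level locking
    return delivered;
}
```

The worker never touches scheduler state directly; the only shared structure is the persona's own queue, so the cost of the handoff is whatever that queue costs.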
