Deploy Atomic Domains

We had some good discussion yesterday. While the template tricks discussed are certainly cool, it is my opinion that their added value may not be enough to justify the departure from idiomatic C++ practices. It does not seem likely that a user would make a mistake in specifying the set of atomic ops capable of contending for the same memory. Algorithms involving atomics require very careful thought. The state space of what each concurrent agent could be doing is monstrous. For their own sanity, a programmer would be wise to keep atomic algorithms textually localized. At least that's how I approach racy code. Would anyone disagree that this is the likely case for our users? If we can assume the atomic-op-set will not be wrong, then I think the goofiness of domains-as-types only clutters things up.

This brings us back to specifying the atomic-op-set at runtime. The UPC API makes domain construction collective. @bonachea said that restriction could be lifted. But I think he later contradicted that statement when saying that there may be some state associated with the domain (the mutex for AM-based AMOs that lack an equivalent CPU instruction). If there is such state, then domain construction would have to be collective (or at least named) so that when AM-based atomics land, they have the information necessary to find the state (again, the proper mutex).

My goal here is to plead with gasnet that they not adopt the collective requirement for domain construction since it limits how the upper layer can present the API to the user (for instance, it takes on-the-fly memoization off the table). So long as we agree that a non-collective API has an efficient implementation everywhere there shouldn't be any resistance. Two such implementations could be:

1. Let the user name the domain, then gasnet would use that name in a local hashtable or something to get the mutex (first occurrence allocates the mutex).
2. Just statically allocate a bunch of mutexes and hash the memory address of the AMO to determine the mutex. One mutex would work, more would only reduce contention.

Number 2 looks like the winner to me: no names for the user and no per-domain state management for gasnet, how nice! (@bonachea @PHHargrove, I know that both of you are well aware of these strategies. I'm just trying to produce a self-contained post.)
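To make the statically allocated mutex approach concrete, here is a minimal C++ sketch. Everything in it is hypothetical (`kNumLocks`, `lock_pool`, `lock_index`, `locked_fetch_xor` are illustration names, not gasnet API), and it assumes every op on a given address goes through the same lock pool.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Hypothetical sketch, not gasnet code: a fixed pool of mutexes shared by
// every emulated atomic, with no per-domain state at all.
constexpr std::size_t kNumLocks = 64;  // 1 would be correct; more only cuts contention

inline std::array<std::mutex, kNumLocks>& lock_pool() {
  static std::array<std::mutex, kNumLocks> pool;  // allocated once per process
  return pool;
}

// Hash the target address to a slot. The same address always yields the same
// slot, which is the only property correctness depends on.
inline std::size_t lock_index(const void* addr) {
  auto bits = reinterpret_cast<std::uintptr_t>(addr);
  return static_cast<std::size_t>((bits >> 3) % kNumLocks);  // drop low alignment bits
}

// What an AM handler might run at the target for an op with no native instruction.
inline std::uint64_t locked_fetch_xor(std::uint64_t* target, std::uint64_t operand) {
  std::lock_guard<std::mutex> guard(lock_pool()[lock_index(target)]);
  std::uint64_t old = *target;
  *target = old ^ operand;
  return old;
}
```

Two unrelated addresses may share a slot, which costs some contention but never correctness, so the pool size is purely a tuning knob.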

Now that we're hopefully all on board with non-collective and non-named domain construction, I would like to ask for a little bit more, again in the name of flexibility for the layers above. I would like a guarantee that any two domains over the same team and op-set are concurrently compatible. I want this to work:

team = ...;
opset = FETCH_ADD | FETCH_XOR;

// construct equivalent domains (possible that id's compare unequal)
d1 = gasnet_atomic_domain(team, opset);
d2 = gasnet_atomic_domain(team, opset);

gp = /*some gasnet analog to global pointer*/

// these non-blocking atomics race to access gp along different but compatible domains
handle_t op1 = gasnet_fetch_add_nb(d1, gp);
handle_t op2 = gasnet_fetch_xor_nb(d2, gp);

With non-collective and non-named domain construction semantics, I think you'll have to admit this is always legal, since there would be no way to equate the domains of AMOs originating from different ranks other than comparing the arguments passed to their construction. Please guarantee it!

Now, in my limited imagination, I can only conceive of four possible implementations for the ops of a domain:

1. All CPU-native: for the case when the domain's team is densely connected via shared memory windows (node local) and all ops are cpu-supported.
2. All NIC-native: when the team is not node local, but every op in the op-set is NIC supported.
3. All AM-based over CPU-natives: when the team is not node local but all ops are cpu-supported.
4. All AM-based with mutexes: otherwise.

Is there a conceivable architecture in which you would actually mix these within the same domain? As in fetch_add goes cpu-native, but fetch_xor goes nic-native? That's just silly, right? Great, then the only information actually needed by gasnet after the domain is constructed is a four-valued enum. Producing IDs for domains would be unnecessary, just pass the internal enum back to the user. This way if they do domain1 == domain2, it will just work. You could even recapture ~most~ of the error catching without carrying the full construction argument list around. I'm not looking for a guarantee that domains reduce to an enum, I just want the guarantee that domains are a value-semantics thing. So, not an ID pointing to internally tracked state. This way I don't have to worry about properly destroying domains or creating too many of them redundantly.
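A rough C++ sketch of how the four cases could collapse into one enum at construction time. The names (`amo_strategy`, `choose_strategy`, the capability bitmaps) are made up for illustration, and preferring NIC-native over AM-over-CPU when both apply is my assumption, not something gasnet has stated.

```cpp
#include <cstdint>

// Hypothetical op-set bits mirroring the pseudocode in this post.
constexpr std::uint32_t FETCH_ADD = 1u << 0;
constexpr std::uint32_t FETCH_XOR = 1u << 1;

// The four implementations listed above, one per enumerator.
enum class amo_strategy : std::uint8_t {
  cpu_native,    // node-local team, every op CPU-supported
  nic_native,    // remote team, every op NIC-supported
  am_over_cpu,   // remote team, every op CPU-supported at the target
  am_with_mutex  // everything else
};

// Pure function of its arguments: no registration, no communication.
// cpu_supported / nic_supported are assumed to be cached capability bitmaps.
constexpr amo_strategy choose_strategy(bool team_is_node_local,
                                       std::uint32_t opset,
                                       std::uint32_t cpu_supported,
                                       std::uint32_t nic_supported) {
  const bool all_cpu = (opset & ~cpu_supported) == 0;
  const bool all_nic = (opset & ~nic_supported) == 0;
  if (team_is_node_local && all_cpu) return amo_strategy::cpu_native;
  if (!team_is_node_local && all_nic) return amo_strategy::nic_native;
  if (!team_is_node_local && all_cpu) return amo_strategy::am_over_cpu;
  return amo_strategy::am_with_mutex;
}
```

Because the result depends only on the arguments, two ranks that construct a domain from the same team and op-set would land on the same strategy without talking to each other.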

Thus, I have concluded. My wishlist is such:

1. Non-collective, non-named atomic domain construction.
2. Redundantly constructed equivalent domains are compatible.
3. Domains are values, not objects. The gasnet_atomicdomain_t is at most a byte-copyable struct. No destruction necessary. (This implies point 2.)
4. Domain construction and compatibility/equivalence testing are dirt cheap, as in they do not query the NIC drivers for atomic capabilities. I want to know that the cost of on-the-fly construction will be negligible even next to the fastest thing: CPU-native atomics. If the op-set is encoded in a bitmap, and querying a team for its being entirely node-local is dirt cheap, then I see no reason why this can't be true in the context of the implementation I have outlined.
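For what it's worth, here is roughly what I picture on the C++ side. This is only a sketch: `atomic_domain`, `make_domain`, and the use of a plain integer to stand in for a team are all my inventions, not a proposed gasnet signature.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical value-type domain: just the construction arguments, no handle
// to internally tracked state.
struct atomic_domain {
  std::uint32_t team;   // stand-in for whatever identifies a team
  std::uint32_t opset;  // bitmap of ops allowed to contend for the same memory
};

// Equivalence is a couple of integer compares.
constexpr bool operator==(atomic_domain a, atomic_domain b) {
  return a.team == b.team && a.opset == b.opset;
}
constexpr bool operator!=(atomic_domain a, atomic_domain b) { return !(a == b); }

// Construction is local and stateless, so it can happen on the fly.
constexpr atomic_domain make_domain(std::uint32_t team, std::uint32_t opset) {
  return atomic_domain{team, opset};
}

// Byte-copyable and nothing to destroy.
static_assert(std::is_trivially_copyable<atomic_domain>::value,
              "domain must be byte-copyable");
static_assert(std::is_trivially_destructible<atomic_domain>::value,
              "domain must not need destruction");
```

With something like this, two domains built independently from the same team and op-set compare equal, and dropping one on the floor leaks nothing.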

These are the properties I would like to expose to upcxx users in our domain analog.

I eagerly await hearing about all the things I have overlooked (or gotten miserably wrong) due to cognitive error or ignorance.
