device_allocator constructor spec has several problems

There are several problems with the current spec of the two non-default device_allocator<Device> constructors:

Dueling constructors

The most fundamental problem is that we specify two non-default constructors: one that allocates a segment (the "allocating constructor") and one that accepts a client-allocated segment (the "baseptr constructor"). Both are specified as collective. Under a strict interpretation of "collective", this means it is erroneous for some ranks to call the allocating constructor while others call the baseptr constructor.

This is a constraint on the caller that seems unnecessary, and furthermore is not required by the current implementation. In reality the current implementation has one internal constructor that is used to implement both with an algorithm sufficiently general to handle either behavior, and allows freely mixing allocating vs baseptr inputs across ranks.
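To make that shape concrete, here is a minimal sketch of two public constructors delegating to one internal constructor. It is a model only, not the UPC++ implementation: `segment_model` and its members are hypothetical names, and host memory from `std::malloc` stands in for a device segment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical model; host memory stands in for a device segment.
class segment_model {
public:
    // "Allocating" form: the model obtains the memory itself.
    explicit segment_model(std::size_t size)
        : segment_model(nullptr, size, true) {}

    // "baseptr" form: the caller supplies memory it already owns.
    segment_model(void* client_base, std::size_t size)
        : segment_model(client_base, size, false) {}

    segment_model(const segment_model&) = delete;
    segment_model& operator=(const segment_model&) = delete;

    // Only memory obtained by the model is released by the model.
    ~segment_model() {
        if (owned_) std::free(base_);
    }

    void* base() const { return base_; }
    std::size_t size() const { return size_; }
    bool owned() const { return owned_; }

private:
    // Single internal constructor: the allocate-or-accept choice is a
    // purely local branch, so nothing here depends on what other
    // processes chose.
    segment_model(void* base, std::size_t size, bool allocate)
        : base_(base), size_(size), owned_(allocate) {
        if (allocate) {
            base_ = std::malloc(size);
            if (base_ == nullptr) throw std::bad_alloc();
        }
        // Precondition in baseptr mode: the caller passed a usable pointer.
        assert(base_ != nullptr);
    }

    void* base_;
    std::size_t size_;
    bool owned_;
};
```

In this model the only lasting difference between the two modes is who releases the memory. Construction in baseptr mode never writes through the supplied pointer.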

The difference we're discussing here is whether the UPC++ runtime allocates a segment from the device API or accepts an existing one from the calling application. For GPU devices this comes down to a process-local decision during local segment construction, and afterwards has no effect whatsoever on the operation of the collective device. The corresponding gex_Segment_Create() that locally registers the device segment with GASNet is not even a collective operation (and although it also provides a GASNet-level "allocation" option we don't even use that currently).

I'm confident this freedom will remain for our implementation for other GPU devices. It's harder to speculate about future non-GPU devices; but assuming the devices are still some kind of node-local hardware, it's difficult to imagine why all the processes in a distributed job would need to agree about which software layer performed the device API allocation call. If somehow that ever arises we always have the option to later add a kind-specific constraint on the device_allocator specialization for such a device when we add it.

Exception behavior

As mentioned above, the implementation allows "mixing" calls to the allocating and baseptr constructors. Only the allocating constructor can initiate a upcxx::bad_segment_alloc exception. However, the exception throw is guaranteed to be single-valued, so in a "mixed" call a caller of the baseptr constructor can end up throwing upcxx::bad_segment_alloc because of an allocation failure in the allocating constructor called by a different rank.
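The following toy model illustrates the single-valued throw in a mixed call. It simulates ranks as vector entries in one process and uses no UPC++ calls. All names (`rank_request`, `bad_segment_alloc_model`, `ranks_that_throw`) are hypothetical, and the AND-reduction is an assumption about how a single-valued outcome could be reached, not a description of the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for upcxx::bad_segment_alloc (not the real type).
struct bad_segment_alloc_model : std::bad_alloc {};

// How one simulated rank enters the collective constructor.
struct rank_request {
    bool allocating;      // true: runtime allocates; false: baseptr supplied
    bool alloc_succeeds;  // only meaningful when allocating is true
};

// Local phase: a baseptr rank has nothing to allocate, so it cannot fail here.
bool local_phase_ok(const rank_request& r) {
    return !r.allocating || r.alloc_succeeds;
}

// Collective phase, modeled as a logical-AND reduction over all ranks.
bool all_ranks_ok(const std::vector<rank_request>& ranks) {
    assert(!ranks.empty());
    bool ok = true;
    for (const rank_request& r : ranks) ok = ok && local_phase_ok(r);
    return ok;
}

// Every rank acts on the same reduced value, so the throw is single-valued.
void construct_on_rank(bool job_wide_ok) {
    if (!job_wide_ok) throw bad_segment_alloc_model();
}

// Returns, for each simulated rank, whether its constructor call threw.
std::vector<bool> ranks_that_throw(const std::vector<rank_request>& ranks) {
    const bool ok = all_ranks_ok(ranks);
    std::vector<bool> threw;
    for (std::size_t i = 0; i < ranks.size(); ++i) {
        try {
            construct_on_rank(ok);
            threw.push_back(false);
        } catch (const bad_segment_alloc_model&) {
            threw.push_back(true);
        }
    }
    return threw;
}
```

With three simulated ranks where only the middle one allocates and that allocation fails, all three entries of the result are true, including the two baseptr ranks.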

Assuming we decide to specify that "mixing" both modes of construction is allowed, the exception specification will need to be cleaned up to account for this.

Handling of pre-existing device data

The baseptr constructor allows the client to build a device_allocator around provided device memory. One of the main use cases for this is to allow UPC++ to "bless" device memory allocated by a different library. However an important sub-case of this is where that device memory already contains valid objects and data that the client wishes to communicate using UPC++. It would be very nice to gracefully handle that case, and I don't currently see a reason not to.

What's missing:

- We should specify some guarantee that when using the baseptr constructor, at least construct/destroy/destruct don't overwrite any preexisting data in the client's provided device segment (a property provided by the current implementation). We could optionally extend that guarantee to allocate/deallocate (a property also maintained by the current implementation), but I don't see a use case for that scenario, and it's a freedom I'd rather reserve for the implementation absent a compelling motivation.
- The current spec language for device_allocator::local(gp) and device_allocator::device_id(gp) is very "allocator-centric", requiring gp to be the result of a call to allocate(). However, in the proposed use case one never calls allocate(); instead one calls to_global_ptr() on the device pointers to the preexisting objects in the provided device segment. That text should be massaged to allow this (which the implementation already allows).
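As a sketch of what a less allocator-centric precondition could look like, the model below validates a pointer by checking that it lies inside the segment, not by checking that it came from allocate(). This is a hypothetical illustration only: `allocator_model` and `global_ptr_model` are invented names, host memory stands in for device memory, and the member functions only mirror the names of the real ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a global pointer: owner rank, device, segment offset.
struct global_ptr_model {
    int rank;
    int device_id;
    std::size_t offset;
};

// Hypothetical model of an allocator wrapped around a client-provided segment.
class allocator_model {
public:
    allocator_model(int rank, int device_id, unsigned char* base, std::size_t size)
        : rank_(rank), device_id_(device_id), base_(base), size_(size) {}

    // True if p points into the segment. Addresses are compared as integers
    // so that pointers into unrelated objects can be tested safely.
    bool contains(const unsigned char* p) const {
        const std::uintptr_t a = reinterpret_cast<std::uintptr_t>(p);
        const std::uintptr_t b = reinterpret_cast<std::uintptr_t>(base_);
        return a >= b && a - b < size_;
    }

    // Precondition is "inside the segment", not "returned by allocate()".
    global_ptr_model to_global_ptr(unsigned char* p) const {
        assert(contains(p));
        return global_ptr_model{rank_, device_id_, static_cast<std::size_t>(p - base_)};
    }

    // True if gp refers to this rank, this device, and an in-range offset.
    bool owns(const global_ptr_model& gp) const {
        return gp.rank == rank_ && gp.device_id == device_id_ && gp.offset < size_;
    }

    unsigned char* local(const global_ptr_model& gp) const {
        assert(owns(gp));
        return base_ + gp.offset;
    }

    int device_id(const global_ptr_model& gp) const {
        assert(owns(gp));
        return gp.device_id;
    }

private:
    int rank_;
    int device_id_;
    unsigned char* base_;
    std::size_t size_;
};
```

Under this model, a pointer to a preexisting object in the provided segment round-trips through to_global_ptr() and local() without allocate() ever being called.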
