Memory kind support is still under discussion. In the following text, I will try to summarize the possible scenarios that were discussed by Paul, Dan, Bryce, and Khaled. I hope we can reach some agreement on these issues so that the APIs can be crafted accordingly.
A- For shared data:
Who allocates, and how is the data managed?
• Option 1: UPCXX manages device memory by allocating a memory arena from which smaller allocations can be serviced. This is consistent with the current main memory support, where shared state allocation is managed solely by the UPCXX runtime. The allocation function will need to be overloaded with versions that take additional arguments for the memory kind and possibly the domain, if the memory is split between multiple domains.
• Option 2: The user allocates device memory, then uses an up-cast function to create a global pointer to the device memory. This avoids adding complexity to the current allocation function. It may create a challenge if the application stresses the UPCXX runtime by making many small device allocations and trying to upcast all of them. Each of these allocations could potentially be treated as an individual segment, right?
• Option 3: Host data allocation is handled differently from other memory kinds (devices), e.g., UPC++ manages only one kind of shared memory (host) and the application manages the rest. This option is equivalent to no support for other memory kinds, which is the original UPC++ support level.
In all cases, the shared segment metadata could carry the kind information. Querying the kind from the system instead may be expensive, for instance with current-generation NVIDIA system calls (up to multiple microseconds on Summitdev POWER8/NVIDIA GPUs under high concurrency).
The use of the upcast method is restricted to UPCXX predefined/supported memory kinds.
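To make the contrast between Option 1 and Option 2 concrete, here is a minimal sketch. It is not UPC++ code: every name in it (`runtime_sketch`, `global_ptr_sketch`, `allocate`, `upcast`, `register_segment`) is hypothetical, addresses are plain integers rather than real device memory, and the arena is a simple bump allocator with no alignment handling or free. It only illustrates how the kind could be recorded once in the segment metadata, and how Option 2 grows the segment table by one entry per user allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory kinds; not taken from any existing UPC++ API.
enum class memory_kind { host, device };

// Hypothetical global pointer: owning rank, memory kind, raw address.
struct global_ptr_sketch {
    int rank;
    memory_kind kind;
    std::uintptr_t addr;
};

// One registered region. The kind is stored at registration time, so
// later lookups read metadata instead of querying the driver.
struct segment {
    std::uintptr_t base;
    std::size_t size;
    memory_kind kind;
    std::size_t used;  // bump-allocator offset (used by Option 1 only)
};

class runtime_sketch {
public:
    explicit runtime_sketch(int rank) : rank_(rank) {}

    // Option 1 would call this once per arena; Option 2 would call it
    // once per user allocation that is going to be upcast.
    std::size_t register_segment(std::uintptr_t base, std::size_t size,
                                 memory_kind kind) {
        assert(size > 0);
        segments_.push_back({base, size, kind, 0});
        return segments_.size() - 1;
    }

    // Option 1: the allocation call takes a kind argument and carves the
    // request out of the first arena of that kind with enough room left.
    bool allocate(std::size_t bytes, memory_kind kind, global_ptr_sketch& out) {
        for (segment& s : segments_) {
            if (s.kind == kind && s.size - s.used >= bytes) {
                out = {rank_, kind, s.base + s.used};
                s.used += bytes;
                return true;
            }
        }
        return false;  // no arena of this kind can satisfy the request
    }

    // Option 2: upcast a user-allocated address by searching the segment
    // table. Fails for addresses the runtime has never been told about.
    bool upcast(std::uintptr_t addr, global_ptr_sketch& out) const {
        for (const segment& s : segments_) {
            if (addr >= s.base && addr - s.base < s.size) {
                out = {rank_, s.kind, addr};
                return true;
            }
        }
        return false;
    }

    std::size_t segment_count() const { return segments_.size(); }

private:
    int rank_;
    std::vector<segment> segments_;
};
```

With Option 1 the table holds one entry per arena no matter how many small allocations are served from it. With Option 2 the linear search in `upcast` scales with the number of user allocations, which is the stress scenario raised above; a real implementation would presumably want a sorted or tree-based table.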
B- For private data:
This is a bit more complicated because typically the application is free to use any allocator. The question is how private data are presented to communication APIs.
• Option 1: A private pointer must first be up-cast to a global pointer before being passed to the data transfer APIs. In this case both the source and the destination of put/get are global pointers. The current APIs do not follow this signature, probably to avoid confusing users about third-party initiation of data transfers.
• Option 2: An additional kind argument is added to the put/get communication APIs to disambiguate private pointers. This approach is likely to impact the signatures of all APIs accepting private pointers.
• Option 3: Private pointers are required to satisfy UVA (unified virtual addressing), and a system facility is used for disambiguation. The UPCXX API could internally query the private pointer type, with the possibility of caching per-page data to avoid the query cost for non-OS-managed (migratory) pages. In that case, a device that does not support UVA will not be supported. (This solution is not favored by multiple people in the group, although it is implicitly assumed in the MPI case.)
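As a rough illustration of how Options 2 and 3 differ at the call site, here is a minimal sketch. None of these names (`kind_cache`, `plan_put`, `put_plan`) exist in UPC++; they are hypothetical. The `query` callback stands in for whatever expensive system or driver query would be used in practice, the 4096-byte page size is an assumption, and the cache is only valid for as long as a page's kind does not change, so pages that can migrate would need `invalidate` or would have to bypass the cache. No data is actually transferred.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical memory kinds; not taken from any existing UPC++ API.
enum class memory_kind { host, device };

// Option 3 helper: remembers the kind of each page after the first
// (expensive) query so repeated transfers from that page skip the query.
class kind_cache {
public:
    using query_fn = std::function<memory_kind(std::uintptr_t)>;

    explicit kind_cache(query_fn query, std::uintptr_t page_size = 4096)
        : query_(std::move(query)), page_size_(page_size) {}

    memory_kind lookup(std::uintptr_t addr) {
        const std::uintptr_t page = addr / page_size_;
        auto it = cache_.find(page);
        if (it != cache_.end()) return it->second;  // hit: no query issued
        const memory_kind k = query_(addr);         // miss: pay the query cost
        ++queries_;
        cache_.emplace(page, k);
        return k;
    }

    // Drop the cached entry for the page containing addr, e.g. when the
    // memory is freed or the page may have changed kind.
    void invalidate(std::uintptr_t addr) { cache_.erase(addr / page_size_); }

    // Number of times the expensive query was actually issued.
    std::size_t queries() const { return queries_; }

private:
    query_fn query_;
    std::uintptr_t page_size_;
    std::unordered_map<std::uintptr_t, memory_kind> cache_;
    std::size_t queries_ = 0;
};

// Records which source kind a put would act on; nothing is transferred.
struct put_plan {
    memory_kind src_kind;
};

// Option 2: the caller states the kind of the private pointer explicitly.
put_plan plan_put(const void* src, memory_kind src_kind) {
    (void)src;  // the address is not needed to determine the kind here
    return {src_kind};
}

// Option 3: the runtime deduces the kind from the address (requires UVA).
put_plan plan_put(const void* src, kind_cache& cache) {
    return {cache.lookup(reinterpret_cast<std::uintptr_t>(src))};
}
```

Option 2 puts the burden on every call site but needs no query and no UVA. Option 3 keeps the existing signatures, at the cost of a query on each cache miss and the bookkeeping needed to keep the cache correct.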
The choice between these alternatives has multiple consequences for performance, consistency with the rest of the design, and complexity of the API. Please feel free to add options and to clarify the weaknesses or strengths of the various options.