DeviceAllocator concept

Issue #520 new
Colin MacLean created an issue

To help users write generic code for multiple accelerator device types, I propose defining a C++20 concept which can be used to make writing this code more productive:

#include <upcxx/upcxx.hpp>
#include <type_traits>
#include <utility>
#if __cpp_concepts >= 201907L
#if __has_include(<concepts>)
#include <concepts>
#endif
#endif

namespace upcxx {
#if __cpp_concepts >= 201907L && __cpp_lib_concepts >= 202002L

template<typename T>
concept Device = requires(T d)
{
    typename T::id_type;
    { T::kind } -> std::common_with<memory_kind>;
    d.destroy(std::declval<entry_barrier>());
    { std::as_const(d).device_id() } -> std::same_as<typename T::id_type>;
    { std::as_const(d).is_active() } -> std::convertible_to<bool>;
} && std::move_constructible<T> && std::default_initializable<T>;

template<typename A>
concept DeviceAllocator = requires(A da)
{
    typename A::device_type;
    // Which of these do we actually want to require?
    A(std::declval<typename A::device_type&>(), std::declval<std::size_t>());
    A(std::declval<typename A::device_type&>(), std::declval<typename A::device_type::template pointer<void>>(), std::declval<std::size_t>());
    { std::as_const(da).is_active() } -> std::convertible_to<bool>;
    da.template allocate<char>(std::declval<std::size_t>(), std::declval<std::size_t>());
    da.deallocate(std::declval<global_ptr<char, A::device_type::kind>>());
    { da.to_global_ptr(std::declval<typename A::device_type::template pointer<char>>()) }
        -> std::same_as<global_ptr<char, A::device_type::kind>>;
} && std::move_constructible<A> && std::default_initializable<A>;

#endif
}

void foo(upcxx::Device auto& c) {}

void bar(upcxx::DeviceAllocator auto& d) {}

int main()
{
    upcxx::cuda_device c(0);
    foo(c);
    upcxx::device_allocator da(c,1024);
    bar(da);
    return 0;
}

There may be other things we wish to provide concepts for, to enable other “constrained auto” function signatures:

void func1(upcxx::GlobalPtr auto ptr);  // upcxx::global_ptr of any type
void func2(upcxx::GlobalPtrT<double> auto ptr);  // upcxx::global_ptr<double> with deduced memory kind

Comments (3)

  1. Dan Bonachea

    As of 2022.3.0 we provide abstract base classes heap_allocator and gpu_device that enable writing the types of generic code suggested here in C++11, without requiring C++20.

    Some member function calls on these abstract bases are virtual, and as such might incur a small dynamic dispatch overhead (as opposed to the fully static resolution we'd presumably get from using concepts). However, in practice most executables will only ever instantiate one device memory kind, so there's a good chance the C++ optimizer can remove that overhead. More importantly, I'd expect the virtual dispatch overhead to never be significant relative to the operation being performed: e.g. device/allocator destroy() is a heavyweight collective operation whose communication costs render any virtual dispatch overhead negligible in comparison (and this operation is unlikely to ever be a bottleneck for applications in practice). IMO the only member functions where virtual dispatch could have potentially relevant overhead are heap_allocator::(de)allocate(); however, I'd still expect the cost of the allocator itself to swamp any dispatch overhead, and I'd be concerned about any application design where fine-grained device allocation activity becomes a non-trivial fraction of the execution time.
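
    The pattern described above can be sketched in plain C++11 as follows. This is a minimal illustration, not the UPC++ API: `allocator_base`, `fake_device_allocator`, `inactive_allocator`, and `round_trip` are hypothetical names, and the member signatures are simplified stand-ins rather than those of heap_allocator.

```cpp
#include <cstddef>
#include <cstdlib>

namespace sketch {

// Hypothetical abstract base, playing the role an abstract allocator base would play.
struct allocator_base {
    virtual ~allocator_base() {}
    virtual void* allocate(std::size_t bytes) = 0;
    virtual void deallocate(void* p) = 0;
    virtual bool is_active() const = 0;
};

// Stand-in "device" allocator backed by host malloc, tracking live allocations.
struct fake_device_allocator final : allocator_base {
    int live = 0;
    void* allocate(std::size_t bytes) override {
        void* p = std::malloc(bytes);
        if (p) ++live;
        return p;
    }
    void deallocate(void* p) override {
        if (p) { std::free(p); --live; }
    }
    bool is_active() const override { return true; }
};

// Stand-in for an allocator that has already been destroyed.
struct inactive_allocator final : allocator_base {
    void* allocate(std::size_t) override { return nullptr; }
    void deallocate(void*) override {}
    bool is_active() const override { return false; }
};

// Generic code written once against the base class: no templates and no C++20.
// Each call below is a virtual dispatch, whose cost is small next to the allocation itself.
inline bool round_trip(allocator_base& a, std::size_t bytes) {
    if (!a.is_active()) return false;
    void* p = a.allocate(bytes);
    if (!p) return false;
    a.deallocate(p);
    return true;
}

} // namespace sketch
```

    Here `round_trip` works with any class derived from the base, which is the trade-off discussed above: one non-template function, at the price of a virtual call per operation.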
