Proposal: Array specializations for `global_ptr`

Issue #179 new
Colin MacLean created an issue

1. Motivation

The purpose of this proposal is to provide a safer interface to allocate, construct, destruct, and destroy both scalar and array memory backing global_ptr. Currently, the global memory used to back scalar and array global_ptrs are created and freed using pairs of mutually exclusive functions, new_ and new_array. The destruction of this memory is accomplished via delete_ and delete_array respectively. A mismatch will cause a crash.

The proposed solution to this problem is to follow the idiom of the C++ standard for handling arrays with smart pointers. Rather, it merely uses the same trick using the type system to know when it needs to handle arrays differently. This proposal provides new creation and destruction functions overloaded to properly free the containing type with awareness to arrays based on this array hinting.

This proposal also enables multidimensional array support.

2. Overview

Two specializations of global_ptr would be added: the first for arrays of unknown bound and the second for arrays with static bounds.

template<typename T, memory_kind KindSet = memory_kind::host>
class global_ptr<T[]> : public global_ptr<T>
{
public:
    using element_type = T;
    using pointer_type = T*;
};

template<typename T, size_t N, memory_kind KindSet = memory_kind::host>
class global_ptr<T[N]> : public global_ptr<T>
{
public:
    using element_type = T;
    using pointer_type = T*;
    static constexpr size_t size = N;
}

These template specializations extract the type from the array to behave like a normal global_ptr<T> except where overridden. The array specifier only comes into play for array-aware functions and most functions, such as local() are inherited from the base.

global_ptr<int>::element_type; // int
global_ptr<int[]>::element_type; // int

int* ptr = /* ... */;
global_ptr<int> a(ptr);
global_ptr<int[]> b(ptr);

auto ap = a.local(); // decltype(ap) == int*
auto bp = b.local(); // decltype(bp) == int*
assert(ap == bp);

Where the array hint comes into play is for cases where the behavior needs to be different to handle arrays. Proper freeing of the contained memory is the most critical need.

3. make_global_ptr and make_global_ptr_for_overwrite

To take advantage of the array hinting, a new set of global_ptr generation functions would be provided while preserving and deprecating the new_, new_array, delete_, and delete_array functions. These legacy functions would still enable existing code to function as normal and the lack of array hinting would continue the requirement of calling the corresponding delete function as currently exists.

// 1. T is not an array
template<typename T, typename... Args>
global_ptr<T> make_global_ptr(Args&&... args);

// 2. T is U[]
template<typename T>
global_ptr<T> make_global_ptr(std::size_t N);

// 3. T is U[N]
template<typename T>
global_ptr<T> make_global_ptr();

// 4. T is U[]
template<typename T>
global_ptr<T> make_global_ptr(std::size_t N, const std::remove_extent<T>::type& u);

// 5. T is U[N]
template<typename T>
global_ptr<T> make_global_ptr(const std::remove_extent<T>::type& u);

// 6. T is not U[]
template<typename T>
global_ptr<T> make_global_ptr_for_overwrite();

// 7. T is U[]
template<typename T>
global_ptr<T> make_global_ptr_for_overwrite(std::size_t N);

(1) Constructs an object of type T and wraps it in a global_ptr using args as the parameter list of the constructor of T. This overload only participates in overload resolution if T is not an array type.

(2,3) Default constructs a possibly-multidimensional array as if each of the elements are default constructed. The overload (2) creates an array of size N along the first dimension. The array elements are initialized in ascending order of their addresses. If an exception is thrown during the construction of any element, progress is rolled back by calling the destructors of all previous elements in the reverse order of their construction. (2) only participates in overload resolution if

(4,5) Same as (2,3), but if U is a scalar, each element is copy constructed from the value u. If U is an array, this is performed as if by initializing every non-array element of the (possibly multidimensional) array with the corresponding element from u.

(6) Same as (1) if T is not an array type and (3) if T is U[N], except that the created objects are default-initialized (the constructors are not called).

(7) same as (2), except that the individual array elements are default-initialized.

The purpose of (6,7) are to provide the user with raw memory so that they may manually call the constructors differently on each element using placement new.

If T is U[], additional memory may be transparently allocated to store the size of the array. This is not necessary for non-array and U[N] types.

4. free_global_ptr

Using the hint provided in the type information, the proper destruction and deallocation calls can be deduced.

// 1
template<typename T>
void free_global_ptr(global_ptr<T> gptr);

// 2
template<typename T>
void free_global_ptr(global_ptr<T[]> gptr);

// 3
template<typename T, std::size_t N>
void free_global_ptr(global_ptr<T[N]> gptr);

(1) Destroys and frees the memory backing gptr as a scalar type.

(2) Destroys each element of and frees the memory backing gptr as a runtime-bound array type.

(3) Destroys each element of and frees the memory backing gptr as a statically bound array type.

5. Examples

The following examples will refer to the following toy types:

struct A
{
    A(const A& a) = default;
    A(int a = 42) : a{a} {}
    int a;
};

struct B
{
    B() = delete;
    B(const B& b) = delete;
    B(int b) : b{b} {}
    int b;
};

Scalar:

global_ptr<A> gp1 = make_global_ptr<A>(1);
assert(gp1.local()->a == 1);
free_global_ptr(gp1);

Runtime-bound array default construction:

std::size_t n = 5;
global_ptr<A[]> gp2 = make_global_ptr<A[]>(n);
for (std::size_t i = 0; i < n; ++i)
    assert(gp2.local()[i].a == 42);
free_global_ptr(gp2);

Static-bound array:

global_ptr<A[5]> gp3 = make_global_ptr<A[5]>();
for (std::size_t i = 0; i < 5; ++i)
    assert(gp3.local()[i].a == 42);
free_global_ptr(gp3);

Copy-construct array from default value:

std::size_t n = 5;
A u{2};
global_ptr<A[]> gp4 = make_global_ptr<A[]>(5,u);
for (std::size_t i = 0; i < n; ++i)
    assert(gp2.local()[i].a == 2);
free_global_ptr(gp4);

Create global_ptr from type lacking default and copy constructors:

std::size_t n = 5;
global_ptr<B[]> gp5 = make_global_ptr_for_overwrite<B[]>(n);
B* ptr = gp5.local();
for (std::size_t i = 0; i < n; ++i)
    ::new (&ptr[i])(i);
free_global_ptr(gp5);

Multidimensional global_ptr:

constexpr std::size_t N = 3;
global_ptr<A[N][2]> gp6 = make_global_ptr<A[N][2]>(n);
A (*lptr)[2] = gp6.local();

for (std::size_t i = 0; i < N; ++i)
    for (std::size_t j = 0; j < 2; ++j)
        assert(lptr[i][j] == 42);

for (std::size_t i = 0; i < N; ++i)
    for (A& el : gp6.local()[i])
        el = 5;

free_global_ptr(gp6);

6. Constructors

The constructor for global_ptr<T[N]> would be:

template<typename T, std::size_t N>
global_ptr<T[N]>::global_ptr(T (*ptr)[N]);

The constructor for global_ptr<T[]> should be protected. As the size of the array is stored before the start of the array for T[], creating such an array should only be done by make_global_ptr<T[]>. Aliasing into this type would use a non-array T, which is still valid for pointer arithmetic.

template<typename T, std::size_t N>
global_ptr<T[]>::global_ptr(T *ptr);

7. Element count information (optional)

Because global_ptr<T[]> stores its size information before the start of the pointer and global_ptr<T[N]> knows its size information as a constant expression, a member function to query the array size could optionally be provided.

std::size_t global_ptr<T[]>::length() const;
constexpr std::size_t global_ptr<T[N]>::length();

8. Array aliasing support

In order to make remote multidimensional arrays useful, the ability to create pointer aliases to elements within these arrays is necessary. The proposal to enable such aliasing is to use a ptr_at() member function. This interface can also be used with single dimensional arrays for those who do not like using pointer arithmetic to navigate arrays.

// 1
template<typename T>
global_ptr<T> global_ptr<T[]>::ptr_at(std::size_t n);

// 2
template<typename T, size_t N>
global_ptr<T> global_ptr<T[N]>::ptr_at(std::size_t n);

8.1 Array aliasing examples

std::size_t n = 3;
global_ptr<int[][4]> ga1 = /* ... get remote global_ptr {{0,1,2,3},{4,5,6,7},{8,0,10,11}} */
global_ptr<int> ga2 = ga1.ptr_at(1).ptr_at(2); // ga2 aliases [1][2]
rpc(rank, [](global_ptr<int> p) { std::cout << *p.local() << std::endl; }, ga2);
// outputs 6

9. Decaying global_ptr

To improve interoperability with non-hinted global_ptrs, global_ptr<T> will implicitly decay to global_ptr<std::remove_extent<T>::type> May decay multiple times for a multidimensional array (rather than using std::remove_all_extents). This is because one less extent can sometimes make sense.

10. API impacts

Impacts to the API will be minimized by the array overloads of global_ptr automatically decaying to a scalar representation

The API internally needs a little help to deduce the type correctly by using type_identity to force T into a non-deduced context and enable this conversion.

// backport from C++20
template<typename T>
struct type_identity
{
    using type = T;
};

// forces global_ptr to decay.
// Without identity would confuse the template type deduction
auto copy(T const *src, global_ptr<typename type_identity<T>::type> dest);

This implementation detail does not need to be exposed to the user. The spec may simply state that the global_ptr argument will be decayed to the type of the raw pointer.

functions which take a global_ptr<const T> are already in a non-deduced context and do not require the type_identity helper.

10.1 upcxx::copy

Force decay as above

10.2 Atomic operations

static_assert that T must be of non-array type rather than decay. These operations involve single values and decaying to use the first element doesn't seem as good of a practice as requiring the use of ptr_at to make the single-element nature explicit.

10.3 rget

These take a global_ptr<const T> and therefore already implicitly convert

10.4 rput

Change to automatically decay using type_identity. If a user passes a pointer to an array, the rput will work on a global_ptr decayed to that array type:

global_ptr<int[][5]> remote = /* ... */;
int (*l1)[5] = /* ... */;
int *l2 = /* ... */
size_t n_outer = 4;
rput(l1, remote, n_outer); // remote decays to global_ptr<int[5]>
rput(l2, remote, n_outer*5); // remote decays to global_ptr<int>

10.5 r{get,put}_strided

Force decays as with non-strided versions.

Comments (10)

  1. Amir Kamil

    Thanks @Colin MacLean for writing this up.

    Regarding bullet point 10, we use inheritance to solve the deduction problem for global_ptr<T> and global_ptr<T const>. We should be able to use the same trick for global_ptr<T[]> and global_ptr<T[n]> – have them both derive from global_ptr<T>. This should fix deduction without requiring any API changes. See issue #51 and PR #42 for details.

    I’m not super in love with the shared_ptr analogs in bullet point 3. I am concerned that they will lead users to assume that global_ptr automatically manages the underlying memory.

    For bullet point 4, I think I would prefer a free() member function rather than free functions (pun not intended). Member functions would provide better encapsulation and be less typing.

  2. Colin MacLean reporter

    Regarding point 10, without type_identity, the compiler will try to match T as T and T[] simultaneously, resulting in an ambiguity. global_ptr<T[]> and global_ptr<T[N]> both do inherit from global_ptr<T>, but type conversion only comes into play after T is deduced. The type_identity allows T to be inferred from the raw pointer and used but not deduced within the global_ptr<T[]> argument. This allows the compiler to try converting the type, which succeeds. The use of type_identity shouldn’t be stated in the API. Writing that global_ptr<T[]> can decay to a compatible type if necessary is all that’s necessary. This keeps the same function signature as far as the API is concerned, it just needs an internal trick to make the conversion take place.

    Regarding point 3, even the C++ standard has both owning and non-owning smart pointers, so ownership really shouldn’t be assumed even when dealing with smart pointers. I certainly never expected global_ptr<T> to be an owning wrapper, even when I was under the impression that it was modeled after smart pointers. It was a question I had, but was one I immediately answered for myself by looking at the spec. A smart pointer implies that it has additional features to a raw pointer but that can be things like bounds checking and does not automatically imply memory management as one of its features. The API will even note that global_ptrs created this way must be freed with free_global_ptr, which is a necessary statement to not confuse things with the legacy API anyway. I really don’t think it’s going to be much of an issue, especially not compared to having incompatible types of freeing as is the case with new_ and new_array. I wouldn’t be opposed to alternate naming, though. new_global_ptr, perhaps?

    Regarding point 4, my initial design used a member function but I decided to make it a standalone function specifically to make it seem less like an owning wrapper.

  3. Amir Kamil

    We have not encountered that ambiguity with respect to global_ptr<T> vs. global_ptr<T const>. I don’t see why we would run into it with global_ptr<T[]> or global_ptr<T[N]>.

    Here’s a test that compiles fine with Clang:

    #include <upcxx/upcxx.hpp>
    
    namespace upcxx {
      template<typename T, memory_kind Kind>
      struct global_ptr<T[], Kind> : global_ptr<T, Kind> {
      };
    
      template<typename T, size_t n, memory_kind Kind>
      struct global_ptr<T[n], Kind> : global_ptr<T, Kind> {
      };
    }
    
    int main() {
      int const *src;
      upcxx::global_ptr<int[]> dst1;
      upcxx::global_ptr<int[3]> dst2;
      auto res1 = upcxx::copy(src, dst1, 3);
      auto res2 = upcxx::copy(src, dst2, 3);
    }
    

    I don’t expect other compilers to have any issues with this, since this is the same mechanism we use for global_ptr<T [const]>.

    I’m open to being convinced on points 3 and 4. But my current preference would be for member functions for both.

  4. Colin MacLean reporter

    It’s because with <T const>, T is being used and not trying to have a pattern matched to it at that point. If the compiler is at the point of “what is T?” then T could be int or int[], at which point T would be trying to be different types for the first and second arguments. The above fails with GCC. I believe GCC’s behavior is correct and Clang is being overly permissive. I’ll try to find a bug report for the compilers, because they obviously can’t both be correct. https://godbolt.org/z/8Gd5on

  5. Amir Kamil

    I stand corrected.

    Will this also affect binary operations on global pointers (e.g. comparisons)?

  6. Colin MacLean reporter

    Yeah, Clang is wrong here.

    template<typename T>
    auto copy(T* src, global_ptr<T> dst, std::size_t count)
              ^                  ^
              deduced as int     deduced as int[]
    
    template<typename T>
    auto copy(T* src, global_ptr<typename type_identity<T>::type> dst, std::size_t count)
              ^                                             ^
              deduced as int                                dependent type. Uses T = int as deduced from first parameter.
    

    The standard specifies that implicit conversion is the job of overload resolution, which happens after template argument deduction. The first should be rejected for being conflicting types. Definitely a Clang bug to accept this.

  7. Colin MacLean reporter

    Re: comparisons

    No. The templates are at the class level, not deduced at the function call. Templates are already evaluated at this point. It’s the class that is templated, not the functions themselves. In order to get a function to even process a class’s templates, you have to do something like:

    template<typename T>
    struct A
    {
      template<typename U = T>
      std::enable_if<std::is_integral<U>::value>::type foo();
    };
    

    Because the templates are only at the class level with the comparison operators, templates don’t even come into play. All of these functions were defined well before when the class was first ODR-used and this is purely a matter for overload resolution applying to already-existing functions. Implicit conversions are therefore considered.

  8. Paul Hargrove

    @Colin MacLean Thanks for writing up this proposal.

    As we are nearing deadlines for our release, I am asking that @Amir Kamil and @Dan Bonachea not devote significant time on this proposal right now. Once we are past the month of March, then we should resume the discussion.

  9. Dan Bonachea

    This proposal was discussed at length in the 2021-04-21 Pagoda meeting. The main discussion points are recorded in those minutes, but to summarize the main arguments against this proposal are:

    1. The currently proposed function names mimic the std smart pointer interface users are familiar with. However the semantics of global_ptr are very different from std smart pointers in ways that users need to understand to use them correctly, so this similarity conveys a counter-productive intuition. This could be fixed by changing the function names to avoid the misleading similarity.

    2. More fundamentally, using the proposed interfaces pushes owner-specific information into the type system that many callee functions (and indeed many apps) don't care about. This potentially imposes a non-trivial static programmability burden, but without the commensurate productivity payoff of automatic memory management provided by smart pointers. Operations that decay away this static information (which are tempting because they release code from this burden) also erase the very information needed to safely free an object, defeating the goal of the proposal.

    The consensus was to table this proposal for now, although we might pick it up again someday with modifications.

  10. Log in to comment