allocate

Issue #25 resolved
BrianS created an issue

Basically, Bryce does not want something like allocate, because

  • allocate is byte centric and

  • placement new is a caveat and modern C++ prefers function and struct abstractions to caveats.

I want something that allocates and constructs, but I do not want to use placement new.

Why? Because placement new is a caveat. new (allocate(some_shared_segment_memory_pool, sizeof(T))) T(a, b, ...) is really a function call to some theoretical construct() function. Someone who is less familiar with C++ needs to understand what placement new is and how its syntax works to understand what is going on. If you instead have a function abstraction:

construct<T>(allocate(some_shared_segment_memory_pool, sizeof(T)), a, b, ...)

Or better yet, a function that couples allocation and construction, hiding byte allocation and letting users think about objects, not sequences of bytes:

new_<T>(some_shared_segment_memory_pool, a, b, ...)

^ To understand what the above two function calls do, all you have to understand is /functions/. You don't need to know some special syntax.
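
A minimal sketch of the two function abstractions named above, written against a hypothetical memory_pool type (a fixed buffer with a bump pointer) that stands in for the shared segment. The pool, its allocate() overload, and the raw T* return type are assumptions made for illustration, not a proposed UPC++ interface.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical stand-in for a shared-segment pool: fixed storage plus a bump offset.
struct memory_pool {
  alignas(std::max_align_t) unsigned char storage[1024];
  std::size_t used = 0;
};

// Byte-centric allocation. Every block is rounded up so the next one stays
// max-aligned. Returns nullptr when the pool is exhausted.
inline void* allocate(memory_pool& pool, std::size_t bytes) {
  constexpr std::size_t a = alignof(std::max_align_t);
  const std::size_t rounded = (bytes + a - 1) / a * a;
  if (rounded > sizeof(pool.storage) - pool.used) return nullptr;
  void* p = pool.storage + pool.used;
  pool.used += rounded;
  return p;
}

// construct<T>: the single place that spells out placement new.
template <class T, class... Args>
T* construct(void* where, Args&&... args) {
  assert(where != nullptr);
  return ::new (where) T(std::forward<Args>(args)...);
}

// new_<T>: couples allocation and construction so callers think in objects.
template <class T, class... Args>
T* new_(memory_pool& pool, Args&&... args) {
  static_assert(alignof(T) <= alignof(std::max_align_t),
                "over-aligned types are not handled in this sketch");
  void* raw = allocate(pool, sizeof(T));
  if (raw == nullptr) throw std::bad_alloc();
  return construct<T>(raw, std::forward<Args>(args)...);
}

// Matching teardown: runs the destructor. This toy pool never reuses memory.
template <class T>
void delete_(memory_pool&, T* p) {
  if (p != nullptr) p->~T();
}
```

With these in place, new_<std::pair<int, int>>(pool, 3, 4) reads as an ordinary function call, and the placement syntax is confined to construct().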

The number of programmers who can understand a function call expression is much greater than the number of people who will understand placement new. Everyone knows how to parse a function call.

To a large degree, modern C++ is about /replacing caveat abstractions/ with /struct and function abstractions/. Why? Because this reduces the number of things a user has to learn, since all functions share some properties (single return type, 0 or more template parameters, 0 or more arguments, uniform syntax) and all classes share some properties (first class objects).

E.g. replacing T[N] with std::array<T, N>. T[N] has caveats - it's not a first class entity (cannot be returned by value from a function), it can decay to a pointer, etc. But std::array<T, N> is a struct - it is easier to reason about these abstractions.
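
The difference can be shown in a few lines. This is only an illustration of the two caveats mentioned above (no return by value, decay to pointer); the helper names are made up for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>

// A std::array can be returned by value; a function cannot return int[3].
inline std::array<int, 3> make_triple(int a, int b, int c) {
  return {{a, b, c}};
}

// Passing by value copies the elements, and N stays part of the type.
template <std::size_t N>
int sum(std::array<int, N> values) {
  int total = 0;
  for (int v : values) total += v;
  return total;
}

// Element access is an ordinary function that can check its index.
template <std::size_t N>
int element(const std::array<int, N>& values, std::size_t i) {
  assert(i < N);
  return values[i];
}

// A built-in array parameter is silently adjusted to a pointer: the two
// function types below are the same type, so the length 3 is lost.
static_assert(std::is_same<void(int[3]), void(int*)>::value,
              "T[N] parameters decay to T*");

// The std::array parameter keeps its own distinct type.
static_assert(!std::is_same<void(std::array<int, 3>), void(int*)>::value,
              "std::array does not decay");
```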

Note that I am, admittedly, a well-known abuser of operator overloading and other caveat abstractions. I've worked on Boost.Spirit, Boost.Proto and other DSELs. However, that doesn't make me more eager to use the full expressiveness of the C++ grammar. It makes me more cautious.

Comments (24)

  1. BrianS reporter

    Something like this?

    #include <utility>
    #include <fstream>
    
    namespace upcxx
    {
      template <class T>
      class global_ptr
      {
      public:
        global_ptr(T* ptr): m_ptr(ptr) {;}
        T* local(){return m_ptr;}
      private:
        T* m_ptr;
      };
      template <class T, typename... Args> 
      global_ptr<T> new_(Args&&... args)
      {
        return global_ptr<T>(new T(std::forward<Args>(args)...)); //this would be placement new with our gasnet::allocate call
      };
      template <class T>
      void free(global_ptr<T>& A){ delete A.local();} //this would call destructor and a gasnet::free 
    }
    
    
    int main(int argc, char* argv[])
    {
      upcxx::global_ptr<int> mine =upcxx::new_<int>();
      upcxx::global_ptr<std::ofstream> mos = upcxx::new_<std::ofstream>("myFile");
      upcxx::free<int>(mine);
      upcxx::free<std::ofstream>(mos);
      return 0;
    }
    

    Still lots of things can go wrong in this form.

  2. BrianS reporter

    My opinion of allocate:

    global_ptr is weaker than std::weak_ptr. It is never meant to participate in lifetime management. The RAII design is not ours to impose on our users; they need to decide what flavor of RAII suits their needs.

    You can look into shared vs weak vs unique.

    It is great fun to explore the issues of closed loops, dangling pointers, and garbage collection. I don't think we want to provide a technique that imposes a leak mitigation strategy, since global_ptr is not a participant in ownership.

    The user is free to use std::shared_ptr to manage this, or placement new, or create their own pool on top of our shared segment.

    global_ptr is not a shared pointer itself. In fact, we rely on the ability to ship global_ptrs to remote ranks so that they can use rput and rget on locations in my local shared segment. We don't want to share ownership of a part of my local shared segment with a global_ptr on another rank. In that sense it is even below std::weak_ptr, since the remote instance of a global_ptr can never assume ownership.
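
    A rough sketch of that split, assuming a mock global_ptr and hypothetical allocate/deallocate functions that simply forward to malloc/free. Ownership lives in a std::shared_ptr with a custom deleter; the global_ptr is only a view and can be copied freely without affecting lifetime. make_segment_shared is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <utility>

// Hypothetical stand-ins for the shared-segment allocator (plain malloc/free here).
inline void* allocate(std::size_t bytes) { return std::malloc(bytes); }
inline void deallocate(void* p) { std::free(p); }

// Non-owning handle: copying or destroying it never touches the pointee.
template <class T>
class global_ptr {
public:
  explicit global_ptr(T* p = nullptr) : m_ptr(p) {}
  T* local() const { return m_ptr; }
private:
  T* m_ptr;
};

// One ownership policy a user might layer on top: shared_ptr whose deleter
// destroys the object and hands the storage back to the segment allocator.
template <class T, class... Args>
std::shared_ptr<T> make_segment_shared(Args&&... args) {
  void* raw = allocate(sizeof(T));
  if (raw == nullptr) throw std::bad_alloc();
  T* obj = nullptr;
  try {
    obj = ::new (raw) T(std::forward<Args>(args)...);
  } catch (...) {
    deallocate(raw);  // construction failed: return the storage
    throw;
  }
  return std::shared_ptr<T>(obj, [](T* p) {
    p->~T();
    deallocate(p);
  });
}
```

    Creating global_ptr<int> view(owner.get()) from such a shared_ptr leaves owner.use_count() unchanged, which is the "below weak_ptr" behavior described above.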

  3. Former user Account Deleted

    We'll need syntax that distinguishes non-initializing from default-initializing allocation, as well as array allocation. upcxx::new_<T>() cannot express these distinctions the way new T / new T() / new T[n] / new T[n]() can. Consider that the typical use case for segment objects is non-initialized POD arrays; that case needs to be nice.

    Here's what I propose:

    namespace upcxx {
    
    // does not default-construct, static_assert that T is pod
    template<typename T, size_t alignment = alignof(T)>
    global_ptr<T> allocate();
    template<typename T, size_t alignment = alignof(T)>
    global_ptr<T[]> allocate(size_t n);
    
    // does not destruct, static_assert that T is pod
    template<typename T>
    void free(global_ptr<T> p);
    template<typename T>
    void free(global_ptr<T[]> p);
    
    // does not default-construct, does NOT static_assert T is pod
    template<typename T, size_t alignment = alignof(T)>
    global_ptr<T> allocate_unconstructed();
    template<typename T, size_t alignment = alignof(T)>
    global_ptr<T[]> allocate_unconstructed(size_t n);
    
    // does not destruct, does NOT static_assert that T is pod
    template<typename T>
    void free_destructed(global_ptr<T> p);
    template<typename T>
    void free_destructed(global_ptr<T[]> p);
    
    // I'm not supporting alignment for new_ since it would ugly up the argument list.
    // Users can have aligned new by using allocate_unconstructed and placement new.
    
    // constructs with T{forward(arg)...}
    template<typename T, typename ...Arg>
    global_ptr<T> new_(Arg &&...arg);
    // constructs each with T{arg...}
    template<typename T, typename ...Arg>
    global_ptr<T[]> new_array(size_t n, Arg const&...arg);
    
    // destructs
    template<typename T>
    void delete_(global_ptr<T> p);
    // destructs each element. Departure from delete[]: REQUIRES LENGTH!
    template<typename T>
    void delete_(global_ptr<T[]> p, size_t n);
    
    // extensions for mdspan...
    
    // no construct, no destruct, static_assert T is pod
    template<typename T, typename Layout, size_t alignment = alignof(T)>
    global_mdspan<T,Layout> allocate_md(Layout lay);
    template<typename T, typename Layout>
    void free(global_mdspan<T,Layout> p);
    
    // no construct, no destruct, DO NOT static_assert T is pod
    template<typename T, typename Layout, size_t alignment = alignof(T)>
    global_mdspan<T,Layout> allocate_unconstructed_md(Layout lay);
    template<typename T, typename Layout>
    void free_destructed(global_mdspan<T,Layout> p);
    
    // construct and destruct each element
    template<typename T, typename Layout, typename ...Arg>
    global_mdspan<T,Layout> new_md(Layout lay, Arg const&...arg);
    template<typename T, typename Layout>
    void delete_(global_mdspan<T,Layout> p);
    

    Alignment is usually a runtime value, but I have it here as compile time. This is only because we want it to get a default value, but if I did it at runtime like allocate(n,alignment=alignof(T)) then there is an overload ambiguity for allocate(x) meaning scalar allocate with special alignment or array allocate default alignment. The restriction of alignment being at compile time would hurt very few users. We could always add a special variant that allows runtime alignment.
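
    A small sketch of why the compile-time alignment parameter sidesteps the ambiguity. It assumes C++17 aligned operator new as a hypothetical stand-in for the segment allocator and returns raw T* instead of global_ptr to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical raw allocator standing in for the shared segment (C++17).
inline void* segment_alloc(std::size_t bytes, std::size_t alignment) {
  return ::operator new(bytes, std::align_val_t(alignment));
}
inline void segment_free(void* p, std::size_t alignment) {
  ::operator delete(p, std::align_val_t(alignment));
}

// Scalar form: no runtime arguments at all.
template <class T, std::size_t alignment = alignof(T)>
T* allocate() {
  static_assert(alignment >= alignof(T), "alignment is weaker than alignof(T)");
  return static_cast<T*>(segment_alloc(sizeof(T), alignment));
}

// Array form: the only runtime argument is the element count, so
// allocate<T>(x) can only ever mean "x elements with default alignment".
template <class T, std::size_t alignment = alignof(T)>
T* allocate(std::size_t n) {
  static_assert(alignment >= alignof(T), "alignment is weaker than alignof(T)");
  assert(n <= static_cast<std::size_t>(-1) / sizeof(T));  // no size overflow
  return static_cast<T*>(segment_alloc(n * sizeof(T), alignment));
}

// No construction happened, so deallocation does no destruction either.
template <class T, std::size_t alignment = alignof(T)>
void deallocate(T* p) {
  segment_free(p, alignment);
}
```

    Here allocate<float, 64>(16) requests 16 floats on a 64-byte boundary, while allocate<double>() is a single default-aligned scalar. With a runtime alignment parameter defaulted on the scalar form, a call such as allocate<float>(16) could be read either way.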

    allocate vs allocate_unconstructed is really nice. We give users an easy way to allocate non-initialized arrays, but only in the cases where it's correct to do so. If they really want raw storage for non-trivial T they can have it at the cost of typing "unconstructed".

    I like global_ptr<T[]>. I think it should get array ops like pointer arithmetic and operator< while global_ptr<T> shouldn't. The two should be explicitly convertible with zero runtime cost though. The separate type also stops the user from making the delete vs delete[] mistake prevalent with T*. The situation is worse here because our delete[] (delete_) requires the length, meaning that it isn't just some small fixup of adding the []. They'll need to remember to carry and regurgitate the length to the array's death. I have opted not to stash the length in the array like new[]/delete[] since it complicates the implementation when we need to respect non-alignof(T) alignments (simd). max_align_t does not cover simd alignments, so we would need a configure-time decision for upcxx::max_align_t, thus making it non-portable in the purest sense. Minimizing configure-time logic is a high-priority for me.

  4. BrianS reporter

    Imma gonna ask that we leave this feature out of things for now.

    First, mdspan is not meant to own its data pointer. While Bryce is arguing this one with the C++ committees, I would advocate for the non-owning version of mdspan for the coming spec.

    I would like to not participate in the design space of memory management. Whether users put their allocated space in a data-owning class, make a pool, or use shared_ptr to manage lifetime depends on their preferred model for lifetime management.

    C++ new[] for array objects wouldn't bother with constructor arguments, since most people understand array allocation as calling the default constructor for each element.

    I view global_ptr the way I view mdspan. It is a non-owning decoration class that lets a user participate in remote memory operations. Memory in the shared segment is the same as memory in the private segment from the perspective of the local rank. global_ptr is how a rank lets other ranks manipulate memory in its own shared segment. Similarly, you cannot allocate memory in a remote rank (the IPDPS paper had this feature and I'm not sure we want to provide it).

  5. BrianS reporter

    but after saying all that, I realize that placement new is ugly syntax and putting some decoration on it might help people :-(

  6. Former user Account Deleted

    What feature are you arguing against? I don't think there's been any disagreement about our pointer types (global_ptr, global_mdspan) being non-owning. The issue is how we enable users to allocate memory. I was trying to design an allocation API around these obstacles:

    1. We don't want untyped allocation because it forces placement-new on users.

    2. We know that arrays of uninitialized PODs are the common case for numerically heavy codes. Implicitly value-constructing (initializing to zero) is not acceptable for performance, but eliding it is incorrect for certain types (non-PODs).

    C++ has lots of ways of combining allocation and construction, and each of these has different semantics.

    new T;
    new T();
    new T[n];
    new T[n]();
    

    The right way to leverage all of these variants with a custom allocator was placement: new(my_alloc(size)) T/*put variant syntax here*/. That was the original design of C++. Now it sounds like the C++ elite have regrets. The only way to get back all the flexibility of the original "new", but as functions instead of syntax, is to reproduce each case in an API. Hence my lengthy proposal. I'm very interested in simplifications.
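
    To make the semantic difference concrete, here is a small sketch using placement new on caller-provided storage. The function names are made up for the example. The array forms construct element by element, because an array placement new-expression is allowed to request extra bookkeeping space beyond n * sizeof(T).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Fill raw storage with a recognizable non-zero byte pattern.
inline void poison(void* p, std::size_t bytes) {
  std::memset(p, 0xFF, bytes);
}

// Like "new T()": value-initialization. For int this writes 0.
inline int* value_init_scalar(void* raw) {
  assert(raw != nullptr);
  return ::new (raw) int();
}

// Like "new T": default-initialization. For int nothing is written, so the
// value is indeterminate and must be assigned before it is read.
inline int* default_init_scalar(void* raw) {
  assert(raw != nullptr);
  return ::new (raw) int;
}

// Like "new T[n]()": every element is value-initialized (zero for int).
inline int* value_init_array(void* raw, std::size_t n) {
  int* first = static_cast<int*>(raw);
  for (std::size_t i = 0; i < n; ++i) ::new (static_cast<void*>(first + i)) int();
  return first;
}

// Like "new T[n]": every element is default-initialized (untouched for int).
inline int* default_init_array(void* raw, std::size_t n) {
  int* first = static_cast<int*>(raw);
  for (std::size_t i = 0; i < n; ++i) ::new (static_cast<void*>(first + i)) int;
  return first;
}
```

    The value-initializing forms pay for a write per element, which is the cost numerically heavy codes want to avoid for large POD arrays.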

  7. BrianS reporter

    Yup, using allocate semantics makes for really ugly correct examples.

      global_ptr<Bob> bob1(new(allocate(sizeof(Bob))) Bob(5, 2));
    
      std::shared_ptr<Bob> sbob(new(allocate(sizeof(Bob))) Bob(6, 2),
                                [](Bob* b){b->~Bob(); deallocate(b);});
    
      global_ptr<Bob> b2(sbob.get());
    
      wait(val2);
      int count = val2.result();
      Bob* bptr = new (allocate(count * sizeof(Bob))) Bob[count];  // sizeof(Bob[count]) with a runtime count is a non-standard VLA type
      global_ptr<Bob> mb(bptr);
    
      for (int i=0; i<count; i++, mb++)
        {
          mb.local()->~Bob();
        }
      deallocate(bptr);
      bob1.local()->~Bob();
      deallocate(bob1.local());
    

    Yup, allocate and C++ looks ugly. Are we OK with John's allocate API? The part I'm not on board with is the mdspan being a data owner... have to think about that one...

  8. Former user Account Deleted

    I'm missing the part where the mdspan is supposed to own. Where was that stated?

    Also, I would like to change my API proposal from upcxx::free and friends to upcxx::deallocate. Free is useful in other contexts, like free-lists. Deallocate is very specific.

  9. BrianS reporter

    I'm now coming around to the pedantic functions approach. All the placement versions are incredibly error prone.

    in fact, new is a poorly thought-out language feature as I mull things over.

  10. BrianS reporter

    I can't get this form to work

      template<typename T>
      global_ptr<T[]> new_array(size_t n) {return global_ptr<T[]>((T[n])(new T[n]));}
    

    It is a bad cast on the thing returned from new: you can't cast a T* to T[].

  11. Former user Account Deleted

    global_ptr<T[]> would require a specialization:

    template<class T>
    struct global_ptr {
      explicit global_ptr(T*); // upcast
      explicit operator T*() const; // downcast
    
      // comparison operators ==, !=, etc
    };
    
    template<class T>
    struct global_ptr<T[]> {
      explicit global_ptr(T*); // upcast
      explicit operator T*() const; // downcast
    
      // implicit conversions to/from global_ptr<T> 
      global_ptr(global_ptr<T>);
      operator global_ptr<T>() const;
    
      // pointer arithmetic extensions
      friend global_ptr<T[]> operator+(global_ptr<T[]> a, ptrdiff_t b);
      friend ptrdiff_t operator-(global_ptr<T[]> a, global_ptr<T[]> b);
    
      // comparison operators ==, !=, etc
    };
    
  12. BrianS reporter

    doesn't the template specialization just push the bad cast further away?

    you still need to make an array from a pointer, since new is going to return a pointer.

  13. Former user Account Deleted

    Are you worried that either global_ptr<T[]> or new_array is unimplementable? I assure you both are implementable. std::unique_ptr<T[]> exists; consider how its implementers must have solved this issue. global_ptr<T[]> internally stores a rank/endpoint and a T* (or maybe just an untyped offset for certain gasnet conduits). But that's internals; we could also do everything with void* or intptr_t under the hood. T[]s aren't first-class values, so global_ptr<T[]> will be smart enough not to try to store one.
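
    A minimal mock showing that no cast to an array type is needed: the T[] specialization stores a plain T*, in the same spirit as std::unique_ptr<T[]>. This sketch uses the ordinary heap (new[]/delete[]) instead of a shared segment and omits the rank, so it only illustrates the type mechanics.

```cpp
#include <cassert>
#include <cstddef>

// Primary template: pointer to a single object.
template <class T>
class global_ptr {
public:
  explicit global_ptr(T* p = nullptr) : m_ptr(p) {}
  T* local() const { return m_ptr; }
private:
  T* m_ptr;
};

// Array flavor: the [] lives only in the type. The member is still a T*.
template <class T>
class global_ptr<T[]> {
public:
  explicit global_ptr(T* first = nullptr) : m_ptr(first) {}
  T* local() const { return m_ptr; }

  T& operator[](std::ptrdiff_t i) const {
    assert(m_ptr != nullptr);
    return m_ptr[i];
  }

  // Pointer arithmetic is offered only on the array flavor.
  friend global_ptr operator+(global_ptr a, std::ptrdiff_t n) {
    return global_ptr(a.m_ptr + n);
  }
  friend std::ptrdiff_t operator-(global_ptr a, global_ptr b) {
    return a.m_ptr - b.m_ptr;
  }
private:
  T* m_ptr;
};

// new T[n]() already yields a T*, which the specialization accepts directly.
template <class T>
global_ptr<T[]> new_array(std::size_t n) {
  return global_ptr<T[]>(new T[n]());
}

// The overload is selected by the [] in the type, so callers cannot mix up
// the scalar and array forms the way delete and delete[] can be mixed up.
template <class T>
void delete_(global_ptr<T[]> p) {
  delete[] p.local();
}
```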

  14. BrianS reporter

    OK, so I can get my code like this to compile. we don't need the unconstructed cases. the user decides if they want constructors/destructors called by whether they use allocate/free or new/delete.

      global_ptr<Bob> bob1 = allocate<Bob>();
    
      global_ptr<Bob> bob2 = allocate<Bob>(5);
    
      global_ptr<Bob> bob3 = new_<Bob>(5,2);
    
      global_ptr<Bob> bob12 = new_<Bob>(1,2);
      global_ptr<Bob> bob13 = new_<Bob>(1,3);
    
      Bob* b4 = (Bob*)allocate(sizeof(Bob));  // codes with old-school malloc
      global_ptr<Bob> bob5(b4); //legal promotion
    
      Bob barray[5];
      global_ptr<Bob[5]> garray(barray);  // runtime error. not in shared segment
    
    
      global_ptr<Bob[6]> garray2 = new_array<Bob,6>();
    
      deallocate(bob1);
      deallocate(bob2);
      delete bob12.local(); // user calls C++ delete on insides for us. That's bad.
      bob13.local()->~Bob(); // OK. user calls their destructor
      deallocate(bob13.local()); //OK. .... then puts memory back to shared
      delete_(bob3);
      delete_(bob5); //legal
      delete_(garray2);
    

    Is that an adequate set of allocate/deallocate/new/delete functions? I think it is a complete set. This compiles and runs OK. There is a memory error here that my mock code does not catch, since I'm not running my own heap: the delete on bob12.local() should cause a runtime error, since that specific address is unlikely to be the result of an OS malloc. Since our dheap is not going to be the start of our own malloc space, the user is not going to delete our gasnet segment ;-)

  15. BrianS reporter

    I also don't catch the automatic variable being promoted to global_ptr, for the same reason. my global_ptr is not wired to catch out-of-segment. I just want correct user syntax here.
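
    For reference, the out-of-segment check could look roughly like this. The static buffer, segment_base(), segment_size and in_segment() are all hypothetical stand-ins for whatever the real gasnet segment query would be, and throwing std::invalid_argument is just one possible way to report the error.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical shared segment: a static buffer stands in for the gasnet segment.
constexpr std::size_t segment_size = 4096;

inline unsigned char* segment_base() {
  alignas(std::max_align_t) static unsigned char bytes[segment_size];
  return bytes;
}

// True when p points inside [segment_base, segment_base + segment_size).
inline bool in_segment(const void* p) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const std::uintptr_t lo = reinterpret_cast<std::uintptr_t>(segment_base());
  return addr >= lo && addr - lo < segment_size;
}

template <class T>
class global_ptr {
public:
  // Promotion from a local pointer is checked at run time: automatic and
  // ordinary heap objects are rejected, null is allowed.
  explicit global_ptr(T* p) : m_ptr(p) {
    if (p != nullptr && !in_segment(p)) {
      throw std::invalid_argument("global_ptr: address is not in the shared segment");
    }
  }
  T* local() const { return m_ptr; }
private:
  T* m_ptr;
};
```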

  16. Former user Account Deleted

    So I take it your allocate is equivalent to new T (as opposed to new T()). This implies that for non-POD's, allocate = new_. In all cases we have deallocate = delete_. Except your void* allocate(size_t), which does no construction, confuses things since deallocate(void*) does no destruction while the other deallocate's do destruct. It seems really dangerous to allow different behavior depending on the pointer type, the difference between innocuously calling deallocate(void*) instead of deallocate(T*) could have drastic effects.

    This is why I chose the semantics of allocate/deallocate to never construct/destruct. I slap the user if they attempt to use these for non-PODs, but give them loud and lengthy alternatives in case they need it.
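
    A sketch of how that guard might be implemented, using malloc/free as a hypothetical stand-in for the segment allocator and raw T* instead of global_ptr. The is_pod_like trait is an assumption of this example: it spells out the C++11 definition of POD (trivial and standard-layout) instead of using std::is_pod, which newer standards deprecate.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <type_traits>

// POD in the C++11 sense: trivial and standard-layout.
template <class T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                       std::is_standard_layout<T>::value> {};

// Loud names: raw storage for any T. The caller constructs and destructs.
template <class T>
T* allocate_unconstructed() {
  void* raw = std::malloc(sizeof(T));
  if (raw == nullptr) throw std::bad_alloc();
  return static_cast<T*>(raw);
}

template <class T>
void deallocate_destructed(T* p) {
  std::free(p);
}

// Short names: same storage, but only for types where skipping construction
// and destruction is harmless. Anything else is a compile-time error.
template <class T>
T* allocate() {
  static_assert(is_pod_like<T>::value,
                "allocate<T> requires a POD type; use allocate_unconstructed<T> "
                "and construct explicitly");
  return allocate_unconstructed<T>();
}

template <class T>
void deallocate(T* p) {
  static_assert(is_pod_like<T>::value,
                "deallocate requires a POD type; destruct explicitly and use "
                "deallocate_destructed");
  deallocate_destructed(p);
}
```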

  17. BrianS reporter

    allocate does no construction. deallocate does no destruction. new does construction, delete does destruction. I think I've been consistent.

  18. Former user Account Deleted

    Ok great. I stress that alignment support for allocate is a necessity.

    What happens for allocate or deallocate of non-pods?

    I think garray2 is wrong. new_array should take its length as a runtime arg, not as a second template arg. It should also return global_ptr<T[]>, or just gp<T> if we don't do the [] specialization. Our interpretation of a gp<T[]> is that it is a pointer to someplace in a contiguous array, whereas gp<T[n]> is a pointer to an array of static length n. This makes a huge difference with pointer arithmetic (advancing by 1 T or by n).
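
    The same distinction already exists for raw pointers, which makes it easy to demonstrate. In this sketch int* plays the role of gp<T[]> and int(*)[6] plays the role of gp<T[6]>; stride_in_bytes is a helper made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Number of bytes a pointer moves when it is advanced by one.
// p must point at an object so that p + 1 is a valid one-past pointer.
template <class P>
std::ptrdiff_t stride_in_bytes(P p) {
  assert(p != nullptr);
  return reinterpret_cast<const char*>(p + 1) -
         reinterpret_cast<const char*>(p);
}
```

    Given int a[6], the pointer int* elem = a advances by sizeof(int) per step, while int (*whole)[6] = &a advances by 6 * sizeof(int), skipping the entire array at once.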

  19. BrianS reporter

    We can spec the allocate to take alignment.

    allocate on non-PODs just makes storage. The user needs to know if they want allocate or new semantics, just as it is in C++ now.

    global_ptr<T[]> is a little bit of a mysterious object, as the static size is lost. Wouldn't the user want to keep the static size with the data holder? The compiler will certainly like to know n.

  20. Former user Account Deleted

    I agree that allocate functions should not construct, but it's dangerous to make this easy for non-PODs. Are you sure you prefer an allocate which works for all types, as opposed to my allocate/allocate_unconstructed, which forces the user (by compile-time checks) to see that what they're attempting is dangerous in the case of non-PODs?

    Example:

    struct Pod {};
    
    global_ptr<Pod> x = allocate<Pod>(); // ok
    deallocate(x); // ok
    
    x = allocate_unconstructed<Pod>(); // ok
    deallocate_destructed(x); // ok
    
    struct NonPod { NonPod() { std::cout<<"hi"; } };
    
    global_ptr<NonPod> a = allocate<NonPod>(); // compiler error thanks to static_assert(std::is_pod<T>::value)
    deallocate(a); // compiler error by same means
    
    global_ptr<NonPod> b = allocate_unconstructed<NonPod>(); // works, reminds user to construct in the name
    new(b.local()) NonPod;
    
    b.local()->~NonPod();
    deallocate_destructed(b); // works, reminds user to destruct in the name
    

    Yes the size is lost for global_ptr<T[]> just as the size is lost when handing around arrays as T*. You say "static size", but that isn't correct since we support runtime computed sizes (via allocate<T>(runtime_length)). Encoding the size into the pointer would inflate our global_ptr struct by another word which seems undesirable. The other place to put the size is in memory adjacent to the allocated array. This is what C++'s new T[runtime_length] does so that delete[] knows how many elements to destruct. We can do this, but handling non-alignof(T) alignments requires padding the front of the array with two words: one for the length and the other to point to the front of the allocated block. That isn't a huge deal, it just requires a specific trick that many people would implement incorrectly on their first attempt. The real benefit of global_ptr<T[]> is that by carrying the [] in the type, we no longer require the user to supply it at delete time (like C++'s delete vs delete[] crisis). That's one less mistake for them to make. They can just call our delete_ and the delete_(global_ptr<T[]>) overload will pick it up and do the right thing (read the adjacent metadata words and call destructors and stuff).
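
    The two-word padding trick can be sketched as follows. This version uses malloc as a hypothetical stand-in for the segment allocator, returns a raw T*, and takes the alignment at run time. array_header and the function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// The two metadata words stored immediately in front of element 0.
struct array_header {
  std::size_t length;  // number of constructed elements
  void* block;         // start of the raw block returned by malloc
};

template <class T>
T* new_array_aligned(std::size_t n, std::size_t alignment = alignof(T)) {
  assert(alignment >= alignof(T) && (alignment & (alignment - 1)) == 0);
  // The header must be properly aligned too, so never go below its alignment.
  const std::size_t align =
      alignment < alignof(array_header) ? alignof(array_header) : alignment;
  // Worst case: header, then up to align - 1 bytes of slack, then the elements.
  void* block = std::malloc(sizeof(array_header) + (align - 1) + n * sizeof(T));
  if (block == nullptr) throw std::bad_alloc();

  const std::uintptr_t lowest =
      reinterpret_cast<std::uintptr_t>(block) + sizeof(array_header);
  const std::uintptr_t aligned =
      (lowest + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
  T* first = reinterpret_cast<T*>(aligned);
  array_header* h = reinterpret_cast<array_header*>(aligned) - 1;
  ::new (static_cast<void*>(h)) array_header{n, block};

  std::size_t built = 0;
  try {
    for (; built < n; ++built) ::new (static_cast<void*>(first + built)) T();
  } catch (...) {
    while (built > 0) first[--built].~T();  // unwind what was constructed
    std::free(block);
    throw;
  }
  return first;
}

template <class T>
std::size_t array_length(const T* first) {
  return (reinterpret_cast<const array_header*>(first) - 1)->length;
}

// No length argument: it is read back from the header.
template <class T>
void delete_array_aligned(T* first) {
  if (first == nullptr) return;
  array_header* h = reinterpret_cast<array_header*>(first) - 1;
  for (std::size_t i = h->length; i > 0; --i) first[i - 1].~T();
  std::free(h->block);
}
```

    The easy mistake is to pass the aligned element pointer to the underlying free. The block word exists precisely so the original allocation address can be recovered after the front padding.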

  21. BrianS reporter

    I think most users will rely on new and delete, like they have been used to with C++. malloc just gives memory, and lots of C++ users choose malloc or new as they require. HPC users will want both kinds of functions. There are many places where I malloc space for complex C++ objects in Chombo. The classic example is the destination buffer where the unserialize operation will build the correct state.

    I will try coding up the T[] version and see how it works for me
