shipping member functions

Issue #17 resolved
BrianS created an issue

I recall this was problematic when working on AMRStencil. I was jostled into recollecting it when Paul mentioned sizeof and offsetof, and I looked up the issues he discussed with Steve Watanabe at Intrepid. There, the question was asked: "can all pointers be cast to void*?" The answer is "all pointers in a category can be": function pointers can be cast to a void function pointer, and data pointers to a void data pointer (void*). C++ has another pointer type that cannot be cast to a function pointer, which is the pointer to member function. In AMRStencil I get around the issue by using the special case of lambda capture of this.

[this] is a special form of lambda capture that smuggles in the local instance pointer and uses it for symbol resolution. Before lambdas I had to use awkward std::bind forms.
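
A minimal sketch of the two forms mentioned above, using a hypothetical `Euler` class (the class, its members, and the helper names are illustrative and not taken from AMRStencil). Both helpers return a zero-argument callable that forwards to a member function on the current instance.

```cpp
#include <functional>

// Hypothetical solver class, standing in for the kind of object discussed above.
struct Euler {
  double state = 1.0;

  void advance(double dt) { state += dt * state; }

  // Lambda form: [this] captures the instance pointer, so the unqualified
  // call to advance() resolves against this object.
  std::function<void()> step_with_lambda(double dt) {
    return [this, dt]() { advance(dt); };
  }

  // Pre-lambda form: std::bind needs the member pointer and the instance
  // spelled out explicitly.
  std::function<void()> step_with_bind(double dt) {
    return std::bind(&Euler::advance, this, dt);
  }
};
```

Either callable is only meaningful in the address space where the `Euler` instance lives, since both hold the raw `this` pointer.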

I would have said "This is total crap, I can build a C++ stencil DSL that ignores member functions" but Phil was very insistent that being a C++ library meant embracing and enabling one of C++'s main features over C: object-oriented programming. Phil wanted to write an Euler class, with member functions that do the identifiable steps of a semi-implicit Euler equation time advance. AMRStencil was responsible for data movement and loop scheduling, but he wanted classes, member data, and member functions.

Right now I don't have an example in my mind, but we are in the business of altering data in remote containers, and many C++ containers are manipulated through member functions.

In the slides presented we have

future<R> rpc(F func, rank r, Args&&... args);

So, in Phil's example, he wants F to be a member function.

Comments (29)

  1. Former user Account Deleted

    The lambda solution is the way to go.

    Member functions make less sense with rpc since you need a "this" that is valid on the receiving end. But that's only a hurdle to usability since the user can share "this" pointers with peers at solver setup time.

    What makes this near impossible is that the encoding of member function pointers can be way trickier than just a function pointer. Thanks to virtual functions, member function pointers might have to reference a slot in the vtable. On some compilers, sizeof(ReturnType (T::*)(ArgType...)) can be up to three words! What those words encode is some path through potentially multiple virtually inherited vtables, and that encoding can change depending on the static knowledge of T. For instance, if T isn't polymorphic the member function pointer can probably be just a 1-word function pointer. Reverse engineering that encoding on a compiler-by-compiler, type-by-type basis is beyond our scope. If we're willing to assume alignment in the executable code and data segments (not the gasnet rdma segments), then we can just byte-copy the encoding when shipping. But since we aren't counting on that alignment in general, we are hosed!

    Lambdas solve this because they never try to put a Ret(T::*)(Arg...) on the wire.
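
As a rough illustration of the size point above, the following sketch compares an ordinary function pointer with two pointer-to-member-function types. The class names are hypothetical, and the actual sizes are implementation-defined, so the snippet only records them rather than asserting particular values.

```cpp
#include <cstddef>

struct Plain { void f() {} };
struct Base1 { virtual ~Base1() = default; virtual void g() {} };
struct Base2 { virtual ~Base2() = default; virtual void h() {} };
struct Multi : Base1, Base2 { void g() override {} };

using FreeFn  = void (*)();          // ordinary function pointer
using PlainFn = void (Plain::*)();   // member of a non-polymorphic class
using MultiFn = void (Multi::*)();   // member of a class with multiple polymorphic bases

// Sizes depend on the compiler and ABI; print or inspect them on your platform.
constexpr std::size_t free_fn_size      = sizeof(FreeFn);
constexpr std::size_t plain_member_size = sizeof(PlainFn);
constexpr std::size_t multi_member_size = sizeof(MultiFn);

// There is no conversion from a pointer to member function to void* or to
// an ordinary function pointer; the following line would not compile:
// void* p = reinterpret_cast<void*>(&Plain::f);
```

On common ABIs the member-pointer sizes come out at least as large as the plain function pointer, which is why a member function pointer cannot simply be treated as an address to put on the wire.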

  2. BrianS reporter

    The problem is that we cannot use a lambda to do the capture. The lambda capture trick I use in AMRStencil relies on the fact that this is captured along with the automatic variables, but this is a meaningless virtual address on the receiver.

    [this,a,b,c](){ memberFunction(a,b,c);}
    

    a,b,c are fine by-value captures, but this is useless. The code will compile fine, and likely segfault on the receiving end unless we have something like symmetric code segments. We don't even know if this is in scope on the receiver.

  3. BrianS reporter

    We can offer people the freedom to call static class member functions; they just have to give the properly qualified name

    [a,b,c](){ Bob::memberFunction(a,b,c);}
    

    maybe that's enough. To offer real member function pointer calling we would need to figure out the suitable location of this on the destination processor.
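
A small sketch of the static-member case, assuming a hypothetical `Bob` class and a hypothetical `make_call` helper. The closure captures only `a`, `b`, `c` by value, so it holds no instance pointer at all.

```cpp
struct Bob {
  // Static member: there is no hidden `this` argument, so nothing
  // instance-specific has to exist on the receiver.
  static int memberFunction(int a, int b, int c) { return a + b * c; }
};

// Builds the callable that would be handed to rpc. Only a, b, c are
// captured, all by value; the function is named by its qualified name.
inline auto make_call(int a, int b, int c) {
  return [a, b, c]() { return Bob::memberFunction(a, b, c); };
}
```

A copy of the closure behaves the same as the original, which is the property a shipped callable needs.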

  4. Former user Account Deleted

    So do you want to help users solve the bootstrapping problem of "this" translation? The dist_map proposed on the bootstrap thread does not do this directly. What we want is a translation table mapping local this pointers to the peer's analog this pointer. To send a method call:

    upcxx::rpc(
      peer_rank,
      [a,b,c](object *me) {
        me->memberFunction(a,b,c);
      },
      // this returns a pointer that gets shipped and bound to "me" in the lambda
      translator.lookup(peer_rank, this)
    );
    

    You could build a translator table by using dist_map to exchange "this" but it wouldn't be ideal since it does a roundtrip for each peer-peer. I have used a scheme that avoids this roundtrip completely....

    Notice that we're conceptually dealing with a "distributed-object". It is one thing from the programmer's view, but its state is partitioned over some team of ranks. The "this" pointer is no longer enough for identifying the object globally since it changes from rank to rank. But we can generate some global id in the object's constructor if we make the constructor collective over the team and increment a counter stored in the team (replicated locally, incremented locally). Assuming that teams have some shippable id value, then a (team-id, counter-value) pair would identify the object in a shippable way. So now we have shippable ids, and we didn't need to do any communication to get them. We can also make the distributed-object constructor insert the id->local this mapping to a hidden translation table.

    namespace upcxx {
      struct dist_id {
        team_id _team_id;
        uint64_t _id;
    
        static unordered_map<dist_id, dist_object*> _registry;
    
        // operator==, operator!=, specialize std::hash
    
        template<class Obj>
        Obj* to_local() { return static_cast<Obj*>(_registry[*this]); }
      };
    
      class dist_object {
        upcxx::team *_team;
        uint64_t _idnum;
    
      public:
        dist_id id() const {
          return dist_id{this->_team->id(), this->_idnum};
        }
    
        dist_object(upcxx::team &t) {
          this->_team = &t;
          this->_idnum = t.id_counter++;
          dist_id::_registry[this->id()] = this;
        }
    
        ~dist_object() {
          dist_id::_registry.erase(this->id());
        }
      };
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // user code
    
    struct my_dist_thing: upcxx::dist_object {
      // ... 
    };
    
    int main() {
      my_dist_thing bob;
      my_dist_thing sally;
    
      upcxx::barrier(); // ensure everyone has registered bob and sally
    
      // method call on remote bob
      int a, b, c; // = ...
      upcxx::rpc(
        upcxx::rank_me() + 1,
        [=](dist_id bob_id) {
          my_dist_thing *bob = bob_id.to_local<my_dist_thing>();
          bob->foo(a,b,c);
        },
        bob.id() // gets bound to argument of lambda
      );
    
      // or, perhaps we could introduce a dist_object flavor of rpc to do the Obj*->dist_id->Obj* conversion implicitly...
      // template<class Obj>
      // void upcxx::rpc(int rank, Obj *dobj, Lambda lam, Arg...);
    
      upcxx::rpc(
        upcxx::rank_me()+1,
        &bob,
        [=](my_dist_thing *bob) {
          bob->foo(a,b,c);
        }
      );
    }
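
The id scheme above can be tried out in a single process. The sketch below is not the proposed upcxx API: `team`, `dist_id_hash`, and the free function `to_local` are simplified, hypothetical stand-ins, and "ranks" are not modelled. It only shows that a (team id, counter) pair can name an object and be translated back to a local pointer without communication.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Stand-in for a team: an id plus a counter that every member bumps locally,
// in the same order, so all members agree on object ids without messages.
struct team {
  std::uint64_t team_id;
  std::uint64_t id_counter = 0;
};

struct dist_id {
  std::uint64_t team_id;
  std::uint64_t idnum;
  bool operator==(const dist_id& o) const {
    return team_id == o.team_id && idnum == o.idnum;
  }
};

struct dist_id_hash {
  std::size_t operator()(const dist_id& d) const {
    return std::hash<std::uint64_t>()(d.team_id) * 31u +
           std::hash<std::uint64_t>()(d.idnum);
  }
};

class dist_object {
  dist_id id_;
public:
  // One registry per process: id -> local address.
  static std::unordered_map<dist_id, dist_object*, dist_id_hash>& registry() {
    static std::unordered_map<dist_id, dist_object*, dist_id_hash> r;
    return r;
  }

  explicit dist_object(team& t) : id_{t.team_id, t.id_counter++} {
    registry()[id_] = this;
  }
  virtual ~dist_object() { registry().erase(id_); }
  dist_object(const dist_object&) = delete;
  dist_object& operator=(const dist_object&) = delete;

  dist_id id() const { return id_; }
};

// Translates a shipped id back to the local object, or nullptr if unknown.
template <class Obj>
Obj* to_local(dist_id id) {
  auto it = dist_object::registry().find(id);
  return it == dist_object::registry().end() ? nullptr
                                             : static_cast<Obj*>(it->second);
}

struct my_dist_thing : dist_object {
  int total = 0;
  explicit my_dist_thing(team& t) : dist_object(t) {}
  void foo(int a, int b, int c) { total += a + b + c; }
};
```

Unlike the pseudocode above, the lookup here returns `nullptr` for an unknown id instead of default-inserting an entry, which is one possible design choice among several.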
    
  5. Former user Account Deleted

    It's worth noting that the barriers in the example can be removed thanks to the magic of rpc's and futures. The combination of these two is so powerful you can write completely barrier free code.

    namespace upcxx {
      struct dist_id {
        team_id _team_id;
        uint64_t _id;
    
        // CHANGED, stores promise<dist_object*> instead of dist_object*
        static unordered_map<dist_id, promise<dist_object*>> _registry;
    
        // operator==, operator!=, specialize std::hash
    
        // CHANGED to return a future<Obj*> instead of just Obj*
        template<class Obj>
        future<Obj*> when_local() {
          return _registry[*this].get_future();
        }
      };
    
      class dist_object {
        upcxx::team *_team;
        uint64_t _idnum;
    
      public:
        dist_id id() const {
          return dist_id{this->_team->id(), this->_idnum};
        }
    
        dist_object(upcxx::team &t) {
          this->_team = &t;
          this->_idnum = t.id_counter++;
          dist_id::_registry[this->id()].set_value(this);
        }
    
        ~dist_object() {
          dist_id::_registry.erase(this->id());
        }
      };
    }
    
    ////////////////////////////////////////////////////////////////////////////////
    // user code
    
    struct my_dist_thing: upcxx::dist_object {
      // ... 
    };
    
    int main() {
      my_dist_thing bob;
      my_dist_thing sally;
    
      // CHANGED not necessary
      //upcxx::barrier(); // ensure everyone has registered bob and sally
    
      // method call on remote bob
      int a, b, c; // = ...
      upcxx::rpc(
        upcxx::rank_me() + 1,
        [=](dist_id bob_id) {
          // CHANGED
          // we are now in a lambda running remotely, but we aren't sure this peer has constructed bob yet.
          // so we have to ask for a future of bob's address given the id both sides compute redundantly and
          // defer our method call until bob exists.
          bob_id.when_local<my_dist_thing>().then([=](my_dist_thing *bob) {
            bob->foo(a,b,c);
          });
        },
        bob.id() // gets bound to argument of lambda
      );
    
      // we could make a futureish dist_object flavor of rpc too. implicitly does the id to
      // pointer conversions and attaches the lambda to the future on the remote end.
      upcxx::rpc(
        upcxx::rank_me()+1,
        &bob,
        [=](my_dist_thing *bob) {
          bob->foo(a,b,c);
        }
      );
    }
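
The barrier-free idea can be sketched without upcxx by replacing promise/future with a small hand-written `slot` type (hypothetical, since `std::future` has no `then`). Callbacks registered before the object exists are queued and run at registration time; later callbacks run immediately. The integer id and the names `thing`, `registry`, and `on_rpc_arrival` are assumptions made for this illustration.

```cpp
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Tiny stand-in for promise<T*> plus future::then.
template <class T>
class slot {
  T* value_ = nullptr;
  std::vector<std::function<void(T*)>> waiting_;
public:
  void then(std::function<void(T*)> cb) {
    if (value_) cb(value_);                 // object already exists: run now
    else waiting_.push_back(std::move(cb)); // otherwise defer
  }
  void set_value(T* v) {
    value_ = v;
    std::vector<std::function<void(T*)>> ready;
    ready.swap(waiting_);
    for (auto& cb : ready) cb(v);           // fire everything that was waiting
  }
};

struct thing {
  int total = 0;
  void foo(int a, int b, int c) { total += a + b + c; }
};

// Registry keyed by a plain integer id (standing in for dist_id).
inline std::unordered_map<int, slot<thing>>& registry() {
  static std::unordered_map<int, slot<thing>> r;
  return r;
}

// What the body of an incoming rpc would do: defer the method call until
// the local object with this id has been registered.
inline void on_rpc_arrival(int id, int a, int b, int c) {
  registry()[id].then([=](thing* t) { t->foo(a, b, c); });
}
```

Calling `on_rpc_arrival(0, 1, 2, 3)` before `registry()[0].set_value(&bob)` leaves `bob` untouched until registration, at which point the deferred call runs.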
    
  6. BrianS reporter

    It might be that the cure is worse than the disease (!) The last option, if workable, would be my preference. It wouldn't have to work in all cases, but the ability to call functions on stateful things on other ranks would let people do better encapsulation and software design.

  7. BrianS reporter

    I don't think this-by-value will help much in this case...hmmm, that might be interesting, shipping the entire object by value to the destination.

    I'm actually wondering if it is necessary to do the indirection with the lambda argument. If bob is a small thing, like the reference to a big object, then it's fine to capture it, you just have to future the usage in the rpc if you have not programmed creation with waiting.

  8. BrianS reporter

    If bob is a big object, where copying the object by value is onerous:

    my_dist_thing bob;
    dist_id id = bob.id();
    
    upcxx::rpc(remote_rank,
                      [=](){ id.when_local<my_dist_thing>().then([=](my_dist_thing *bob){ bob->foo(a,b,c); }); }
                     );
    
    A bit stumbling. Perhaps other forms are handier.
    

    If my_dist_thing is lightweight, and the user builds with barriers, then it is not a big whoop to access it by value. We can have a thisP function that knows how to grab the local representation.

    upcxx::rpc(remote_rank, [=](){bob.thisP()->foo(a,b,c);});
    
  9. Former user Account Deleted
    upcxx::rpc(remote_rank, [=](){bob.thisP()->foo(a,b,c);});
    

    So you're capturing all of bob, but you really only needed his id. Unfortunately, bob is probably not serializable. If he is, then you've also invoked a bob constructor to materialize him out of deserialization, which means now the user might have to write a special constructor that doesn't do the collective dist_object() thing. So I think we have hidden warts here. Also, you're inviting the sloppy user to just do:

    [=]() { bob.foo(a,b,c); }
    

    I think beating ids into people's heads is the only way to go.

  10. BrianS reporter

    All I want to serialize is the base class dist_object, and it only has the virtual upcast thisP as the one legal thing you can call. thisP also works locally. We might serialize the object and the pointers, but the pointers will segfault on the receiver. By its nature dist_object is for things that should not be serialized. You don't use dist_object for things that are small structs of PODs. For those things it's better to ship the actual object in the capture. I'm not sure how we get that effect. I don't want the user writing dist_object maintenance code. I would like objects that derive from dist_object to pass-by-value just the base class...maybe we can't, in which case the user should instantiate the dist_id and use that in the rpc call.

    I guess this comes down to "how are we shipping the captured state?"

  11. Former user Account Deleted

    We can only get the capture you described working with the explicit capture:

    // we can make Arg's serialization do whatever we want
    Arg y = ...;
    rpc(rank, [](Arg &x) { f(x); }, y);
    
    // but this will always fail unless Arg is TriviallyCopyable
    Arg y = ...;
    rpc(rank, [=]() { f(y); });
    

    This is from my post before, except generalized to handle multiple dist_objects. I think it does what you want: instead of us hacking serialization, we just lend an rpc variant that handles this case, renamed to something other than rpc. Arguments which are dist_objects implicitly get their names shipped, and their futures waited for on the remote side. Arguments which aren't dist_objects just get serialized as usual.

    upcxx::rpc_dist(
      upcxx::rank_me()+1,
      [=](my_dist_thing *bob, my_dist_thing *sally, int some_int) {
        bob->foo(sally, some_int);
      },
      bob, sally, 0xdeadbeef
    );
    
    // done as a regular rpc would be:
    upcxx::rpc(
      upcxx::rank_me()+1,
      [=](dist_id bob_id, dist_id sally_id, int some_int) {
        when_all(
          bob_id.when_local<my_dist_thing>(),
          sally_id.when_local<my_dist_thing>()
        ).then([=](my_dist_thing *bob, my_dist_thing *sally) {
          bob->foo(sally, some_int);
        });
      },
      bob.id(), sally.id(), 0xdeadbeef
    );
    
  12. Former user Account Deleted

    I want to change dist_object in a big way. Having done dist_object before, I remembered there's something that really stunk that I overlooked here. Registration of the name-to-this cannot be done in the constructor because the promise fulfillment will execute waiting rpcs (those that did future::then on arrival) but the subclass constructors haven't executed yet. The way I fixed it, which I later hated since it was a source of a few bugs, was to postpone registration until some later time as indicated explicitly by the user invoking a .introduce() method which signalled that the object was now visible in the registry. Forgetting to introduce objects bit me more than once.
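
The constructor-ordering hazard can be reproduced in a few lines. This is a hedged sketch with hypothetical names (`dist_base`, `mesh`, `observed_name`): a callback fired from the base-class constructor sees the object before its derived part exists, so a virtual call dispatches to the base version.

```cpp
#include <functional>
#include <string>
#include <vector>

struct dist_base {
  using callback = std::function<void(dist_base*)>;

  // Callbacks standing in for rpcs that arrived before the object existed.
  static std::vector<callback>& waiting() {
    static std::vector<callback> w;
    return w;
  }

  explicit dist_base(bool register_in_ctor) {
    if (register_in_ctor) introduce();  // too early: derived part not built yet
  }
  virtual ~dist_base() = default;

  virtual std::string name() const { return "base"; }

  // Explicit registration, meant to be called after full construction.
  void introduce() {
    std::vector<callback> ready;
    ready.swap(waiting());
    for (auto& cb : ready) cb(this);
  }
};

struct mesh : dist_base {
  explicit mesh(bool register_in_ctor) : dist_base(register_in_ctor) {}
  std::string name() const override { return "mesh"; }
};

// Queues one waiting callback, constructs a mesh, and reports which
// override of name() the callback observed.
inline std::string observed_name(bool register_in_ctor) {
  std::string seen;
  dist_base::waiting().push_back([&seen](dist_base* obj) { seen = obj->name(); });
  mesh m(register_in_ctor);
  if (!register_in_ctor) m.introduce();
  return seen;
}
```

With registration in the constructor the callback observes the base-class behaviour; with the explicit `introduce()` call it observes the fully constructed object, at the cost of a step that is easy to forget.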

    So I would like to move away from the inheritance model where distributed object classes should inherit from our base dist_object and towards one called dist_state<T>. A dist_state<T> is a value of type T that has been named for peers to see.

    template<class T>
    struct dist_state {
      // value moved into internal storage controlled by upcxx
      // collective call, registers the value under the generated name, promise fulfilled so rpcs
      // may fire here.
      dist_state(T value);
    
      dist_id<T> id() const;
    
      // pointer to T semantics
      T* operator->() const;
      T& operator*() const;
    };
    
    // the serializable name of a T
    template<class T>
    struct dist_id {
      // also pointer to T semantics
      T& operator*() const;
      T* operator->() const;
    };
    
    /////////////////////////////////////////////////////////////////////////////////////
    // user code
    
    struct mesh_data { ... };
    
    mesh_data stuff_raw;
    // build stuff_raw
    // ...
    
    // share it
    dist_state<mesh_data> stuff{std::move(stuff_raw)};
    
    rpc_dist(neighbor,
      [=](dist_state<mesh_data> peer_stuff) {
        peer_stuff->something(...);
      },
      stuff
    );
    
    // mesh_data destroyed when `stuff` destructed. if there are in-flight rpc's destined here that's the user's problem (for now).
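
A single-process sketch of the wrapper idea, under simplifying assumptions: no team, no promise, a plain integer id instead of `dist_id<T>`, and non-const accessors. The point it illustrates is that the `T` is already fully constructed when it is moved in, so registration inside the wrapper's constructor is safe.

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

template <class T>
class dist_state {
  std::uint64_t id_;
  T value_;  // moved into storage owned by the wrapper

  static std::uint64_t& next_id() {
    static std::uint64_t n = 0;
    return n;
  }
public:
  // Hypothetical per-process registry: id -> address of the wrapped value.
  static std::unordered_map<std::uint64_t, T*>& registry() {
    static std::unordered_map<std::uint64_t, T*> r;
    return r;
  }

  explicit dist_state(T value) : id_(next_id()++), value_(std::move(value)) {
    registry()[id_] = &value_;  // T is complete here, unlike in a base-class ctor
  }
  ~dist_state() { registry().erase(id_); }
  dist_state(const dist_state&) = delete;
  dist_state& operator=(const dist_state&) = delete;

  std::uint64_t id() const { return id_; }
  T* operator->() { return &value_; }
  T& operator*() { return value_; }
};

// Example payload type for illustration only.
struct mesh_data {
  std::vector<int> cells;
};
```

Because the wrapper is non-copyable, there is exactly one registered instance per `dist_state` object.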
    
  13. Dan Bonachea

    John - your model seems to make sense. So the dist_state constructor should probably also take a team argument right? (Possibly with the default value of the primordial team)

    Also, would the upcxx user-level team_t actually be a dist_state<local_team_handle_t> that the library implicitly builds during the team constructor, or would the end-user be responsible for performing that wrapping before shipping a team reference? Same question for any other distributed objects exposed by UPCXX. Is this a best-practice for anyone implementing a distributed object abstraction in UPCXX (eg a user-written distributed array library)?

    Finally, what's the behavior when a user accidentally leaves out the dist_state<> in your example and directly passes a mesh_data as the rpc argument? (assuming it's not trivially copyable or serializable) Is that meaningful? Is it a detectable error?

  14. BrianS reporter

    That seems to hold up. dist_state is still collective with whom you intend to collaborate. Did we lose the team reference?

  15. Former user Account Deleted

    Forgot to include team in the constructor. And like Dan suggests upcxx teams would implicitly track gasnet teams with a dist_state. Type errors will stop users from dropping dist_state when serializing most of the time, heh.

  16. BrianS reporter

    can we go back to calling it dist_object. state is a bit of a loaded term for modeling packages and it is really meant to be much more generic

  17. BrianS reporter

    Is there a way to work around the use of a stack instance and the move operation? I agree that construction by move is wanted, but users might think they can talk to both objects. We should have a clear single instance

      dist_object<Mesh> dmesh(myTeam, ...) ; // remaining arguments are for Mesh constructor.
    
  18. BrianS reporter

    how about we give dist_object the ability to call any member function on a remote member by adding the intended rank?

    class Mesh
    {
       public:
        Mesh(A, B, C);
        void refactor(D);
    };
    team myteam(....);
    A a; B b; C c; D d;
    dist_object<Mesh> dmesh(myteam, a, b, c);
    dmesh(neighbor, &Mesh::refactor, d); // capture d, invoke member function refactor on rank neighbor of myteam
    
  19. Former user Account Deleted

    @bvstraalen : Are there any functions that mesh_data needs to have to work inside a dist_state object?

    No, fully generic T supported (as long as it is copyable or movable).

    @bvstraalen can we go back to calling it dist_object. state is a bit of a loaded term for modeling packages and it is really meant to be much more generic

    Since general T is allowed in dist_xxx<T>, I didn't think object is appropriate (dist_object<int> seems weird). It was a good fit for the inheritance scheme we were using.

    @bvstraalen: Is there a way to work around the use of a stack instance and the move operation? I agree that construction by move is wanted, but users might think they can talk to both objects. We should have a clear single instance

    Can always just construct in-place, no new API needed:

    dist_state<mesh_data> stuff{team, mesh_data{...}};
    

    @bvstraalen : how about we give dist_object the ability to call any member function on a remote member by adding the intended rank?

    Not doable the way you wrote it because it requires serialization of Mesh::refactor as a runtime value. This could work though:

    // method has to be a compile-time parameter so we don't have to serialize it as a value,
    // instead we can internally bake it into a lambda that just *knows* the member name
    dmesh.remote_method<&Mesh::refactor>(neighbor, d);
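
A minimal sketch of the compile-time approach, with hypothetical names (`make_remote_call`, and a simplified `Mesh` and `D`). The member pointer is a template parameter, so it becomes part of the closure's type and the closure itself captures only the argument.

```cpp
#include <functional>

struct D { int amount; };

struct Mesh {
  int level = 0;
  void refactor(D d) { level += d.amount; }
};

// Method is a non-type template parameter: it is known at compile time on
// every rank, so no member-pointer value ever has to be serialized.
template <class T, class Arg, void (T::*Method)(Arg)>
std::function<void(T&)> make_remote_call(Arg arg) {
  return [arg](T& target) { (target.*Method)(arg); };
}
```

For example, `make_remote_call<Mesh, D, &Mesh::refactor>(D{4})` yields a callable that, applied to the receiver's local `Mesh`, invokes `refactor` with the captured argument.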
    
  20. b

    Strongly Favor calling it dist_object<> and calling shared_array<> either dist_array<> or dist_vector<>.

    Alternatively, I could possibly accept shared_object<> and shared_array<>.

    I am very strongly against dist_object<> and shared_array<>. I want consistent names.

  21. Dan Bonachea

    If I understand correctly, the directory abstraction we are talking about here is logically a distributed array, with one element per team member. However it doesn't really act like an array because in a given context you can only access one element (the entry local to the thread asking). Also construction and destruction don't act like a traditional PGAS distributed array, because the elements are independently constructed and destructed without collective synchronization.

    Some things I still don't understand about John's latest proposal:

    1. Is dist_id something a user ever works with, or just a hidden part of the dist_object implementation? I don't see it in the example.
    2. If a user wants to make a directory for a big object without copying T, can he make a dist_object<T*> or even a dist_object<global_ptr<T>>?
    3. Does the local element of a dist_object<T> live in shared memory?
    4. Assuming yes, does dist_object<T> provide an operator& or other mechanism to obtain a global_ptr<T> to reference the local element, for example to pass to other threads for use in put/get?
    5. Do we help users at all with the lifetime issues? Eg if an rpc tries to retrieve the local element before it exists, can it get a future for that? If it tries to retrieve a local element after it has been destroyed, is that a detectable error?
  22. Former user Account Deleted

    In response to Dan's points:

    1. Possibly. The example was using rpc_dist which hides the id management. But users will have access to it in cases where they need it.
    2. Moving (std move) T should be good enough for properly engineered T. But T can be any type, including U*.
    3. Good idea. I guess it could.
    4. If 3 is true, sure.
    5. The futures for object birth we do handle. That's what rpc_dist does for you. Object death is harder. For now, if an rpc lands after an object's death then the user messed up. Later, we could add a user opt-in consensus algorithm to determine when the object can die safely.

  23. Amir Kamil

    Has this been sufficiently resolved?

    Supporting rpc on member functions is something that we can think about down the line. C++17 provides support for invoking callables, which include member functions, and we can backport this functionality to non-C++17 rpc.
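
For reference, the C++17 facility in question is `std::invoke`. The sketch below uses a hypothetical `Mesh` type and a hypothetical `call_it` wrapper to show one call site handling a member function pointer, a free function, and a lambda; it does not show how an rpc layer would transport the callable.

```cpp
#include <functional>  // std::invoke (C++17)
#include <utility>

struct Mesh {
  int level = 0;
  int refactor(int by) { level += by; return level; }
};

inline int free_refactor(Mesh& m, int by) { return m.refactor(by); }

// A single forwarding wrapper: std::invoke picks the right call syntax,
// including (obj.*pmf)(args...) when f is a pointer to member function.
template <class F, class... Args>
decltype(auto) call_it(F&& f, Args&&... args) {
  return std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
}
```

With a member function pointer, the first argument after the callable plays the role of the object, as in `call_it(&Mesh::refactor, m, 2)`.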

  24. Dan Bonachea

    I think the only outstanding issue here, specifically should the local element of a dist_object<> live in the shared heap, is adequately covered by issue #89.
