Improve type check error for l-value reference arguments to RPC callbacks

Issue #393 resolved
Dan Bonachea created an issue

This discussion is partially copied from spec issue #149

Consider this simple program:

#include <upcxx/upcxx.hpp>

int main(){
  upcxx::init();

  int x = 42;
  upcxx::rpc(0, [](int &val) { }, x).wait();

  upcxx::finalize();
}

This seemingly innocuous program has a subtle type defect that triggers an inscrutable ocean of template instantiation failures which give no real clue as to the problem. Multiple users (including myself) have independently run into a similar defect and wasted considerable time trying to track it down.

The problem is the callback argument type int & will never type check - it must be const int & or int && or int, however it's unlikely even a seasoned C++ programmer would guess that restriction without deep knowledge of UPC++ (especially given the input argument type is int &). As of spec PR #43 we now clarify this requirement in the specification, but it's still an easy programming mistake to make, and potentially hard to diagnose when the example is more convoluted than shown above (eg due to typedefs or distant declaration of the FunctionObject).

We'd like to find a way to detect this class of error and issue a static assert to clue users in to what they did wrong. The basic idea is to leverage sfinae for the callback invocation and provide a "catch all" definition that will static assert when instantiated with a message saying at least something to the effect of "your RPC callback argument types don't match what the runtime expects based on the input arguments to rpc".

Comments (2)

  1. Dan Bonachea reporter

    It's worth noting the instantiation failure we get here is (I think) effectively the same failure one gets for other more egregious mismatches, such as the following:

      int x = 42;
      upcxx::rpc(0, [](int x) { }, x, x).wait();
    
      std::string m;
      upcxx::rpc(0, [](const char *msg) { }, m).wait();
    

    The fundamental problem is the same as far as the type checker is concerned - it couldn't find a definition of upcxx::rpc that would allowing invoking the provided FunctionObject with the provided rpc argument types. In these cases the first error message on gcc 10 gives a bit more clue as to what's wrong:

    error: no matching function for call to 'rpc(int, main()::<lambda(int)>, int&, int&)'
    

    where comparing the argument lists reveals the problem.

    Thus far technical discussions for the l-value reference problem have focused on providing dummy "catch all" definitions that vary the reference qualifiers on the callable to generate a match which can static assert. However we believe the number of "catch all" variants we'd need to generate to ensure an exact match is exponential in the argument count, which could create problems for rpcs with large numbers of arguments.

    However it occurred to me that perhaps we should attempt a simpler, more general approach. Perhaps we can use a single varargs function as our dummy "catch all" callback definition, ie void dummy(...), which should match ANY set of input arguments. If we could arrange for that dummy to be the lowest priority match, then it could static assert something to the effect of "your RPC callback argument types don't match what the runtime expects based on the input arguments to rpc" and potentially help users diagnose both classes of errors.

    Thoughts?

  2. Log in to comment