Prevent silent use of by-value RPC return of huge types

UPC++ makes it relatively easy to write BAD code such as the following:

#include <upcxx/upcxx.hpp>
using namespace upcxx;

struct bigarray_t {
  double value[1024];

  bigarray_t operator+=(bigarray_t const &other) { // YUK!
    for (int i=0; i<1024; i++) value[i] += other.value[i];
    return *this;
  }
};

int main() {
  init();

  global_ptr<bigarray_t> gp;
  if (!rank_me()) gp = new_<bigarray_t>();
  gp = broadcast(gp, 0).wait();

  // THIS CODE IS A BAD EXAMPLE - DO NOT DO THIS
  future<bigarray_t> badidea1 = rget(gp); // rget huge val into a future
  bigarray_t bigval = badidea1.wait();
  future<> badidea2 = rput(bigval, gp); // rput huge val passed on stack
  badidea2.wait();
  future<bigarray_t> badidea3 = broadcast(bigval, 0); // bcast huge val passed on stack
  bigarray_t bigval3 = badidea3.wait();
  future<bigarray_t> badidea4 = reduce_all(bigval, op_add); // reduce huge val passed on stack
  bigarray_t bigval4 = badidea4.wait();

  finalize();
  return 0;
}

This code compiles and works, but the pattern is a really REALLY bad idea. The problem is that the user is manipulating a statically large type (8 KiB here, assuming 8-byte doubles) by value. This is generally frowned upon but not prohibited, and as long as the stack doesn't actually overflow there's no obvious sign that it is happening. The problem is obvious in this toy example, but it can be subtly buried in a larger code base. I found an analogous defect in supposedly well-tuned code from an experienced UPC++ programmer (who shall remain unnamed).

UPC++ magnifies the problem for types like this one that happen to be TriviallySerializable: not only can the programmer pass them around on the stack, they can also pass them directly to the by-value communication APIs (e.g. future<T> rget(global_ptr<T>)). These APIs are designed as a convenience for scalar values, where copies are effectively free, but using them with a large type like this imposes ridiculous overheads from multiple in-memory copies. Unfortunately, an inexperienced UPC++ programmer extrapolating from a scalar example to their huge type could easily end up with code like this.

The correct way to handle big arrays like this is to use the "bulk" overloads provided for each of these communication interfaces (e.g. future<> rget(global_ptr<T> src, T *dest, size_t count)), but doing so requires additional arguments and more careful marshalling. To summarize: the bad solution requires less typing and doesn't alert you to the problem, while the right solution requires more insight and more typing. This is a dangerous design property.

We'd like to make it more obvious when a program has strayed into this anti-pattern. This was discussed in the 2020-07-15 meeting, and the consensus seems to be that we should add static assertions that detect instantiation of the scalar communication APIs with value types larger than some compile-time-tunable threshold, with a reasonable default. This will ensure we don't silently compile such dubious code.
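To make the idea concrete, here is a minimal standalone sketch of such a check in plain C++, with no UPC++ dependency. Everything in it is an assumption for illustration: the macro name EXAMPLE_MAX_VALUE_SIZE, its 512-byte default, the trait fits_by_value, and the stand-in function by_value_get are hypothetical and do not describe the actual implementation or its eventual knob.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical knob: both the macro name and the 512-byte default are
// assumptions for this sketch, not an existing UPC++ setting.
// Override with e.g. -DEXAMPLE_MAX_VALUE_SIZE=16384 at compile time.
#ifndef EXAMPLE_MAX_VALUE_SIZE
#define EXAMPLE_MAX_VALUE_SIZE 512
#endif

// True when T is small enough to be passed through a by-value API.
template <typename T>
struct fits_by_value
    : std::integral_constant<bool, (sizeof(T) <= EXAMPLE_MAX_VALUE_SIZE)> {};

// Stand-in for a scalar (by-value) communication call. It just copies from
// local memory; the point is the static_assert at instantiation time.
template <typename T>
T by_value_get(const T *src) {
  static_assert(fits_by_value<T>::value,
                "type is too large for the by-value API: use the bulk "
                "overload, or raise EXAMPLE_MAX_VALUE_SIZE");
  return *src;
}

struct bigarray_t { double value[1024]; };

// Small types pass the check; the big array from the example does not.
static_assert(fits_by_value<double>::value, "scalar fits");
static_assert(!fits_by_value<bigarray_t>::value, "big array is rejected");

double get_small_example() {
  double x = 3.5;
  double y = by_value_get(&x);  // compiles: sizeof(double) is under the limit
  assert(y == 3.5);
  // bigarray_t b{};
  // by_value_get(&b);          // would fail to compile with the message above
  return y;
}
```

Because the check fires at template instantiation, the offending call site appears in the compiler diagnostic, and a user who really wants the by-value behavior can opt in by raising the threshold.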
