Generated views

Issue #133 new
john bachan created an issue

Having to write an iterator because none exists to traverse your data layout is a huge burden. How would one build a column view of a float[10][10] matrix? They could create a strided_ptr<T> pointer-like iterator that knows to jump by a certain number of bytes when doing pointer arithmetic, but that's a lot of code to write. Providing upcxx::strided_ptr would probably service a lot of needs, but I'd like to give users something even more general: a view that works with an explicit length and a generating function instead of an iterator pair.

float matrix[10][10];

for(int col=0; col < 10; col++) {
  upcxx::rpc(rank, 
    [](view<float> column) {
      for(int row=0; row < 10; row++)
        consume(column[row]);
    },
    upcxx::make_view(
      /*length=*/10,
      [&](upcxx::view_emitter<float> emit) {
        for(int row=0; row < 10; row++)
          emit(matrix[row][col]);
      }
    )
  );
}

For trivial types T, we should be able to make the above code work and perform as expected. Performance gets trickier with non-trivial T because serialization requires that a type knows how to compute an upper bound on the buffer space it will need on the wire before it is told to serialize. For instance, serializing a std::vector<std::string> is two-pass: one pass to sum up the needed buffer space, a second to fill in the buffer (this is expected to perform better than a single pass over a growable buffer). Iterated views also do this two-pass, and since the iterator is Forward we know that's safe. Accordingly, generated views would also need to be two-pass, meaning that the user's lambda would be invoked twice for different purposes.

For maximum performance we would want the lambda to be specialized (in terms of code-gen) for each context: the first where it's summing buffer upper bounds of each T, the second where it's recursively invoking serialization of each T into the buffer. Without specialization, we would have to hide a flag in view_emitter<T> that records whether it's summing or serializing. That might perform fine, since the compiler could conceivably hoist the void emit(T x) { if(summing_not_serializing) {...} else {...} } condition out of the user's emit-loop, thus building our specialized contexts for us, but that involves trusting the compiler. To guarantee specialization we would like the user lambda to be templated on the emitter type. C++14 allows templated lambdas via the auto keyword on the arguments. C++11 users are just SOL: they would either have to write their own functor class with a templated operator(), or fall back to a non-template-specialized lambda and hope the compiler's inlining and hoisting is aggressive enough.
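The two-pass pattern described above can be sketched for std::vector<std::string>. This is illustrative only, not the actual UPC++ serializer, and the wire format (64-bit length prefixes) is an assumption:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Pass 1: sum an upper bound on the wire size without writing anything.
std::size_t ubound(const std::vector<std::string>& v) {
  std::size_t n = sizeof(std::uint64_t);            // element count
  for (const auto& s : v)
    n += sizeof(std::uint64_t) + s.size();          // length prefix + bytes
  return n;
}

// Pass 2: fill a buffer allocated once at the upper bound.
std::vector<char> serialize(const std::vector<std::string>& v) {
  std::vector<char> buf(ubound(v));                 // single allocation
  char* p = buf.data();
  std::uint64_t n = v.size();
  std::memcpy(p, &n, sizeof n); p += sizeof n;
  for (const auto& s : v) {
    std::uint64_t len = s.size();
    std::memcpy(p, &len, sizeof len); p += sizeof len;
    std::memcpy(p, s.data(), len);    p += len;
  }
  return buf;
}
```

The key property is that both passes traverse the same sequence, which is why a generated view's lambda would be invoked once per pass.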

Proposed view_emitter<T>:

////////////////////////////////////////////////////////////////////////////////////
// in upcxx
struct view_emitter_unknown;
struct view_emitter_ubound;
struct view_emitter_serialize;

template<typename T, typename Mode = view_emitter_unknown>
class view_emitter;

template<typename T>
class view_emitter<T, view_emitter_unknown> {
  bool ubound_not_serialize_;
public:
  void operator()(T const &x) {
    if(ubound_not_serialize_) {...}
    else {...}
  }
};

template<typename T>
class view_emitter<T, view_emitter_ubound> {
public:
  void operator()(T const &x) {...}
};

template<typename T>
class view_emitter<T, view_emitter_serialize> {
public:
  void operator()(T const &x) {...}
};

////////////////////////////////////////////////////////////////////////////////////
// user

// C++14 user
upcxx::rpc(target,
  [](upcxx::view<T> foo) {...},
  upcxx::make_view(
    length,
    [](auto emit) { // emit deduced as upcxx::view_emitter<T, Mode>
      for(...) emit(...);
    }
  )
);

// C++11 user
upcxx::rpc(target,
  [](upcxx::view<T> foo) {...},
  upcxx::make_view(
    length,
    [](upcxx::view_emitter<T> emit) {
      for(...) emit(...);
    }
  )
);

// User who wants the best without knowing their C++ version,
// we have #define'd UPCXX_VIEW_EMITTER to use auto or not
// depending on the c++ version detected.
upcxx::rpc(target,
  [](upcxx::view<T> foo) {...},
  upcxx::make_view(
    length,
    [](UPCXX_VIEW_EMITTER(T) emit) {
      for(...) emit(...);
    }
  )
);
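A minimal self-contained sketch of how the proposed UPCXX_VIEW_EMITTER macro could be defined. The upcxx::view_emitter here is a stub standing in for the type proposed above; both it and the macro are hypothetical:

```cpp
#include <cassert>

// Stub standing in for the proposed upcxx::view_emitter; illustrative only.
namespace upcxx {
  template<typename T>
  struct view_emitter {
    int calls = 0;
    void operator()(T const&) { ++calls; }
  };
}

// Select the generic-lambda form when the compiler supports C++14,
// otherwise fall back to the concrete (runtime-flag) emitter type.
#if __cplusplus >= 201402L
  #define UPCXX_VIEW_EMITTER(T) auto
#else
  #define UPCXX_VIEW_EMITTER(T) upcxx::view_emitter<T>
#endif
```

Under C++14 the macro expands to auto, making the lambda generic so it can be instantiated separately for the ubound and serialize emitter types; under C++11 it expands to the single runtime-flag emitter.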

Comments (7)

  1. Dan Bonachea

    I understand your use case, but not the details of the proposed solution.

    What is the user contract for this new overload?

    make_view(size_t count, [](view_emitter<T> emit))
    

    Based on your discussion it sounds like the runtime would call the lambda one or more times with a polymorphic function object emit, and that upon each call the lambda is required to call emit.operator() exactly count times passing a T reference to each?

    This seems like an awkward interface from the user perspective. Why not something simpler like:

    make_view(size_t count, []() -> T&)
    

    Where the runtime calls the lambda exactly count times and each returns a reference to the appropriate T in the sequence? Ie for your column example, something like:

        int row = 0, col = 0;
        upcxx::make_view(
          /*length=*/10,
          [&]() -> float& { return matrix[row++][col]; }
        );
    

    This seems both easier to understand and less error-prone, although it does not support multi-pass - so types with non-trivial serialization may need to use a dynamically grown buffer.

    I don't necessarily buy the argument that a two-pass algorithm is superior to a dynamically grown serialization buffer - especially for types where the serialization logic for T might involve lots of cache misses and/or computation more expensive than a data copy. I definitely don't like the idea of adopting a more complicated user interface motivated solely by the (theoretical) performance benefit to non-trivial types, which should be the uncommon case for most HPC codes anyhow.

    If we really think multi-pass is crucial, then here's another alternative:

        int col = 0;
        upcxx::make_view(
          /*length=*/10,
          [&](int row) -> float& { return matrix[row][col]; }
        );
    

    where the lambda argument is the 0-based index in the sequence.
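    As a sketch, such an index-based generator could be driven by the runtime like this (hypothetical driver, names illustrative):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical driver illustrating why the index-based form supports
// multi-pass: because gen(i) depends only on i, the runtime may replay
// any pass (here, simply materializing the sequence) as often as needed.
template<typename T, typename Gen>
std::vector<T> materialize(std::size_t count, Gen gen) {
  std::vector<T> out;
  out.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    out.push_back(gen(i));
  return out;
}
```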

  2. john bachan reporter

    Your alternatives only seem more attractive because the running example has an elegant mapping to/from the 1D index space. Things break down when we look at sending an xz plane of a 3D array.

    float cube[10][10][10];
    int y = ...;
    
    // Dan #1
    int x = 0;
    int z = 0;
    make_view(100,
      // mutable, but with x,z captured by value, allows multi-pass.
      [&,x,z]() mutable -> float { 
        float ans = cube[z][y][x];
        x++;
        if(x == 10) { x = 0; z++; }
        return ans;
      }
    )
    
    // Dan #2
    make_view(100,
      [&](int i) -> float {
        int x = i % 10;
        int z = i / 10;
        return cube[z][y][x];
      }
    )
    

    Neither of these is as intuitive or as performant as the loop nest mine would use:

    make_view(100,
      [&](view_emitter<float> emit) {
        for(int z=0; z < 10; z++)
          for(int x=0; x < 10; x++)
            emit(cube[z][y][x]);
      }
    )
    

    I'm open to dropping the multi-pass implementation, but I don't think that would lift any constraints from the user, since this lambda goes into serialization logic, which must be side-effect free.

    I agree that the dynamic buffer guards better against worst-case performance when the data has poor cache locality. It's also silly and potentially harmful to ask for the length when using a dynamic buffer, because that could cause the user to pointlessly traverse their data structure to get a population count that we then use to no benefit. How about we split the API into a trivial-only generated make_view and a dynamic generated make_view:

    struct view_emitter_bounded_tag;
    struct view_emitter_dynamic_tag;
    
    template<typename T, typename Tag>
    class view_emitter;
    
    template<typename T>
    using view_emitter_bounded = view_emitter<T, view_emitter_bounded_tag>;
    template<typename T>
    using view_emitter_dynamic = view_emitter<T, view_emitter_dynamic_tag>;
    
    // 3 popular use-cases, not necessarily function overloads. Case 1 and 3
    // would probably be the same overload.
    
    // 1. User knows type is trivial and the length was easy to compute.
    // static_assert's that std::is_trivial<T>::value.
    make_view(size_t n, [](view_emitter_bounded<T> emit)->void);
    
    // 2. User knows nothing helpful.
    make_view([](view_emitter_dynamic<T> emit)->void);
    
    // 3. Generic code that doesn't know if T is trivial but does have easy
    // access to `n`. The lambda should be type-parametric in the `emit`
    // argument.
    // `auto` would be one of: bounded_tag | dynamic_tag | generic_tag
    make_view(size_t n, [](view_emitter<T,auto> emit)->void);
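    For case 2, a minimal sketch of what the dynamic emitter could look like internally, assuming trivially copyable T and a growable byte buffer (illustrative only, not the proposed implementation):

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Illustrative sketch: a dynamic emitter appends each element to a
// growable byte buffer, so no element count needs to be supplied up front.
template<typename T>
class dyn_emitter {
  static_assert(std::is_trivially_copyable<T>::value,
                "this sketch handles trivially copyable T only");
  std::vector<char> buf_;
public:
  void operator()(T const& x) {   // emit(x)
    const char* p = reinterpret_cast<const char*>(&x);
    buf_.insert(buf_.end(), p, p + sizeof(T));
  }
  std::size_t bytes() const { return buf_.size(); }
};
```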
    
  3. Amir Kamil

    Do we have a motivating use case for this? It seems to me that VIS would be sufficient for most applications that communicate strided data. Even if an application would need to accumulate, it's unclear to me that an RPC over a view would be a win over allocating a landing zone and doing a VIS with RPC completion.

    I would like to see a compelling reason to provide this interface before we spend the effort to spec and implement it.

  4. john bachan reporter

    @akamil a (possibly poor) use case to consider is sending ghost zones in AMR. With VIS, you need a separate rput_strided per face. With an rpc you can deliver multiple faces in one message. It's then possible that rpc could be faster since it sends fewer messages. I called this a poor use case because AMR likes to have big boxes, resulting in few faces, meaning that the VIS message count wouldn't be very high in the first place. But I think this is still enough to motivate development of the feature.

  5. Amir Kamil

    I do not think that this case is sufficiently motivating. I also am in general opposed to adding new features without a high likelihood that someone will use them. That puts us on the hook for developing, optimizing, and supporting them, and I don't think we should take that on without a cost/benefit analysis.

  6. Scott Baden

    I agree with Amir and with Dan. We already have a feature (VIS) that is designed to do the job. For the scenarios we are aware of, VIS should more than meet our needs. Dan and Paul put considerable effort into the GASNet optimizer and we very much need to qualify VIS performance, to understand the benefits and any limitations, or any need for performance tuning.

  7. Dan Bonachea

    This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.

    We have some initial experience with user implementation of iterators and need some dedicated meeting time to discuss the topic. Currently we lack the resources to commit to such a feature.
