Off-by-one error in promise constructor leads to breakage with any explicit argument

Issue #280 resolved
Dan Bonachea created an issue

I've discovered what appears to be a serious bug in the behavior of promises constructed with a non-default dependency count.

Example:

#include <upcxx/upcxx.hpp>
#include <iostream>
#include <cassert>

using namespace upcxx;

int main() {
  upcxx::init();

  promise<> p(2); // empty promise with 2 dependencies
  assert(!p.get_future().ready());
  p.fulfill_anonymous(1); 
  assert(!p.get_future().ready());
  p.fulfill_anonymous(1); 
  assert(p.get_future().ready()); // crashes here
  if (!rank_me()) std::cout << "SUCCESS" << std::endl;

  upcxx::finalize();
  return 0;
} 

According to the spec I believe this program should succeed, but in the 2019.9.0 release and develop it crashes on the third assertion.

The problem appears to be an off-by-one error in the promise constructor. Spec:

template <typename ...T>
promise <T...>::promise(std::intptr_t dependency_count=1)

Implementation (src/future/promise.hpp:63):

public:
    promise(std::intptr_t anons=0):
      detail::future_impl_shref<detail::future_header_ops_promise, T...>(
        &(new detail::future_header_promise<T...>)->base_header_result
      ) {
      UPCXX_ASSERT(anons >= 0);
      detail::promise_meta_of(*this)->countdown += anons;
    }

Note the default value in the implementation is 0, instead of 1 as mandated by the spec. Subsequent accounting in the promise implementation makes things behaviorally "work out" when the constructor argument is defaulted, but because the specified and implemented default values to the constructor argument differ, whenever the user explicitly provides any value to the constructor, the behavior will visibly diverge from the spec-mandated behavior.

So far this only appears to affect promises constructed with a non-default argument, but nonetheless this appears to be a serious defect.

The recommended workaround is to always construct with the default dependency count, and then use promise::require_anonymous(N-1).

Comments (3)

  1. john bachan

    The constructor interprets its argument as "extra" dependencies beyond the implicit 1. So the implementation currently sees this constructor as shorthand for the following:

    // shorthand
    promise<> p(N);
    // longhand
    promise<> p;
    p.require_anonymous(N);
    

    The spec specifies this instead:

    // short
    promise<> p(N);
    // long
    promise<> p();
    p.require_anonymous(N-1);
    

    This looks like an easy fix.

  2. Dan Bonachea reporter

    Thanks @john bachan!

    With this fix in places I've verified that the new regression test passes.

    Also, the jac1d code I was writing when I discovered this problem can now be written in an (arguably) more natural form, and now works as intended. Diff against upcxx-extras-2019.11.06:

    --- a/tutorials/2019-11/jac1d.cpp
    +++ b/tutorials/2019-11/jac1d.cpp
    @@ -12,6 +12,7 @@
     #include <cstdlib>
     #include <cassert>
     #include <cmath>
    +#include <algorithm>
    
     using namespace upcxx;
    
    @@ -329,8 +330,8 @@ void jacobi_rpc_phased() {
     // this is a pull variant of jacobi_rpc() synchronized p2p without barriers
     void jacobi_rpc_pull_p2p() {
       static promise<double> boundary[2][2];
    -  for (auto &pp : boundary) for (auto &p : pp) p.require_anonymous(1);
    -  upcxx::barrier(); // ensure static init
    +  std::generate(&boundary[0][0], &boundary[2][0], [](){ return promise<double>(2); });
    +  upcxx::barrier(); // ensure initialization of the promise array
       auto get_boundary = [](int phase, int isleft) { return boundary[phase][isleft].finalize(); };
    
       int phase = 0;
    @@ -356,8 +357,7 @@ void jacobi_rpc_pull_p2p() {
         // setup next iteration
         for (auto &p : boundary[phase]) {
           p.get_future().wait(); // ensure neighbor has fetched
    -      p = promise<double>();
    -      p.require_anonymous(1);
    +      p = promise<double>(2);
         }
    
         std::swap(old_grid, new_grid);
    

    I'm not committing this change to upcxx-extras, since it does not work against the current release.

  3. Log in to comment