noise_log mis-uses gex_Coll_Reduce

Issue #506 resolved
Dan Bonachea created an issue

src/backend/gasnet/noise_log.cpp contains:

  gex_Event_Wait(gex_Coll_ReduceToAllNB(
    gasnet::handle_of(upcxx::world()),
    &r, &r,
    GEX_DT_USER, sizeof(reduced), 1,
    GEX_OP_USER,
    (gex_Coll_ReduceFn_t)[](const void *arg1, void *arg2_out, std::size_t n, const void*) {
      reduced const *in = (reduced const*)arg1;
      reduced *acc = (reduced*)arg2_out;
      acc->rank_least = std::min(acc->rank_least, in->rank_least);
      acc->rank_n += in->rank_n;
    },
    nullptr, 0
  ));

This code silently assumes that n == 1 in the operator callback, which is a violation of the GEX spec for client-defined reduction operators.

Paul believes the current GASNet collectives implementation will never pass n > 1 to the callback for this particular case so this might not currently be a problem in practice. However it's a problem waiting to happen and something we should fix. If this property was violated, console messages from the runtime using the noise_log might be silently swallowed. This notably includes a number of verbose outputs from init() and finalize() that provide status information and warnings.

Comments (2)

  1. Dan Bonachea reporter

    ee74adf adds an assertion to detect the potential invariant violation in debug mode.

    I'll return to fix this for real post-release

  2. Dan Bonachea reporter

    Fix issue #506: noise_log mis-uses gex_Coll_Reduce

    This fixes an internal defect discovered via code inspection. It's believed the defect could not result in any externally visible behavioral defect, due to unspecified aspects of the current GASNet-EX implementation.

    → <<cset 4586e43d3b44>>

  3. Log in to comment