noise_log mis-uses gex_Coll_Reduce
src/backend/gasnet/noise_log.cpp contains:
gex_Event_Wait(gex_Coll_ReduceToAllNB(
gasnet::handle_of(upcxx::world()),
&r, &r,
GEX_DT_USER, sizeof(reduced), 1,
GEX_OP_USER,
(gex_Coll_ReduceFn_t)[](const void *arg1, void *arg2_out, std::size_t n, const void*) {
reduced const *in = (reduced const*)arg1;
reduced *acc = (reduced*)arg2_out;
acc->rank_least = std::min(acc->rank_least, in->rank_least);
acc->rank_n += in->rank_n;
},
nullptr, 0
));
This code silently assumes that n == 1
in the operator callback, which is a violation of the GEX spec for client-defined reduction operators.
Paul believes the current GASNet collectives implementation will never pass n > 1
to the callback for this particular case so this might not currently be a problem in practice. However it's a problem waiting to happen and something we should fix. If this property was violated, console messages from the runtime using the noise_log
might be silently swallowed. This notably includes a number of verbose outputs from init()
and finalize()
that provide status information and warnings.
Comments (2)
-
reporter -
reporter - changed status to resolved
Fix issue
#506: noise_log mis-uses gex_Coll_ReduceThis fixes an internal defect discovered via code inspection. It's believed the defect could not result in any externally visible behavioral defect, due to unspecified aspects of the current GASNet-EX implementation.
→ <<cset 4586e43d3b44>>
- Log in to comment
ee74adf adds an assertion to detect the potential invariant violation in debug mode.
I'll return to fix this for real post-release