Enh: Streamline "empty RMA" operations

Pull request 345 added optimizations for RMA operations that synchronously complete due to shared-memory bypass (i.e., operating on a pointer satisfying global_ptr<>::is_local()).

In that PR I identified that the same machinery can also streamline the degenerate case of "empty RMA", i.e., bulk rget/rput operations called with count == 0, meaning the data transfer is a no-op (a property that is notably independent of pointer locality). These operations vacuously complete synchronously, so they can bypass libupcxx entry entirely, and eager future/promise completions may be satisfied before return.
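As a rough illustration of the idea (not the algorithm proposed for UPC++, and not its actual types), the sketch below uses hypothetical stand-ins: `MockGlobalPtr`, `MockFuture`, `runtime_rput` and `rput_sketch` are all invented for this example. It only shows the shape of the fast path: an empty transfer is treated like a local one and returns an already-ready completion without entering the runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for illustration only; these are NOT UPC++ types.
struct MockGlobalPtr {
    int* raw;     // address, only dereferenceable when `local` is true
    bool local;   // mimics the answer of global_ptr<>::is_local()
    bool is_local() const { return local; }
};

struct MockFuture {
    bool ready;   // true if completion was satisfied before return
};

// Counts entries into the (mock) runtime slow path.
static int g_runtime_entries = 0;

// Hypothetical slow path: stands in for entering the runtime and injecting
// a network operation whose completion is signalled later.
static MockFuture runtime_rput(const int*, MockGlobalPtr, std::size_t) {
    ++g_runtime_entries;
    return MockFuture{false};
}

// Sketch of a bulk put with eager completion for empty and local transfers.
inline MockFuture rput_sketch(const int* src, MockGlobalPtr dest,
                              std::size_t count) {
    if (count == 0) {
        // Empty RMA: nothing to move, regardless of pointer locality.
        return MockFuture{true};
    }
    if (dest.is_local()) {
        // Shared-memory bypass: the copy completes synchronously.
        assert(dest.raw != nullptr);
        std::memcpy(dest.raw, src, count * sizeof(int));
        return MockFuture{true};
    }
    // Genuinely remote, non-empty transfer: take the slow path.
    return runtime_rput(src, dest, count);
}
```

In this sketch a call such as `rput_sketch(src, remote_ptr, 0)` returns a ready future and leaves `g_runtime_entries` untouched, even though `remote_ptr` is not local.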

This enhancement issue requests we deploy the eager completion optimization for "empty RMA". The proposed algorithm appears in this comment.

This change is currently "on hold" because it has the potential to introduce a new branch into the critical path of remote RMA operations. The plan is to deploy a compiler annotation (via the gasnet_tools interface) that should ensure the corresponding is-empty branch inside the GASNet RMA header can be optimized away, leading to no net growth in dynamic branch count.
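One way such an annotation might look is sketched below. The issue does not name the gasnet_tools annotation, so `SKETCH_ASSUME` is a hypothetical macro built on the GCC/Clang `__builtin_unreachable()` builtin, and `lower_layer_put` / `upper_layer_put` are invented stand-ins for the GASNet RMA header function and its caller. The point is only that, once the caller has handled the empty case, it can state `nbytes != 0` explicitly so the optimizer is free to drop the duplicate is-empty check after inlining.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical annotation macro (an assumption, not the gasnet_tools name).
// On GCC/Clang it tells the optimizer that `cond` always holds here.
#if defined(__GNUC__) || defined(__clang__)
  #define SKETCH_ASSUME(cond) \
      do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
  #define SKETCH_ASSUME(cond) ((void)0)
#endif

// Counts operations actually injected by the mock lower layer.
static int g_injected = 0;

// Stand-in for an inline RMA header function that has its own
// is-empty check. Returns true if the operation completed synchronously.
inline bool lower_layer_put(std::size_t nbytes) {
    if (nbytes == 0) return true;   // lower layer's is-empty branch
    ++g_injected;                   // pretend to inject a network op
    return false;
}

// Stand-in for the caller that adds the new empty-RMA fast path.
inline bool upper_layer_put(std::size_t nbytes) {
    if (nbytes == 0) return true;   // new branch: complete eagerly
    // The empty case is already handled, so state that fact explicitly.
    // After inlining, the callee's is-empty branch becomes dead code.
    SKETCH_ASSUME(nbytes != 0);
    return lower_layer_put(nbytes);
}
```

The annotation changes no observable behavior in this sketch; whether the duplicate branch is actually eliminated depends on the compiler and on the callee being inlined, which would need to be confirmed by inspecting the generated code.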
