Implement gptr debug checking

Issue #83 resolved
Dan Bonachea created an issue

Two early users have already separately expressed fundamental confusion about the behavior of shared memory and global pointers, and this confusion is being exacerbated by the lack of checks on global pointer downcasts that have undefined behavior. Lacking these checks, a new programmer is easily confused and doesn't understand if they did something wrong or if the implementation is malfunctioning.

Here is a trimmed-down example of a naive INCORRECT program:

#include <upcxx/upcxx.hpp>

int main() {
  upcxx::init();
  upcxx::global_ptr<int> gp;

  // allocate a shared array on rank 0 and broadcast it
  if (!upcxx::rank_me()) gp = upcxx::new_array<int>(upcxx::rank_n());
  gp = upcxx::broadcast(gp,0).wait();

  // downcast - undefined behavior on most non-zero ranks!
  int *lp = gp.local();
  // write a unique value to a unique element
  lp += upcxx::rank_me();
  *lp = 100+upcxx::rank_me();
  // output result
  if (!upcxx::rank_me()) {
    for (int i=0; i < upcxx::rank_n(); i++) std::cout << lp[i] << std::endl;
  }

  upcxx::finalize();
  return 0;
}

To make matters worse, the correctness of this type of program will eventually differ based on the job parameters and local_team layout - ie once we implement shared-memory downcasts, this program will be correct when run on a single-node/single-local_team, but incorrect for distributed memory.

We need to curtail this by generating fatal errors for this type of easily detectable UB, at least in debug mode and possibly also opt mode if we think it can be done without significant performance impact.

Some UB usage errors that can and should be detected:

  • downcast of global pointer that is not .is_local()
  • upcast for a pointer outside the shared segment
  • pointer arithmetic on null global pointers

Comments (12)

  1. Dan Bonachea reporter

    Some of the UB gptr upcast and downcast operations are now diagnosed.

    Would still be nice for assertions to detect the following in DEBUG mode:

    1. pointer arithmetic on null global pointers,
    2. (possibly) all arithmetic and accesses using invalid gptrs - those that do not point into a valid shared segment.
      • This could provide a friendlier crash message than an attempt to use garbage gptrs (eg uninitialized variables), which otherwise will probably generate obscure segment bounds violations in GASNet
      • We have a similar check in Berkeley UPC and it's historically been very effective at helping to diagnose such errors
  2. Dan Bonachea reporter

    issue #83: strengthen global_ptr correctness assertions

    • Add an undocumented global_ptr::check() method that performs various sanity checks on the global pointer, to detect invariant violations that could be caused by mis-use or memory corruption.

    • Adds a new UPCXX_GPTR_CHK macro that compiles away to nothing by default in production mode, but when assertions are enabled defaults to invoking global_ptr::check().

    • Insert UPCXX_GPTR_CHK into all methods that create global_ptrs or manipulate their internal state.

    • The default checking behavior can be overriden (on a TU basis) with a new UPCXX_GPTR_CHECK_ENABLED define (undocumented for now).

    Resolves issue #83

    → <<cset 6d2b9bb9b047>>

  3. Log in to comment