RFE: improved support for debugging at scale

Issue #471 new
Paul Hargrove created an issue

The ExaBiome team has observed that when dealing with bugs which occur only "at scale", the extreme slowdown of a debug build is a significant disincentive to using upcxx -g. This RFE is about improving that experience, drawing on notes collected from other issues and Slack discussions.

  • upcxx -codemode=debug -O3 hello_upcxx.cpp is "good enough" for the near-term needs identified by the ExaBiome team. It ensures our correctness assertions are active while allowing the all-important inlining optimization to be applied to the user code.
  • The approach above, however, still links against a libupcxx and libgasnet which have been compiled with -g -O0 or similar. In GASNet this includes hooks for tracing and statistics, and a debugging malloc implementation for GASNet's internal use. So, there is further room for improvement in performance even without removal of debug checks.
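
As an analogy for the first bullet, here is a minimal sketch in plain C++ of how correctness checks and optimization level can vary independently. It uses only the standard assert() macro, which is controlled by NDEBUG rather than by the -O level. The function names checked_get and sum_all are hypothetical, and this is not a description of how UPC++ implements its own assertions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checked accessor: assert() remains active at any -O level
// unless NDEBUG is defined at compile time.
inline int checked_get(const std::vector<int> &v, std::size_t idx) {
  assert(idx < v.size() && "index out of range");
  return v[idx];
}

// Sums via the checked accessor. An optimizing compiler is free to
// inline checked_get into this loop while keeping the check.
inline long sum_all(const std::vector<int> &v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    total += checked_get(v, i);
  }
  return total;
}
```

Compiling such a file with -O3 and without -DNDEBUG yields optimized code that still performs the check. Adding -DNDEBUG removes it.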

Ideas/tasks to pursue (likely not exhaustive):

1) The nobs-inspired portion of our Makefile has optimization, assertions, and debug symbols as distinct boolean dimensions in the configuration space. We currently build only the (1,0,0) and (0,1,1) points in that space, but could build others with less work than might otherwise be required. The particular case of (1,0,1) corresponds to the RelWithDebInfo build type in CMake and is the subject of issue #418.

2) We should perform some degree of profiling to determine where the time is really spent in a debug build of UPC++, relative to the optimized build. We may be surprised to find that small isolated changes in UPC++ and/or GASNet-EX could make a significant difference.

3) Unfortunately, GASNet does not yet have as rich a configuration space as UPC++ with respect to these "dimensions". While removing stats, trace, and the debug mallocator from a debug build is possible via configure arguments, the (approximate) pairs -g -DDEBUG and -O -DNDEBUG are not separable. This means a RelWithDebInfo or "Release + Asserts" build of GASNet-EX would require non-trivial effort. So, the profiling item above should be used to determine to what extent the improvements available in libupcxx alone would suffice for the nebulous goal of "improved support", before engaging in work on GASNet-level changes which might make little or no difference.
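
To make the (optimization, assertions, debug symbols) tuples from item 1 concrete, here is a small C++ sketch that enumerates the eight points and labels the ones this issue mentions. BuildPoint, describe, and all_points are hypothetical names used only for illustration. Mapping "Release + Asserts" to (1,1,0) is an assumption, since the issue does not say whether that build would carry debug symbols.

```cpp
#include <array>
#include <string>

// One point in the (optimization, assertions, debug symbols) space.
struct BuildPoint {
  bool opt;
  bool assertions;
  bool debug_syms;
};

// Labels for the points named in this issue; everything else is "other".
inline std::string describe(const BuildPoint &p) {
  if (p.opt && !p.assertions && !p.debug_syms)
    return "optimized build (built today)";     // (1,0,0)
  if (!p.opt && p.assertions && p.debug_syms)
    return "debug build (built today)";         // (0,1,1)
  if (p.opt && !p.assertions && p.debug_syms)
    return "RelWithDebInfo (issue #418)";       // (1,0,1)
  if (p.opt && p.assertions && !p.debug_syms)
    return "Release + Asserts";                 // (1,1,0), assumed mapping
  return "other";
}

// Enumerates all 8 points; bit 2 = opt, bit 1 = assertions, bit 0 = debug symbols.
inline std::array<BuildPoint, 8> all_points() {
  std::array<BuildPoint, 8> pts{};
  for (unsigned i = 0; i < 8; ++i) {
    pts[i] = BuildPoint{(i & 4u) != 0, (i & 2u) != 0, (i & 1u) != 0};
  }
  return pts;
}
```

Iterating over all_points() and printing describe(p) for each gives a quick checklist of which points are built today and which would be new.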
