Expose shared heap usage at runtime

Issue #382 resolved
Rob Egan created an issue

Hello. In a Slack chat with Dan, I mentioned that it would be helpful to inspect the counts and allocations of objects within the shared heap, for the purposes of tracking overall memory consumption and squashing memory leaks.

Dan suggested a hack: allocate something too large, then report the message carried by the resulting bad_alloc exception. Essentially:

try {
  upcxx::new_array<char>(1ULL << 60); // deliberately impossible request
} catch (std::bad_alloc &e) {
  std::cout << e.what() << std::endl;
}

to produce this nice, human-readable output:

upcxx::bad_shared_alloc: UPC++ shared heap is out of memory on process 1
 inside upcxx::new_array while trying to allocate 1152921504606846984 more bytes
 Local shared heap statistics:
  Shared heap size on process 1:             128 MB
  User allocations:               2 objects, 48 B
  Internal rdzv buffers:          0 objects, 0 B
  Internal misc buffers:          0 objects, 0 B

While this is clever and we intend to use it, it would be even more helpful to have a upcxx:: function that returns those seven items of data in a struct { size_t shared_heap_bytes, user_count, user_bytes, rdzv_count, rdzv_bytes, misc_count, misc_bytes; }. Just knowing the usage at runtime would help us decide what fraction of the remaining memory can be allocated and what memory is still outstanding, and it might even help us manage load imbalances: by inspecting the internal buffers we could choose whether to run upcxx::progress() more or less at certain stages.
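
For concreteness, here is a minimal sketch of the kind of query I mean (the struct and function names below are purely illustrative, not an existing UPC++ API):

// Hypothetical sketch only; neither this struct nor the query exists in UPC++.
struct sheap_stats {
  size_t shared_heap_bytes;       // total shared heap size on this process
  size_t user_count, user_bytes;  // live user allocations
  size_t rdzv_count, rdzv_bytes;  // internal rendezvous buffers
  size_t misc_count, misc_bytes;  // internal misc buffers
};
sheap_stats query_sheap_stats();  // snapshot for the calling process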

Official response

  • Dan Bonachea

    This was discussed in today's Pagoda meeting and a joint session with the HipMer team.

    Until now this issue has been "stalled" on reaching consensus for the semantics of providing a rich (probably dictionary-like) interface for insight into the shared heap state, which would provide programmatic access to the same "deep" insights currently only available via the upcxx::bad_shared_alloc::what() exception message (usually post-mortem). The crux of the semantic difficulty is providing enough information to meet client needs, without exposing too many details of the internal implementation that may be subject to change.

    In discussion today the HipMer team indicated that for their purposes a simple query that allowed them to compute the total available shared heap memory would be sufficient to address one of their most important current problems. Deploying this ASAP should help address their immediate concerns with the lack of backpressure in the RPC rendezvous algorithm (issue #242), while we work on longer-term, more general solutions to that problem.

    So here is a sketch of the proposed API I'm pursuing in the near term:

    // returns total size of the host shared segment in the calling process
    int64_t upcxx::shared_segment_size(); 
    
    // returns snapshot of current total shared heap usage in this process (including user and runtime allocations)
    int64_t upcxx::shared_segment_used(); 
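
    As a usage sketch (here nbytes stands in for the size the application wants to allocate, and the "half the headroom" policy is purely illustrative):

    // Snapshot current shared heap usage to decide whether a large
    // allocation is likely to fit:
    int64_t total = upcxx::shared_segment_size();
    int64_t used  = upcxx::shared_segment_used();
    bool fits = total >= 0 && used >= 0                   // negative means "no response"
                && (int64_t)nbytes <= (total - used) / 2; // take at most half the headroom
    upcxx::global_ptr<char> p;                            // null by default
    if (fits) p = upcxx::new_array<char>(nbytes);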
    

    Caveats (to be fleshed out later):

    • Both will be specified to allow a negative return value to indicate a "non-response", for potential future implementations where the query might be unavailable or meaningless (but the current implementation should always provide a non-negative response)
    • Both of these would return a "snapshot" value that could be invalidated at some unspecified point in the future. Calls to the allocator by any thread obviously invalidate the result, but even in the current implementation less obvious things, like AM handler execution (possibly on a hidden thread), could cause shared heap utilization to change asynchronously (this should NOT affect HipMer's use case as I understand it).
    • In the current implementation the total shared segment size is constant over a run, so the query will always return the same value to a given process throughout a run; however, this property won't be guaranteed, allowing for future implementations where the size might change over time.

    We'll probably eventually add similar queries to device_allocator, but that's not a high priority, so it may happen later.

    CC: @Steven Hofmeyr

Comments (8)

  1. Dan Bonachea

    Point of clarification: the code Rob mentions only produces that output starting in the (forthcoming) 2020.3.2 release (or develop).

    It's probably too late to design and inject this into our forthcoming 2020.3.2 release, but this is a good idea and I think we can definitely provide something by the Sept release.

    I think the only tricky bit is that we probably wouldn't want the specification to promise anything about the details of what's tracked for the shared heap, especially since those details might change in future releases. So this could be an "implementation-defined" extension that returns information in a format like Rob suggests, whose type is subject to change/breakage without notice in subsequent releases.

    However, I'd rather we design a more general key/value-like query interface to fetch self-describing information about whatever statistics we have.

    Example:

    std::vector<std::pair<std::string, size_t>> upcxx::query_my_sheap_status();
    

    where the return might look something like this:

    { 
      { "Shared heap size", 134217728 },  // this rank's total shared heap size in bytes
      { "Live user object count", 14 },   // shared objects currently allocated by client on this rank
      { "Live user object size", 4096 }, // their total size, in bytes, including allocator padding
      { "Live rdzv buffer count", 2 },   // same for rendezvous buffers
      { "Live rdzv buffer size", 128 },   
      { "Live misc buffer count", 1 },   // same for misc buffers
      { "Live misc buffer size", 1024 },   
    }
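
    A caller could then consume the result without compile-time knowledge of the key set; as a sketch against this proposed (not yet existing) interface:

    // Print whatever shared heap statistics this implementation reports:
    for (auto const &kv : upcxx::query_my_sheap_status())
      std::cout << kv.first << ": " << kv.second << std::endl;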
    

    Thoughts?

    CC: @Amir Kamil

  2. Paul Hargrove

    I like the idea. I definitely want (at least) the first release that includes this functionality to disclaim "stability" of the output. Toward that end, I think the "key/value-like query interface to fetch self-describing information" is worth consideration.

  3. john bachan

    The keys of the key/value store are not convenient for machine parsing, so why bother making their corresponding values machine-parseable? Can someone make an argument that the key/value store is more useful than a single blob string?

  4. Dan Bonachea

    Quoting @Rob Egan: "Is there any metric we can query on that target process to estimate how much (temporary) private memory upcxx / gasnet has been consumed (possibly even just those buffers which will be freed (eventually) by user progress())?"

    @Rob Egan : As mentioned in the call, I see private memory utilization of the runtime as orthogonal to this particular issue. That topic seems worthy of additional design discussions, but I'd like to pursue that open issue separately. So I've moved your comment to new issue #444.
