UPC++ shared heap init ignores GASNet max segsize

Issue #96 resolved
Dan Bonachea created an issue

Currently if you run any UPC++ program where UPCXX_SEGMENT_MB exceeds GASNET_MAX_SEGSIZE, upcxx::init drops dead with a confusing error:

env GASNET_MAX_SEGSIZE=64MB UPCXX_SEGMENT_MB=128 upcxx-run 1 ./compute-pi         
GASNet gasnetc_attach returning an error code: GASNET_ERR_BAD_ARG (Invalid function parameter passed)
  at /home/bonachea/UPC/upcxx/.nobs/art/c2a4735e613f203b77119673dcacfcb9951deb62/GASNet-EX-collaborator-snapshot/smp-conduit/gasnet_core.c:729
  reason: segsize too large
UPC++ assertion failure on rank 0 [/home/bonachea/UPC/upcxx/src/backend/gasnet/runtime.cpp:134]
Abort (core dumped)

The problem is that the runtime fails to validate its selected heap size against gasnet_getMaxLocalSegmentSize(), which provides the maximum allowable value for the segsz argument to gasnet_attach in a given run (the value returned is determined only in part from GASNET_MAX_SEGSIZE, it also includes various conduit-specific knobs and current system resource state).

The UPC++ runtime init code should call gasnet_getMaxLocalSegmentSize() before attach and use it to validate the segment size it intends to request. If UPCXX_SEGMENT_MB specifies too much (explicitly or by default), the runtime should issue a more explanatory message and either round the value down or exit (the right behavior here is a policy question).
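
A minimal sketch of the proposed check, not the actual runtime code. The helper name validate_segment_size and the SegmentPolicy enum are hypothetical and exist only for illustration. The max_bytes argument stands in for the value the runtime would obtain from gasnet_getMaxLocalSegmentSize() before attach, so the sketch compiles without GASNet:

```cpp
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical policy knob: the issue leaves open whether to clamp or exit.
enum class SegmentPolicy { RoundDown, Fail };

// requested_bytes: heap size derived from UPCXX_SEGMENT_MB.
// max_bytes: assumed to be the result of gasnet_getMaxLocalSegmentSize(),
//            queried before the attach call.
// Returns the segment size that is safe to request at attach time.
inline std::uintptr_t validate_segment_size(std::uintptr_t requested_bytes,
                                            std::uintptr_t max_bytes,
                                            SegmentPolicy policy) {
  if (requested_bytes <= max_bytes) return requested_bytes;  // fits as-is

  const std::string msg =
      "requested shared heap of " + std::to_string(requested_bytes) +
      " bytes exceeds the maximum segment size of " +
      std::to_string(max_bytes) + " bytes";

  if (policy == SegmentPolicy::Fail) throw std::runtime_error(msg);

  // RoundDown: explain what happened, then clamp to the permitted maximum.
  std::fprintf(stderr, "WARNING: %s; rounding down\n", msg.c_str());
  return max_bytes;
}
```

With the values from the report (128 MB requested, 64 MB maximum), RoundDown would return 64 MB after printing a warning, while Fail would raise an error that names both sizes instead of the bare GASNET_ERR_BAD_ARG.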

Comments (5)

  1. Steven Hofmeyr

    The exact example given above is no longer correct. upcxx-run automatically adjusts the GASNET_MAX_SEGSIZE, e.g.:

    UPCXX_SEGMENT_MB=128 GASNET_MAX_SEGSIZE=64M upcxx-run -n1 compute-pi
    

    Runs fine with a warning:

    WARNING: GASNET_MAX_SEGSIZE 67108864 is too low for a shared heap of 134217728; setting to 138412032
    

    However, the problem of course still exists. We see it with

    UPCXX_SEGMENT_MB=128 GASNET_MAX_SEGSIZE=64M GASNET_PSHM_NODES=1 ./compute-pi
    
  2. Dan Bonachea reporter

    More importantly, the lowered value of GASNET_MAX_SEGSIZE is only a way to simulate this condition, which can also arise at runtime based on system resource state, as noted in the first comment:

    (the value returned is determined only in part from GASNET_MAX_SEGSIZE, it also includes various conduit-specific knobs and current system resource state).

    So no amount of adjustment in upcxx-run can entirely solve this. The check needs to be added in src/backend/gasnet/runtime.cpp, right where the comment mentions gasnet_getMaxLocalSegmentSize.

  3. Steven Hofmeyr

    Agreed. I was just pointing out that the original example given with upcxx-run is no longer valid.
