UPC++ shared heap init ignores GASNet max segsize
Currently if you run any UPC++ program where UPCXX_SEGMENT_MB exceeds GASNET_MAX_SEGSIZE, upcxx::init
drops dead with a confusing error:
env GASNET_MAX_SEGSIZE=64MB UPCXX_SEGMENT_MB=128 upcxx-run 1 ./compute-pi
GASNet gasnetc_attach returning an error code: GASNET_ERR_BAD_ARG (Invalid function parameter passed)
at /home/bonachea/UPC/upcxx/.nobs/art/c2a4735e613f203b77119673dcacfcb9951deb62/GASNet-EX-collaborator-snapshot/smp-conduit/gasnet_core.c:729
reason: segsize too large
UPC++ assertion failure on rank 0 [/home/bonachea/UPC/upcxx/src/backend/gasnet/runtime.cpp:134]
Abort (core dumped)
The problem is the runtime is failing to validate its selected heap size against gasnet_getMaxLocalSegmentSize(), which provides the maximum allowable value for the segsz argument to gasnet_attach in a given run (the value returned is determined only in part from GASNET_MAX_SEGSIZE, it also includes various conduit-specific knobs and current system resource state).
The UPC++ runtime init code should call gasnet_getMaxLocalSegmentSize() before attach and use it to validate the segment size it intends to request. If UPCXX_SEGMENT_MB specifies too much (explicitly or by default), the runtime should issue a more explanatory message and either round the value down or exit (the right behavior here is a policy question).
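A minimal sketch of the "round down with a warning" policy, written as a standalone helper so it can be read without GASNet headers. The names SegmentChoice and choose_segment_size are hypothetical and do not come from the UPC++ sources. In the runtime, max_allowed would be the value returned by gasnet_getMaxLocalSegmentSize() before attach. The page_size parameter reflects an assumption that the attach size should be a whole number of pages.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type: what the runtime would request at attach time.
struct SegmentChoice {
  std::uint64_t bytes;   // segment size to pass on to attach
  bool clamped;          // true if the request exceeded the allowed maximum
  std::string warning;   // non-empty only when clamped
};

// Hypothetical helper illustrating the validation step.
// requested   : heap size derived from UPCXX_SEGMENT_MB, in bytes
// max_allowed : limit for this run (assumed to come from
//               gasnet_getMaxLocalSegmentSize() in the real runtime)
// page_size   : alignment granularity (assumption, see lead-in)
inline SegmentChoice choose_segment_size(std::uint64_t requested,
                                         std::uint64_t max_allowed,
                                         std::uint64_t page_size) {
  assert(page_size > 0);
  // Round both values down to a whole number of pages.
  const std::uint64_t limit = max_allowed - (max_allowed % page_size);
  const std::uint64_t want = requested - (requested % page_size);
  if (want <= limit) {
    return {want, false, ""};
  }
  // Policy chosen for this sketch: reduce the request and explain why,
  // rather than aborting inside attach with "segsize too large".
  std::string msg = "WARNING: requested shared heap of " +
                    std::to_string(requested) +
                    " bytes exceeds the maximum segment size of " +
                    std::to_string(max_allowed) +
                    " bytes available in this run; reducing to " +
                    std::to_string(limit) + " bytes";
  return {limit, true, msg};
}
```

With the values from the failing command above (a 128 MB request against a 64 MB limit), choose_segment_size(128ull << 20, 64ull << 20, 4096) yields a 64 MB segment with clamped set. The alternative policy, exiting with the same message, would only change what the caller does when clamped is true.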
Comments (5)
reporter More importantly, the lowered value of GASNET_MAX_SEGSIZE is only a way to simulate this condition, which can also arise at runtime based on system resource state, as noted in the first comment: (the value returned is determined only in part from GASNET_MAX_SEGSIZE, it also includes various conduit-specific knobs and current system resource state).
So no amount of upcxx-run adjustment can entirely solve this - the check needs to be added in src/backend/gasnet/runtime.cpp, right where the comment mentions gasnet_getMaxLocalSegmentSize.
-
Agreed. I was just pointing out that the original example given with upcxx-run is no longer valid.
- changed status to resolved
Now checks the gasnet_max_segsize and reduces the upcxx segment size accordingly, with a warning.
The exact example given above is no longer correct. upcxx-run automatically adjusts the GASNET_MAX_SEGSIZE, e.g.:
Runs fine with a warning:
However, the problem of course still exists. We see it with