update upcxx-run logic for GASNET_MAX_SEGSIZE

Issue #156 resolved
Dan Bonachea created an issue

The behavior of the GASNET_MAX_SEGSIZE envvar has changed in the current GASNet-EX development head in a way that affects upcxx-run's handling of this envvar. Here is the new documentation:

* GASNET_MAX_SEGSIZE - control the upper limit for FAST/LARGE segment size on most conduits
 This setting defaults to the value passed to configure: --with-max-segsize=<val>
 In FAST and LARGE segment configurations, GASNet probes each compute node at
 startup to determine an upper-limit on the available space for use in the
 GASNet segment (and some other large internal objects). This value provides
 one upper-limit to that probe, which also has the effect of limiting the
 space available for client segments (as reported by gasnet_getMaxLocalSegmentSize()).
 <val> has the following format: size_spec ( / opt_suffix )
 where 'size_spec' is either an absolute memory size: [0-9]+{KB,MB,GB}
 or a fraction of compute node physical memory: 0.85
 and 'opt_suffix' is one of the following: (or empty, which means "P")
  "P" : means the limit is per-process and EXCLUDES internal GASNet objects
  "H" : means the limit is host-wide and INCLUDES internal GASNet objects
 Examples:
  "0.85/H" : limit host-wide use at 85% of physical memory (this is also the default)
  "4GB/P"  : try to ensure 4GB per process of GASNet shared segment space
 The default behavior of this option has grown considerably smarter over time, so
 it's anticipated that most clients will never need to set this.

This new behavior will eventually be active in collaborator-snapshot and subsequently the September release, so before then upcxx-run will need to be adjusted accordingly. The good news is the new behavior should mean the handling is much simpler and less error-prone than the current messiness. The bad news is the current messiness breaks fatally with the new GASNet behavior, although I'll be committing a temporary stop-gap measure for that problem shortly.

The main property that is changing (which is also what broke the current code) is it's no longer always possible for upcxx-run to compute the active value of GASNET_MAX_SEGSIZE (whether provided by configure default or the enclosing user env) - because the specified value can depend on the post-spawn process layout and compute-node physical memory size, neither of which is reliably available to the upcxx-run script running pre-spawn on the frontend. Because the active value is unknowable to the script in general, the script cannot always decide whether the value is large enough to accommodate the heap requested using upcxx -shared-heap.

There are several possible solutions:

  • OPTION 1 NEVER SET: Remove all logic related to GASNET_MAX_SEGSIZE from upcxx-run.

    • PRO: simplicity. We anticipate that in most cases the new (very large) configure default should be sufficient for normal use cases, and users with abnormal use cases can be read the GEX docs to learn how to manually set GASNET_MAX_SEGSIZE appropriately for their situation.
    • CON: If a naive user or sysadmin explicitly sets GASNET_MAX_SEGSIZE incorrectly, job launch will fail and upcxx-run provides no help.
  • OPTION 2 ALWAYS SET: Change upcxx-run to unconditionally set GASNET_MAX_SEGSIZE="<heapsz>MB/P"

    • PRO: simplicity
    • PRO: ensures UPC++ job launch always succeeds and gets a sufficient shared heap.
    • CON: prevents the user from requesting a larger GEX max segsize, as might eventually be desired in hybrid applications once GEX grows multi-client support (not currently a concern and likely to remain at best a corner-case usage)
  • OPTION 3 CONDITIONALLY SET: Change upcxx-run to detect the "knowable" cases where the active GASNET_MAX_SEGSIZE value is "too low" to accommodate the shared heap and raise it with a warning, otherwise trust it to be correct.

    • PRO: Closest to the current behavior wrt any currently supported GASNET_MAX_SEGSIZE values: Allows a corner-case user to "raise" the SEGSIZE, but still detects and fixes inputs that are decidably too low.
    • CON: Complexity
    • CON: If the value is undecidably too low, the user still needs to manually fix the setting.

Pseudo-code for option 3 implementing upcxx-run -shared-heap SZ:

 if !env{GASNET_MAX_SEGSIZE} then -- no setting in user env
   env{GASNET_MAX_SEGSIZE} = SZ."mb"
 else -- a setting exists in user env
   setting = env{GASNET_MAX_SEGSIZE}
   val = parse_memsize_dbl(setting) -- needs to handle inputs like: %f{,k,m,g,t}b?(/?[ph])?
                                    -- where %f is any strtod() float or a fraction like "1/8"
                                    -- case-insensitive, w/arbitrary whitespace between elements
   if val > 1 && val < SZ*2^20 && !(setting ~= /[hH]/) then
      print WARNING: GASNET_MAX_SEGSIZE too low, raising it
      env{GASNET_MAX_SEGSIZE} = SZ."mb"
   else
      -- must trust the user envvar setting is correct
   endif
 endif

It's unclear what the best answer is here. FWIW in the upcoming Berkeley UPC release, we anticipate upcrun will use a nasty mismash of options 2 & 3 (mostly due to historical inertia and deadline pressure), but will probably eventually land on option 2.

Comments (13)

  1. Dan Bonachea reporter

    commit d385435 works-around the failure that is breaking nightly CI, as a temporary hack until a proper solution can be deployed.

  2. Dan Bonachea reporter

    On a related note, in a recent meeting Mathias requested a closely related feature - the ability to specify the shared heap size as a fraction of host memory, eg upcxx-run -shared-heap=50%.

    If we adopt option 2 (which Paul and Dan currently prefer), then Mathias's feature request falls out very easily - for this example, upcxx-run just sets GASNET_MAX_SEGSIZE=0.5h and then sets UPCXX_SEGMENT_MB=MAX, or some other new token that (with minor changes) instructs the UPC++ initialization code to use the maximum available segment space (gasnet_getMaxLocalSegmentSize()) for the shared heap. This will result in the UPC++ processes creating shared heaps such that shared heap consumes 50% of the physical memory of each host, equally divided amongst co-located processes.

  3. Mathias Jacquelin

    Just so we are clear, I like option 2 as well. I am modifying the upcxx-run script accordingly.

  4. Dan Bonachea reporter

    Just so we are clear, I like option 2 as well.

    Agreed.

    If you have time I think the additional enhancement you requested to support upcxx-run -shared-heap=50% (explained in my comment above) would be nice, although it would require a minimal change to the UPC++ runtime startup code.

  5. Mathias Jacquelin

    I am almost done with the python script. I have one outstanding question however: if the user does not provide the -shared-heap option, but defines a GASNET_MAX_SEGSIZE environment variable, I don't think it is reasonable to overwrite it with 128MB/P (default value for UPCXX_SEGMENT_MB). In this particular case, I think it may be better to leave the GASNET_MAX_SEGSIZE unchanged. What do you think ?

  6. Mathias Jacquelin

    If -shared-heap is not used, and I want to implement the behavior described in my previous comment, do I have to make sure that GASNET_MAX_SEGSIZE is big enough or is it OK to consider this to be an "expert" feature ?

  7. Dan Bonachea reporter

    @mjacquelin

    if the user does not provide the -shared-heap option, but defines a GASNET_MAX_SEGSIZE environment variable, I don't think it is reasonable to overwrite it with 128MB/P (default value for UPCXX_SEGMENT_MB). In this particular case, I think it may be better to leave the GASNET_MAX_SEGSIZE unchanged

    I agree in that specific case the script should probably assume the user knows what he is doing at leave it alone.

    However the -shared-heap option on the command line should definitely take priority over the $GASNET_MAX_SEGSIZE envvar (which may be set in the default environment by a sysadmin).

    Definitely don't try to parse the $GASNET_MAX_SEGSIZE envvar to "make sure that GASNET_MAX_SEGSIZE is big enough" - the script lacks sufficient information to make that determination in general.

  8. Mathias Jacquelin

    Correction, the script is not doing what I describe. Should we even try to read the segment size define in the binary at all ?

  9. Dan Bonachea reporter

    Should we even try to read the segment size define in the binary at all ?

    No, the script should stop trying to read the segment size formula from the binary.

    As described in the original post, the grammar for GASNET_MAX_SEGSIZE values (both in envvar and binary embed) has changed in a way such that the script (running before the job has spawned) can no longer reliably map the setting to a number of bytes.

  10. Log in to comment