Use gasnet_getenv instead of std::getenv
src/os_env.hpp currently uses std::getenv to read the environment. This is non-portable, and on some distributed-memory spawners it will fail to retrieve variables set on the spawning console. This is why GASNet provides gasnet_getenv, and UPC++ should be using it:
char * gasnet_getenv (const char *name)
Has the same semantics as the POSIX getenv() call, except it queries the system-specific environment which
was used to spawn the job (e.g. the environment of the spawning console). Calling POSIX getenv() directly
on some implementations may not correctly return values reflecting the environment that initiated the job
spawn, consequently GASNet clients wishing to query a consistent snapshot of the spawning environment
across nodes should never call getenv() directly. The semantics of POSIX setenv() are undefined in
GASNet jobs (specifically, it will probably fail to propagate changes across nodes).
GASNet-tools (see 'Environment utilities' section) also offers wrappers that parse the results of gasnet_getenv into booleans, ints, floats and memory size values. Most importantly, these also report their activity to the console when GASNET_VERBOSEENV=1 is set, providing self-documenting environment variables.
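To illustrate the "parse and report" idea described above, here is a minimal standalone sketch. It is not the GASNet-tools implementation: `env_lookup_fn`, `env_int`, `query_int_env`, `describe` and the report format are all hypothetical names invented for this example, and the lookup is passed in as a callable so the sketch builds without GASNet (a real build would presumably pass a thin wrapper around gasnet_getenv).

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical lookup hook. In a GASNet-based build this would wrap
// gasnet_getenv; any callable with this shape works for the sketch.
using env_lookup_fn = std::function<const char *(const char *)>;

// Result of one query: the value used and whether it came from the default.
struct env_int {
  std::string key;
  long long value;
  bool from_default;
};

// Reads an integer setting through `lookup`. Falls back to `default_val`
// when the variable is unset, empty, or not a complete decimal integer.
env_int query_int_env(const env_lookup_fn &lookup, const std::string &key,
                      long long default_val) {
  env_int result{key, default_val, true};
  const char *raw = lookup(key.c_str());
  if (raw && *raw) {
    char *end = nullptr;
    const long long parsed = std::strtoll(raw, &end, 10);
    if (end != raw && *end == '\0') {
      result.value = parsed;
      result.from_default = false;
    }
  }
  return result;
}

// Builds the one-line report a verbose mode could print for each query.
// The format is made up for this sketch.
std::string describe(const env_int &q) {
  return q.key + " = " + std::to_string(q.value) +
         (q.from_default ? " (default)" : "");
}
```

A caller would print `describe(...)` only when its verbose flag is set, so every setting that is consulted also documents itself, including the ones that fell back to a default.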
Comments (10)
-
> Calling getenv non-portable seems a bit dramatic. If that's the case, how do other hpc libraries read from the environment?
The mpirun or equivalent will handle propagation of the environment for MPI. Other HPC environments will either take the approach of MPI or of capturing getenv calls.
> And doesn't upcxx-run alleviate much of the environment handling deficiencies of spawners?
No. Upcxx-run is a very thin wrapper around GASNet's spawners. GASNet's spawners do resolve the problem - by requiring use of gasnett_getenv() to query the "global environment".
-
reporter - changed milestone to 2018.03.31 release
-
assigned issue to
-
reporter - changed milestone to 2018.09.30 release
Mass roll-over of unresolved issues to the next milestone.
-
reporter -
assigned issue to
- marked as minor
This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.
-
assigned issue to
-
reporter - changed milestone to 2019.03.31 release
Mass roll-over of unresolved issues to the next milestone.
-
reporter This issue has been discovered to critically break the handling of the UPCXX_SEGMENT_MB environment variable that controls the size of the UPC++ shared heap on several conduits in cluster configurations.
Here's a demonstration of the problem on dirac, using the nightly install (configured with OpenMPI 3.0 support):
$ which upcxx
/usr/local/pkg/upcxx-dirac/gcc-8.2.0/nightly/bin/upcxx
$ upcxx --version
UPC++ version 20180905 upcxx-2018.9.5-6-gffbee05 / gex-2018.12.0-31-g6fdf0ec
Copyright (c) 2018, The Regents of the University of California,
through Lawrence Berkeley National Laboratory.
http://upcxx.lbl.gov
g++ (GCC) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ cat getenv.cpp
#include <upcxx/upcxx.hpp>
#include <gasnet.h>
#include <stdlib.h>
#include <iostream>
int main(int argc, char **argv) {
  upcxx::init();
  const char *key = "UPCXX_SEGMENT_MB";
  if (argc > 1) key = argv[1];
  const char *p = std::getenv(key);
  const char *g = gasnett_getenv(key);
  std::ostringstream oss;
  oss << upcxx::rank_me() << ":"
      << " std::getenv(" << key << ")=" << (p?p:"NULL") << " \t"
      << " gasnett_getenv(" << key << ")=" << (g?g:"NULL") << "\n";
  std::cout << oss.str() << std::flush;
  upcxx::finalize();
  return 0;
}
$ env UPCXX_GASNET_CONDUIT=ibv upcxx -o getenv-ibv getenv.cpp
$ env UPCXX_GASNET_CONDUIT=udp upcxx -o getenv-udp getenv.cpp
$ env UPCXX_GASNET_CONDUIT=mpi upcxx -o getenv-mpi getenv.cpp
$ upcxx-run -shared-heap=512M -np 4 getenv-udp
2: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
0: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
1: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
3: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
$ upcxx-run -shared-heap=512M -np 4 getenv-mpi
0: std::getenv(UPCXX_SEGMENT_MB)=512 gasnett_getenv(UPCXX_SEGMENT_MB)=512
1: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
3: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
2: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
$ upcxx-run -shared-heap=512M -np 4 getenv-ibv
0: std::getenv(UPCXX_SEGMENT_MB)=512 gasnett_getenv(UPCXX_SEGMENT_MB)=512
2: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
1: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
3: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
$ env UPCXX_SEGMENT_MB=512 gasnetrun_ibv -np 4 -spawner=ssh getenv-ibv
0: std::getenv(UPCXX_SEGMENT_MB)=512 gasnett_getenv(UPCXX_SEGMENT_MB)=512
2: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
1: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
3: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
$ env UPCXX_SEGMENT_MB=512 gasnetrun_ibv -np 4 -spawner=mpi getenv-ibv
0: std::getenv(UPCXX_SEGMENT_MB)=512 gasnett_getenv(UPCXX_SEGMENT_MB)=512
3: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
2: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
1: std::getenv(UPCXX_SEGMENT_MB)=NULL gasnett_getenv(UPCXX_SEGMENT_MB)=512
As shown above, the UPCXX_SEGMENT_MB environment variable set by the spawning console is not propagated to the POSIX environment of the remotely spawned processes. The UPC++ runtime init currently fails to consult the GASNet master environment, so as a result the upcxx-run -shared-heap option is effectively broken for at least udp-conduit (for all but localhost spawning), ibv-conduit (with ssh-spawner or mpi-spawner w/OpenMPI) and mpi-conduit (with at least OpenMPI). In all these cluster configurations (and possibly others), all processes (except possibly the first) will ignore the upcxx-run -shared-heap argument and use the default 128 MB UPC++ shared heap.
The same problem also affects other envvar knobs to the UPC++ runtime and application codes (notably including the UPCXX_VERBOSE var).
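The failure mode can be modelled with a small toy example. This is only a sketch of the reasoning, not UPC++ runtime code: `env_map`, `segment_mb` and `segment_mb_for_rank` are hypothetical names, and plain maps stand in for the console environment (what gasnet_getenv consults) and each rank's own POSIX environment.

```cpp
#include <map>
#include <string>

// Toy stand-in for an environment: variable name -> value.
using env_map = std::map<std::string, std::string>;

// Default shared heap size in MB, as stated in the report above.
constexpr long default_segment_mb = 128;

// Looks up UPCXX_SEGMENT_MB in `env`, returning the default when absent.
long segment_mb(const env_map &env) {
  const auto it = env.find("UPCXX_SEGMENT_MB");
  return it == env.end() ? default_segment_mb : std::stol(it->second);
}

// Models the choice a rank makes at init. `local_env` is the rank's own
// POSIX environment; `console_env` is the spawning console's environment.
// With consult_console == false this mimics the std::getenv behavior.
long segment_mb_for_rank(const env_map &local_env, const env_map &console_env,
                         bool consult_console) {
  return segment_mb(consult_console ? console_env : local_env);
}
```

With a console environment holding UPCXX_SEGMENT_MB=512 and a remote rank whose local environment is empty, the local lookup yields 128 while the console lookup yields 512. That matches the divergence between std::getenv and gasnett_getenv in the transcript.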
The recommended workaround for the current release is to explicitly pass the environment variable using a /usr/bin/env wrapper around the UPC++ program, e.g.:
upcxx-run -np 4 env UPCXX_SEGMENT_MB=512 getenv-ibv
-
reporter -
assigned issue to
- marked as blocker
This defect impedes real production use of UPC++ on clusters, so fixing this is a release blocker.
Proposed solution in pull request #61
-
assigned issue to
-
reporter - changed status to resolved
fix issue 100: Use gasnet_getenv() for os_env
upcxx::os_env now uses GASNet environment services for the gasnet backend. There are several important parts to this:
- The actual environment queries use gasnett_getenv, ensuring the queried value consults the console "master" environment, which on many clusters is not fully propagated to the POSIX process env.
- Environment queries are now properly reported in GASNET_VERBOSEENV output, providing self-documenting envvar defaulting with upcxx-run -vv
- Boolean envvar values are now parsed as in GASNet envvars. Specifically, this means that boolean values can be specified as 'Y|YES|y|yes|1' or 'N|n|NO|no|0'.
- A new memory size query function enables UPC++ to accept memory size suffixes on values, i.e. (B|KB|MB|GB|TB).
→ <<cset 82cb26cc6ed3>>
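For readers unfamiliar with these value formats, the boolean and memory-size parsing described in the commit message could look roughly like the sketch below. It is not the code from this changeset or from GASNet: `parse_bool` and `parse_mem_size` are hypothetical names, and the sketch assumes that suffixes are powers of 1024, that they match case-insensitively, and that a bare number is scaled by a caller-supplied default unit. Overflow is not checked.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Accepts exactly the spellings listed in the commit message.
// Returns false (leaving `out` untouched) for anything else.
bool parse_bool(const std::string &s, bool &out) {
  if (s == "Y" || s == "YES" || s == "y" || s == "yes" || s == "1") {
    out = true;
    return true;
  }
  if (s == "N" || s == "NO" || s == "n" || s == "no" || s == "0") {
    out = false;
    return true;
  }
  return false;
}

// Parses "<digits>[B|KB|MB|GB|TB]" into a byte count stored in `out`.
// A value with no suffix is multiplied by `default_unit` (assumption).
bool parse_mem_size(const std::string &s, std::uint64_t default_unit,
                    std::uint64_t &out) {
  if (s.empty() || !std::isdigit(static_cast<unsigned char>(s[0]))) return false;
  char *end = nullptr;
  const unsigned long long n = std::strtoull(s.c_str(), &end, 10);
  std::string suffix(end);  // whatever follows the digits
  for (char &c : suffix)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  std::uint64_t unit = 0;
  if (suffix.empty()) unit = default_unit;
  else if (suffix == "B") unit = 1;
  else if (suffix == "KB") unit = 1ULL << 10;
  else if (suffix == "MB") unit = 1ULL << 20;
  else if (suffix == "GB") unit = 1ULL << 30;
  else if (suffix == "TB") unit = 1ULL << 40;
  else return false;  // unknown suffix
  out = n * unit;
  return true;
}
```

Under these assumptions, a value of "512MB" and a bare "512" parsed with a default unit of 1 MB both yield the same byte count, which would let a setting such as UPCXX_SEGMENT_MB keep its historical meaning while also accepting explicit suffixes.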
-
reporter Merge pull request #61 into develop
- os_env:
  test/uts: tweak env query
  upcxx-run: shared heap envvar updates
  Upgrade UPC++ shared segment envvar handling
  fix issue 100: Use gasnet_getenv() for os_env
→ <<cset d6ac9d9f1c8b>>