Use gasnet_getenv instead of std::getenv

Issue #100 resolved
Dan Bonachea created an issue

src/os_env.hpp currently uses std::getenv to read the environment. This is non-portable: with some distributed-memory spawners it fails to retrieve variables set on the spawning console. This is why GASNet provides gasnet_getenv, and UPC++ should be using it:

char * gasnet_getenv (const char *name)
Has the same semantics as the POSIX getenv() call, except it queries the system-specific environment which
was used to spawn the job (e.g. the environment of the spawning console). Calling POSIX getenv() directly
on some implementations may not correctly return values reflecting the environment that initiated the job
spawn, consequently GASNet clients wishing to query a consistent snapshot of the spawning environment
across nodes should never call getenv() directly. The semantics of POSIX setenv() are undefined in
GASNet jobs (specifically, it will probably fail to propagate changes across nodes).

GASNet-tools (see 'Environment utilities' section) also offers wrappers that parse the results of gasnet_getenv into booleans, ints, floats and memory size values. Most importantly, these also report their activity to the console when GASNET_VERBOSEENV=1 is set, providing self-documenting environment variables.

Comments (10)

  1. john bachan

    Calling getenv non-portable seems a bit dramatic. If that's the case, how do other HPC libraries read from the environment? And doesn't upcxx-run alleviate much of the environment handling deficiencies of spawners?

  2. Paul Hargrove

    > Calling getenv non-portable seems a bit dramatic. If that's the case, how do other HPC libraries read from the environment?

    The mpirun or equivalent will handle propagation of the environment for MPI.
    Other HPC environments will either take the approach of MPI or of capturing getenv calls.

    > And doesn't upcxx-run alleviate much of the environment handling deficiencies of spawners?

    No. upcxx-run is a very thin wrapper around GASNet's spawners.
    GASNet's spawners do resolve the problem, but only by requiring use of gasnett_getenv() to query the "global environment".

  3. Dan Bonachea reporter

    This issue has been discovered to critically break the handling of the UPCXX_SEGMENT_MB environment variable that controls the size of the UPC++ shared heap on several conduits in cluster configurations.

    Here's a demonstration of the problem on dirac, using the nightly install (configured with OpenMPI 3.0 support):

    $ which upcxx
    /usr/local/pkg/upcxx-dirac/gcc-8.2.0/nightly/bin/upcxx
    $ upcxx --version
    UPC++ version 20180905 upcxx-2018.9.5-6-gffbee05 / gex-2018.12.0-31-g6fdf0ec
    Copyright (c) 2018, The Regents of the University of California,
    through Lawrence Berkeley National Laboratory.
    http://upcxx.lbl.gov
    
    g++ (GCC) 8.2.0
    Copyright (C) 2018 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ cat getenv.cpp                                             
    #include <upcxx/upcxx.hpp>
    #include <gasnet.h>
    #include <cstdlib>
    #include <iostream>
    #include <sstream>
    
    int main(int argc, char **argv) {
      upcxx::init();
      const char *key = "UPCXX_SEGMENT_MB";
      if (argc > 1)  key = argv[1];
    
      const char *p =  std::getenv(key);
      const char *g =  gasnett_getenv(key);
      std::ostringstream oss;
      oss << upcxx::rank_me() << ":" 
          << " std::getenv(" << key << ")=" << (p?p:"NULL") << " \t"
          << " gasnett_getenv(" << key << ")=" << (g?g:"NULL")
          << "\n";
      std::cout << oss.str() << std::flush;
    
      upcxx::finalize();
      return 0;
    }
    
    $ env UPCXX_GASNET_CONDUIT=ibv upcxx -o getenv-ibv getenv.cpp
    $ env UPCXX_GASNET_CONDUIT=udp upcxx -o getenv-udp getenv.cpp   
    $ env UPCXX_GASNET_CONDUIT=mpi upcxx -o getenv-mpi getenv.cpp 
    
    $ upcxx-run -shared-heap=512M -np 4 getenv-udp                              
    2: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    0: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    1: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    3: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    
    $ upcxx-run -shared-heap=512M -np 4 getenv-mpi   
    0: std::getenv(UPCXX_SEGMENT_MB)=512     gasnett_getenv(UPCXX_SEGMENT_MB)=512
    1: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    3: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    2: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    
    $ upcxx-run -shared-heap=512M -np 4 getenv-ibv
    0: std::getenv(UPCXX_SEGMENT_MB)=512     gasnett_getenv(UPCXX_SEGMENT_MB)=512
    2: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    1: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    3: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    
    $ env UPCXX_SEGMENT_MB=512 gasnetrun_ibv -np 4 -spawner=ssh getenv-ibv                              
    0: std::getenv(UPCXX_SEGMENT_MB)=512     gasnett_getenv(UPCXX_SEGMENT_MB)=512
    2: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    1: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    3: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    $ env UPCXX_SEGMENT_MB=512 gasnetrun_ibv -np 4 -spawner=mpi getenv-ibv   
    0: std::getenv(UPCXX_SEGMENT_MB)=512     gasnett_getenv(UPCXX_SEGMENT_MB)=512
    3: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    2: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    1: std::getenv(UPCXX_SEGMENT_MB)=NULL    gasnett_getenv(UPCXX_SEGMENT_MB)=512
    

    As shown above, the UPCXX_SEGMENT_MB environment variable set by the spawning console is not propagated to the POSIX environment of the remotely spawned processes. The UPC++ runtime init currently fails to consult the GASNet master environment, so the upcxx-run -shared-heap option is effectively broken for at least udp-conduit (for all but localhost spawning), ibv-conduit (with ssh-spawner, or with mpi-spawner using OpenMPI) and mpi-conduit (with at least OpenMPI). In all these cluster configurations (and possibly others), every process except possibly the first ignores the upcxx-run -shared-heap argument and uses the default 128 MB UPC++ shared heap.
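The failure mode can be reproduced in miniature without any runtime at all. The sketch below is a simulation under stated assumptions: `Env`, `segment_mb`, and `heap_sizes` are hypothetical names, and each rank's environment is modeled as a plain map. It is not how the UPC++ runtime is written; it only shows why a missing variable combined with a default produces no error, just mismatched heap sizes.

```cpp
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// One process's view of the environment (a simulation, not a real process env).
using Env = std::map<std::string, std::string>;

// Default shared heap size in MB, as stated in the discussion above.
const long kDefaultSegmentMB = 128;

// Hypothetical helper: the heap size one process would choose from its own view.
long segment_mb(const Env& env) {
  auto it = env.find("UPCXX_SEGMENT_MB");
  if (it == env.end()) return kDefaultSegmentMB;  // not visible: silent fallback
  return std::strtol(it->second.c_str(), nullptr, 10);
}

// Heap size chosen by each rank, given the environment each rank can see.
std::vector<long> heap_sizes(const std::vector<Env>& per_rank_env) {
  std::vector<long> sizes;
  for (const Env& env : per_rank_env) sizes.push_back(segment_mb(env));
  return sizes;
}
```

If only rank 0 sees `UPCXX_SEGMENT_MB=512`, as in the mpi-conduit and ibv-conduit transcripts, the ranks choose 512, 128, 128 and 128 MB. If every rank consults the same snapshot of the spawning environment, all four choose 512 MB.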

    The same problem also affects other envvar knobs to the UPC++ runtime and application codes (notably including the UPCXX_VERBOSE var).

    The recommended workaround for the current release is to explicitly pass the environment variable using a /usr/bin/env wrapper around the UPC++ program, e.g.:

    upcxx-run -np 4 env UPCXX_SEGMENT_MB=512 getenv-ibv
    
  4. Dan Bonachea reporter

    fix issue 100: Use gasnet_getenv() for os_env

    upcxx::os_env now uses GASNet environment services for the gasnet backend. There are several important parts to this:

    1. The actual environment queries use gasnett_getenv, ensuring the queried value consults the console "master" environment, which on many clusters is not fully propagated to the POSIX process env.

    2. Environment queries are now properly reported in GASNET_VERBOSEENV output, providing self-documenting envvar defaulting with upcxx-run -vv.

    3. Boolean envvar values are now parsed as in GASNet envvars. Specifically, this means that boolean values can be specified as 'Y|YES|y|yes|1' or 'N|n|NO|no|0'.

    4. A new memory size query function enables UPC++ to accept memory size suffixes on values, i.e. (B|KB|MB|GB|TB).

    → <<cset 82cb26cc6ed3>>
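The value formats named in items 3 and 4 can be sketched in standard C++ as follows. This is an illustration under stated assumptions, not the code from the changeset: `parse_yesno` and `parse_mem_size` are hypothetical names, the suffixes are assumed to be powers of 1024, only the exact spellings listed above are accepted, and the caller supplies the unit used when no suffix is present.

```cpp
#include <cctype>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical parser for the boolean spellings listed above.
// Returns true and sets `out` on success; returns false for anything else.
bool parse_yesno(const std::string& s, bool& out) {
  if (s == "Y" || s == "YES" || s == "y" || s == "yes" || s == "1") {
    out = true;
    return true;
  }
  if (s == "N" || s == "NO" || s == "n" || s == "no" || s == "0") {
    out = false;
    return true;
  }
  return false;
}

// Hypothetical parser for "<digits>[B|KB|MB|GB|TB]".
// Returns the size in bytes, or -1 if the string is malformed.
// `default_unit` is the multiplier applied when no suffix is given.
// Assumption: suffixes are binary (1 KB = 1024 B). Overflow is not checked.
std::int64_t parse_mem_size(const std::string& s, std::int64_t default_unit) {
  std::size_t i = 0;
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) ++i;
  if (i == 0) return -1;  // no leading digits
  std::int64_t number = std::strtoll(s.substr(0, i).c_str(), nullptr, 10);
  const std::string suffix = s.substr(i);
  std::int64_t unit = 0;
  if (suffix.empty())      unit = default_unit;
  else if (suffix == "B")  unit = 1;
  else if (suffix == "KB") unit = std::int64_t(1) << 10;
  else if (suffix == "MB") unit = std::int64_t(1) << 20;
  else if (suffix == "GB") unit = std::int64_t(1) << 30;
  else if (suffix == "TB") unit = std::int64_t(1) << 40;
  else return -1;  // unknown suffix
  return number * unit;
}
```

With a default unit of one megabyte, a bare `512` and an explicit `512MB` both yield 536870912 bytes, so a variable whose name implies megabytes can keep accepting plain numbers while also accepting suffixed values.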
