App build with nvcc wrapper around configured compiler leads to errors (eg. __builtin_launder)

Issue #493 new
Colin MacLean created an issue

I am unable to build the Kokkos interop example on Summit with 2021.3.6-snapshot due to missing __builtin_launder support with nvcc. It is invoked with the Kokkos nvcc_wrapper, which may possibly have something to do with it. I’m unsure why the configure detected it as being available. I’m working around this problem at the moment by including upcxx_config.hpp and undefining UPCXX_HAVE___BUILTIN_LAUNDER.

Comments (13)

  1. Dan Bonachea

    UPC++ configure detection runs on the C++ compiler provided via CXX at configure time, which for the module installs on summit is the MPI wrapper around the C++ compiler (ie mpicxx wrapping g++ or pgc++).

    It sounds like the configure script correctly detected support for __builtin_launder in the provided C++ compiler, but the nvcc wrapper added at app build time (which also has to parse the headers) lacks the required support.

    We don't officially support this mode of operation, for exactly this reason (ie passing the installed UPC++ headers to any compiler other than the CXX provided to configure). The configure script detects properties of the CXX compiler it is given, but cannot detect anything about a compiler wrapper added later at app build time that imposes additional restrictions, affects ABI compatibility or might otherwise invalidate the configure results. The problem with __builtin_launder led to an obvious error in this case, but there might theoretically be other more subtle problems lurking as a result of this compiler mismatch. Because this conflict arose from what is currently considered "pilot error", I'm changing this issue from "bug" to "proposal".

    One possible solution is to configure and install UPC++ with CXX="nvcc -ccbin=mpicxx", which should fix the configure detection problem. However we do not officially support use of nvcc wrapper as the target C++ compiler, so there might be other problems with that approach. This might also suffer from problems due to recursion when nvcc calls upcxx calls nvcc.

    Another possibility is we could special-case configure in --with-cuda mode to additionally probe properties of nvcc (or more generally, the provided --with-nvcc), and adjust upcxx_config.hpp accordingly. This is somewhat "fragile" in that we'd need to manually curate the list of items that might diverge in behavior and handle each accordingly.

  2. Paul Hargrove

    @Dan Bonachea I believe you have stated above that our own UPC++/Kokkos interoperability example for the CUDA-enabled case is an embodiment of "pilot error". Have I misunderstood something?

  3. Dan Bonachea

    our own UPC++/Kokkos interoperability example for the CUDA-enabled case is an embodiment of "pilot error"

    Correct. The Makefile (notably not yet on master) for KOKKOS_DEVICES=Cuda includes:

    CXX = $(KOKKOS_PATH)/bin/nvcc_wrapper -ccbin $(UPCXX)
    

    which I believe technically constitutes "pilot error" for a UPC++ install built with CXX=mpicxx, because it means our installed headers will be parsed by a frontend other than the configured mpicxx (namely nvcc_wrapper). I'd kept silent about this when it didn't cause a problem, but now it apparently has.

  4. Colin MacLean reporter

    Kokkos interoperability being supported depends on nvcc being supported. There aren’t separate .cu files, as Kokkos is single source. Thus, the source is always going to be parsed with nvcc. Separating Kokkos code into different files from UPC++ code would be quite a pain.

    It looks like __builtin_launder support was added with CUDA 11, but UPC++ is built with CUDA 10 on Summit so it will compile but not run.

    We could check for __NVCC__ to handle some of this. Perhaps something like this:

    #define UPCXX_HAVE___BUILTIN_LAUNDER_CXX  /* configure generated value */
    #define UPCXX_HAVE___BUILTIN_LAUNDER_NVCC /* configure generated value */
    #ifdef __NVCC__
    #define UPCXX_HAVE___BUILTIN_LAUNDER UPCXX_HAVE___BUILTIN_LAUNDER_NVCC
    #else
    #define UPCXX_HAVE___BUILTIN_LAUNDER UPCXX_HAVE___BUILTIN_LAUNDER_CXX
    #endif
    

  5. Dan Bonachea

    We could check for __NVCC__ to handle some of this.

    Yep, that's exactly what I had in mind when I said "additionally probe properties of nvcc [..], and adjust upcxx_config.hpp accordingly". However as I also observed this inherently assumed we've correctly identified the only relevant way in which the behavior of the nvcc frontend parsing diverges from the underlying CXX frontend (which remains TBD).

  6. Paul Hargrove

    My initial thought when Colin reported the problem in Slack was that we could add something similar to UPCXX_USE_NODISCARD to allow the user to override our "probe" logic (regardless of whether it resides in the configure step or an as-shipped header).

    One alternate to the UPCXX_USE_* approach of introducing a new identifier would be to adjust the generated upcxx_config.h to honor any existing setting for this (and other?) probe results, by wrapping generated definitions with #ifndef:

    #ifndef UPCXX_HAVE___BUILTIN_LAUNDER

    I am not opposed to improving also the "out of box" experience when using nvcc, but like the idea of also adding a user-facing "opt-out" if for no other reason than a vague concern that AMD and Intel GPU support may have similar issues.

    Thinking in that general direction makes me think that UPCXX_HAVE___BUILTIN_ASSUME_ALIGNED would be another candidate for opt-out when using a non-configured compiler.

    In GASNet-EX there is precedent for detecting when using a non-configured compiler and discarding some probed compiler properties in favor of conservative assumptions. Additionally there is a precedent of using preprocessor identifiers to override probes. For instance GASNETT_USE_GCC_ATTRIBUTE_WARNUNUSEDRESULT can be defined to 1 or 0 to control use of __attrtibute__((__warn_unused_result__)). I think that UPCXX_USE_NODISCARD is already a move in that direction.

    @Dan Bonachea Thoughts on extending "compiler mismatch" and/or the GASNETT_USE_* approach to the (currently few) compiler properties used conditionally by UPC++? As I said above, I don't think doing so would need to preclude adding better logic for any specific cases we care about (e.g. something like Colin's proposed UPCXX_HAVE___BUILTIN_LAUNDER_NVCC in this case).

  7. Paul Hargrove

    Dan and I discussed my suggestion, above, that UPC++ might embrace some of the precedents we'd developed in GASNet-EX. However, Dan rightly observed that UPC++ has knowledge of the nvcc (and its arguments) at configure time and can therefore perform the necessary probes. Additionally, Dan reminded me that somethings like launder don't have "safe default" implementations to fall back on in the "compiler mismatch" case. These mean we are looking to address a different problem space than the GASNet-EX solutions were built for.

    We sketched a design for work which would generalize our existing logic used to generate upcxx_config.hpp to have it probe multiple compilers (CXX, nvcc, hipcc, ...) while keeping the individual probes mostly unchanged and providing for reasonable maintainability as compilers and probes are added (common case is O(1) work to add either).

    So I withdraw my suggestions in the previous comment, and instead endorse the proposal to "additionally probe properties of nvcc [..], and adjust upcxx_config.hpp accordingly".

  8. Log in to comment