jac3d link failure with Clang on Summit

Issue #231 resolved
Paul Hargrove created an issue

While running tests on Summit recently, I have seen the link failure below in the CI build of jac3d. It occurs only with the Clang compilers, not with the GNU compilers on the same system, and it affects all four CI builds (seq vs. par, each with -g and -O3).

This was observed via CI, so the necessary and sufficient conditions were less than obvious. In particular, getting Clang to use the C++ standard library (std::) from a "modern" g++ (rather than /usr/bin/gcc) takes a bit of extra work. However, I found that the following is sufficient to reproduce the failure manually:

$ module load cuda llvm
$ cd [your upcxx directory]
$ export UPCXX_INSTALL=[your choice]
$ env UPCXX_CUDA=1 CC="clang" CXX="clang++ --gcc-toolchain=/sw/summit/gcc/8.1.1 -std=c++14" ./install $UPCXX_INSTALL
[this is where I get a cup of coffee]
$ cd [your upcxx-extras directory]/examples/jac3d
$ make clean
$ make
Makefile:17: CUDA_HOME environment variable is not set, assuming nvcc is in the PATH
/ccs/home/hargrove/upcxx-inst-clang/bin/upcxx -O2  -c jac3d.cpp
/ccs/home/hargrove/upcxx-inst-clang/bin/upcxx -O2  -c cmdLine.cpp
nvcc -m64 -c --compiler-options -fno-strict-aliasing  -O2  -arch=sm_30 -c jac3d_kernel.cu
/ccs/home/hargrove/upcxx-inst-clang/bin/upcxx -O2  -c meshFunctions.cpp
nvcc -m64 -c --compiler-options -fno-strict-aliasing  -O2  -arch=sm_30 -c serialize.cu
nvcc -m64 -c --compiler-options -fno-strict-aliasing  -O2  -arch=sm_30 -c utils.cu
nvcc -m64 -c --compiler-options -fno-strict-aliasing  -O2  -arch=sm_30 -c cUtil.cu
/ccs/home/hargrove/upcxx-inst-clang/bin/upcxx -O2  -c OrderlyOutput.cpp
/ccs/home/hargrove/upcxx-inst-clang/bin/upcxx -O2  -o jac3d jac3d.o cmdLine.o jac3d_kernel.o meshFunctions.o serialize.o utils.o cUtil.o OrderlyOutput.o
utils.o: In function `printMesh(int*, double*, int)':
tmpxft_0000517b_00000000-5_utils.cudafe1.cpp:(.text+0x72c): undefined reference to `OrderlyOutput(std::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&)'
/usr/bin/ld: link errors found, deleting executable `jac3d'
/usr/bin/sha1sum: jac3d: No such file or directory
clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [jac3d] Error 1

Please note that additional GASNet configure arguments (such as MPIRUN_CMD and an RPATH via LDFLAGS) are required to get a build that can actually run on Summit. I've left these out since this issue manifests when compiling the example.

Comments (6)

  1. Max Grossman

    Possible factor: nvcc uses the host GNU compiler, not Clang? Unfortunately, nvcc is incompatible with the newest Clang installed on Summit (8.0.0), and UPC++ is incompatible with the only other version (3.8.0).
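One way to check this hypothesis is to compare how the two object files spell the `OrderlyOutput` symbol. The sketch below assumes binutils `nm` is available and that the mismatch comes from the libstdc++ dual ABI, where GCC 5 and later place some types in `std::__cxx11`; the thread does not confirm that cause. The helper name `abi_of` is hypothetical, and treating "no `__cxx11`" as the older ABI is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: classify a demangled symbol line by the std:: namespace it uses.
# Heuristic only: absence of std::__cxx11 merely suggests the pre-GCC-5 ABI.
abi_of() {
    case "$1" in
        *std::__cxx11::*) echo "cxx11" ;;
        *std::*)          echo "pre-cxx11" ;;
        *)                echo "none" ;;
    esac
}

# Inspect the object that references the symbol and the one that should define it.
# Objects that are not present in the current directory are skipped.
for obj in utils.o OrderlyOutput.o; do
    [ -f "$obj" ] || continue
    nm -C "$obj" | grep 'OrderlyOutput(' | while read -r line; do
        echo "$obj: $(abi_of "$line"): $line"
    done
done
```

If `utils.o` reports `pre-cxx11` while `OrderlyOutput.o` reports `cxx11`, the two files were probably compiled against different libstdc++ versions, which would be consistent with the undefined reference in the link log.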

  2. Max Grossman

    I was able to resolve this issue by adding the following command line argument to nvcc in the jac3d Makefile:

    -ccbin /sw/summit/gcc/8.1.1/bin/g++

    which instructs nvcc to use the same GNU toolchain that was specified to Clang in the UPC++ ./install command.
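To keep the two settings from drifting apart, both flags can be derived from a single toolchain root. This is a minimal sketch: the variable `GCC_TOOLCHAIN` and the two helper functions are hypothetical names, not something UPC++ or the jac3d Makefile defines, and the default path is the one quoted in this thread.

```shell
#!/bin/sh
# Assumed variable: root of the GNU toolchain shared by clang++ and nvcc.
GCC_TOOLCHAIN="${GCC_TOOLCHAIN:-/sw/summit/gcc/8.1.1}"

# Hypothetical helper: host-compiler flag for nvcc.
ccbin_flag() {
    printf '%s\n' "-ccbin $1/bin/g++"
}

# Hypothetical helper: matching toolchain flag for clang++.
gcc_toolchain_flag() {
    printf '%s\n' "--gcc-toolchain=$1"
}

# Print both flags so they can be pasted into the install and make commands.
echo "nvcc:    $(ccbin_flag "$GCC_TOOLCHAIN")"
echo "clang++: $(gcc_toolchain_flag "$GCC_TOOLCHAIN")"
```

Because both flags are computed from the same root, changing `GCC_TOOLCHAIN` updates the nvcc and clang++ settings together.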

    I’m not sure how to resolve this one in general. We can’t simply pass -ccbin $(upcxx-meta CXX) to nvcc, because nvcc can then complain about the Clang version (and it will on Summit). We could add guidance in the README (for jac3d and/or for UPC++ CUDA) that nvcc and upcxx must both be configured to use the same backend compiler toolchain. Other than that, I’m not sure what technical change to the build would prevent this issue from arising.

  3. Paul Hargrove reporter

    @Max Grossman Thanks for looking into this.

    I think "add guidance in the README" for UPC++ CUDA support is the right approach.
    While I only saw this with jac3d, it sounds like a general problem.

  4. Paul Hargrove reporter

    For completeness: I want to note that given the design of the Makefile, one can set NVCCFLAGS=-ccbin=/path/to/your/g++ to get the desired effect without needing to actually edit the Makefile.
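As a sketch of that override, the build can be rerun with the flag supplied on the make command line. It assumes the jac3d Makefile picks up `NVCCFLAGS` from the command line as described above; the variable `GXX` and the helper `nvccflags_for` are hypothetical names used only for illustration, and the default path is the one quoted earlier in this thread.

```shell
#!/bin/sh
# Assumed variable: the g++ that matches the toolchain given to clang++.
GXX="${GXX:-/sw/summit/gcc/8.1.1/bin/g++}"

# Hypothetical helper: build the NVCCFLAGS value for a given host compiler.
nvccflags_for() {
    printf '%s\n' "-ccbin=$1"
}

# Rebuild only when the compiler and a Makefile are actually present.
if [ -x "$GXX" ] && [ -f Makefile ]; then
    make clean && make NVCCFLAGS="$(nvccflags_for "$GXX")"
else
    echo "skipping build: $GXX or Makefile not found" >&2
fi
```

Run from the examples/jac3d directory, this leaves the Makefile itself untouched.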