Support ALCF's PrgEnv-llvm

Issue #97 resolved
Paul Hargrove created an issue

ALCF has a (site-specific) PrgEnv-llvm (clang) for the Cray XC.
It is available on ALCF's Theta.

We should determine if it works, and consider adding to our "approved" list if it does.

This is distinct from a PrgEnv-llvm on NERSC's Cori, which is not intended for the XC/Aries nodes.

Comments (21)

  1. Paul Hargrove reporter

    Clarification:

    PrgEnv-llvm is not provided by Cray.
    It is a site-specific environment module provided by ALCF.
    It is essentially PrgEnv-gnu with clang and clang++ substituted for gcc and g++.

  2. Paul Hargrove reporter

    I can run the GASNet-1 tests and Berkeley UPC suite with no problems with ALCF's PrgEnv-llvm.
    However, GASNet-EX + UPCXX is getting SIGILL on multiple tests from a simple "./run-tests" in an interactive job.

    Next I need to investigate whether GASNet-EX is failing its own tests, having only tested GASNet-1 in depth so far.

  3. Paul Hargrove reporter

    I have confirmed that the gasnet-tests suite passes fine with GASNet-EX and PrgEnv-llvm.
    However, the following upc++ tests are failing with a SIGILL:

    hello
    future
    multifile
    uts_threads
    lpc_barrier
    uts_omp
    uts_omp_ranks
    

    I only have a backtrace from uts_omp_ranks, and it is truncated.
    I will follow up with a complete report for that as time allows.

  4. Paul Hargrove reporter

    As of Jan 4, 2018 I still see SIGILL from some UPC++ tests when using PrgEnv-llvm.
    It is my opinion that attempting to diagnose and correct these failures (which could be clang++ bugs) for the March release is not a good use of our limited time.

  5. Paul Hargrove reporter

    This is very likely a duplicate of #49, since the remaining failures match those reported in #49 and #92.
    I plan to retest once that issue has been resolved.

  6. Paul Hargrove reporter

    I seem to be getting SIGILL from the same tests (at least hello, future and multifile; I stopped after that).

  7. Paul Hargrove reporter

    IMHO: getting PrgEnv-llvm to work most logically belongs as part of the end-of-FY19 milestone for expanded compiler support (assuming it is possible at all).

  8. Paul Hargrove reporter

    The tests reported in this issue as getting SIGILL are precisely the ones impacted by issue #165.
    So, it is quite likely that there was never really any PrgEnv-llvm problem.

    However, it appears that PrgEnv-llvm on Theta is in a near-unusable state at the moment.
    In the link stage it spews some huge list of every object file in every library searched (even ones not linked).

    If I ignore the CI telling me that every compile failed (since several of the junk lines printed have "error" in them), then it appears that all but two of our tests compile, link and run. However, I cannot attest to the presence or absence of compiler warnings due to the "noise".

    The "but two" are crashes in uts_omp-par and uts_omp_ranks-par.
    However, for all I know that is due to an OpenMP problem.
    I am not attempting to report the details here, since I expect there will be non-trivial changes in our implementation by the time we reconsider supporting this compiler.

    On balance, I think this is definitely worth some consideration in the FY19 milestone for increased compiler support.

  9. Paul Hargrove reporter

    > However, it appears that PrgEnv-llvm on Theta is in a near-unusable state at the moment.
    > In the link stage it spews some huge list of every object file in every library searched (even ones not linked).

    This can be resolved by `module unload xalt`.
    With that in place, I find PrgEnv-llvm (clang-5.0) to be working reasonably well.

    The crashes (mentioned above) in uts_omp-par and uts_omp_ranks-par are the only UPC++ failures seen in testing last night.

  10. Dan Bonachea

    FWIW, cori now has PrgEnv-llvm modules:

    {cori-knl ~} module avail PrgEnv-llvm
    
    ------------ /usr/common/software/modulefiles -------------------
    PrgEnv-llvm/9.0.0-git-patched-upstream_20190305 PrgEnv-llvm/9.0.0-git_20190220_cuda_10.1
    PrgEnv-llvm/9.0.0-git_20190220
    
  11. Paul Hargrove reporter

    > FWIW, cori now has PrgEnv-llvm modules:

    But I found last week that Cori lacks a fully-functional llvm!
    Specifically, I found that clang++ is configured to use LLVM's C++ standard library (libc++) rather than the GNU one used by g++ (libstdc++), and libc++ is either missing or not installed correctly.

  12. Paul Hargrove reporter

    Following up on my previous comment regarding the llvm and PrgEnv-llvm modules on Cori:

    Unlike my initial attempts, I can at least now build a Hello World for the front-end using clang and clang++ from the llvm modules. If nothing else, this indicates that things are still changing.

    Cori's PrgEnv-llvm module is apparently NOT intended for the Aries nodes.
    Most notably, it does not conflict (in the modules system) with PrgEnv-{intel,gnu,cray}, and when it is loaded Cray's cc and CC wrappers continue to use the Intel, GNU or Cray compilers.
    As indicated by its use of a CUDA-enabled MVAPICH (an InfiniBand-only MPI) rather than Cray MPI, it is apparently intended for the GPU nodes. Specifically, the module loads the following modules (indirectly, such that `module show PrgEnv-llvm` does not list them):

     `gcc/7.3.0 cuda/[X] mvapich2/2.3 llvm/9.0.0-[Y]`
    

    where X and Y depend on the specific PrgEnv-llvm/* module.

    I will shortly update the issue title and description to clarify that the target is "ALCF's PrgEnv-llvm".
    FWIW: I am making progress on ALCF's Theta.

  13. Paul Hargrove reporter

    The short version: "It works!".

    The OMP tests were the last piece in doubt, and they work (once one gets through the contortions necessary to get them to link). Use of OpenMP requires dynamic linking (not the default on Cray), which we believe is also an effective fix for issue #157 and issue #171.

    This issue is now a "Documentation" task, still assigned to me.
    I will (as time allows) generate a PR, in which we can discuss further details (such as if/where to document the extra bits for OMP compatibility).

    I am also planning to contact ALCF support about the OMP "contortions" to see if they can be reduced or eliminated. Though I expect dynamic linking is a requirement, we should not need to provide an RPATH for libffi.so and libomp.so.

  14. Paul Hargrove reporter

    Current plan is to document support for "ALCF's PrgEnv-llvm, version 4.0 and higher", and to state that OpenMP is not supported in this configuration.

    If included, the OpenMP instructions would be different for each version (4.0, 5.0 and 8.0 are installed currently), and even if we picked just one, the instructions are a mess. This OpenMP problem has nothing to do with UPC++, fwiw. I need something like 3 to 6 extra command line options just to properly link an OpenMP "Hello, World" example.

  15. Paul Hargrove reporter

    Support ALCF's PrgEnv-llvm

    This commit updates documentation and system-checks to officially list ALCF's PrgEnv-llvm as supported. This support starts at clang/4.0, which is (conveniently enough) the same as the floor we have already established for clang on x86_64.

    Resolves issue #97

    → changeset c0ca03ef8c78

  16. Paul Hargrove reporter

    PrgEnv-llvm is now (finally) subjected to automated weekly compile-and-run testing on Theta.
    We are not attempting to test a "floor", just the current installed version.
