Support ALCF's PrgEnv-llvm
ALCF has a (site-specific) PrgEnv-llvm (clang) for the Cray XC.
It is available on ALCF's Theta.
We should determine whether it works, and consider adding it to our "approved" list if it does.
This is distinct from a PrgEnv-llvm on NERSC's Cori, which is not intended for the XC/Aries nodes.
Comments (21)
-
reporter I can run the GASNet-1 tests and Berkeley UPC suite with no problems with ALCF's PrgEnv-llvm.
However, GASNet-EX + UPCXX is getting SIGILL on multiple tests from a simple "./run-tests" in an interactive job.
Next I need to investigate whether GASNet-EX is failing its own tests, having only tested GASNet-1 in depth so far.
-
reporter I have confirmed that the gasnet-tests suite passes fine with GASNet-EX and PrgEnv-llvm.
However, the following UPC++ tests are failing with a SIGILL: hello, future, multifile, uts_threads, lpc_barrier, uts_omp, uts_omp_ranks.
I only have a backtrace from uts_omp_ranks, and it is truncated.
I will follow up with a complete report for that as time allows. -
reporter As of Jan 4, 2018 I still see SIGILL from some UPC++ tests when using PrgEnv-llvm.
It is my opinion that attempting to diagnose and correct these failures (which could be clang++ bugs) for the March release is not a good use of our limited time. -
- removed milestone
-
@PHHargrove we should re-check this at some point now that those issues are resolved.
-
reporter I seem to be getting SIGILL from the same tests (at least hello, future and multifile; I stopped after that).
-
reporter IMHO: getting PrgEnv-llvm to work most logically belongs as part of the end-of-FY19 milestone for expanded compiler support (assuming it is possible at all).
-
- changed milestone to 2019.09.30 release
- marked as critical
This issue was triaged at the 2018-06-13 Pagoda meeting and assigned a new milestone/priority.
This issue is directly relevant to the Sept 19 milestone, and we resolved that before then we need to either deploy a workaround or document the compiler version as blacklisted due to a bug.
-
reporter The tests reported in this issue as getting SIGILL are precisely the ones impacted by issue #165.
So, it is quite likely that there was never really any PrgEnv-llvm problem.
However, it appears that PrgEnv-llvm on Theta is in a near-unusable state at the moment.
In the link stage it spews some huge list of every object file in every library searched (even ones not linked).
If I ignore the CI telling me that every compile failed (since several of the junk lines printed have "error" in them), then it appears that all but two of our tests compile, link and run. However, I cannot attest to the presence or absence of compiler warnings due to the "noise".
The "but two" are crashes in `uts_omp-par` and `uts_omp_ranks-par`.
However, for all I know that is due to an OpenMP problem.
I am not attempting to report the details here, since I expect there will be non-trivial changes in our implementation by the time we reconsider supporting this compiler.
On balance, I think this is definitely worth some consideration in the FY19 milestone for increased compiler support.
-
reporter However, it appears that PrgEnv-llvm on Theta is in a near-unusable state at the moment.
In the link stage it spews some huge list of every object file in every library searched (even ones not linked).
This can be resolved by:
module unload xalt
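The workaround above can be wrapped in a small helper. This is only a sketch: it assumes a shell where the `module` command is available, and it assumes the currently loaded programming environment is `PrgEnv-intel` unless another name is passed in (check `module list` first). The function name `use_prgenv_llvm` is hypothetical.

```shell
# Hypothetical helper: switch to ALCF's PrgEnv-llvm and drop xalt.
# $1 = currently loaded PrgEnv module (assumed default: PrgEnv-intel).
use_prgenv_llvm() {
    current="${1:-PrgEnv-intel}"
    # Replace the current programming environment with the clang-based one.
    module swap "$current" PrgEnv-llvm || return 1
    # Unloading xalt is what silenced the link-stage listing reported above.
    module unload xalt || return 1
    echo "PrgEnv-llvm active, xalt unloaded"
}
```

On a login node this would be called as `use_prgenv_llvm` (or `use_prgenv_llvm PrgEnv-gnu` if that is what is loaded) before configuring and building.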
With that in place, I find PrgEnv-llvm (clang-5.0) to be working reasonably well.
The crashes (mentioned above) in `uts_omp-par` and `uts_omp_ranks-par` are the only UPC++ failures seen in testing last night.
-
FWIW, cori now has PrgEnv-llvm modules:
{cori-knl ~} module avail PrgEnv-llvm
------------ /usr/common/software/modulefiles -------------------
PrgEnv-llvm/9.0.0-git-patched-upstream_20190305
PrgEnv-llvm/9.0.0-git_20190220_cuda_10.1
PrgEnv-llvm/9.0.0-git_20190220
-
reporter FWIW, cori now has PrgEnv-llvm modules:
But I found last week that Cori lacks a fully-functional llvm!
Specifically, I found that clang++ is configured to use llvm's C++ library (not the one from g++), which is either missing or not installed correctly. -
reporter Following up on my previous comment regarding the `llvm` and `PrgEnv-llvm` modules on Cori:
Unlike my initial attempts, I can at least now build a Hello World for the front-end using `clang` and `clang++` from the `llvm` modules. If nothing else, this indicates that things are still changing.
Cori's `PrgEnv-llvm` module is apparently NOT intended for the Aries nodes.
Most notably, it does not conflict (in the modules system) with `PrgEnv-{intel,gnu,cray}`, and when it is loaded Cray's `cc` and `CC` continue to use the Intel, GNU or Cray compilers.
As indicated by its use of a CUDA-enabled MVAPICH (InfiniBand-only MPI) rather than Cray MPI, this is apparently intended for the GPU nodes. Specifically, the module loads the following modules (in an indirect way such that `module show PrgEnv-llvm` does not list them):
`gcc/7.3.0 cuda/[X] mvapich2/2.3 llvm/9.0.0-[Y]`
where the `X` and `Y` depend on the specific `PrgEnv-llvm/*` module.
I will shortly update the issue title and description to clarify that the target is "ALCF's PrgEnv-llvm".
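One way to check which compiler the `cc`/`CC` wrappers actually invoke after loading a module is to look at the first line of the version banner. The sketch below classifies such a banner read from stdin. The function name `compiler_family` is hypothetical, and the match patterns are assumptions about typical banner text, which varies by compiler and version.

```shell
# Hypothetical helper: classify a compiler from the first line of its
# version banner, read from stdin. Patterns are assumptions.
compiler_family() {
    read -r banner
    case "$banner" in
        *clang*)           echo clang ;;   # checked first: plain or vendor clang
        *GCC*|*g++*|*gcc*) echo gnu ;;
        *ICC*|*Intel*)     echo intel ;;
        *Cray*)            echo cray ;;
        *)                 echo unknown ;;
    esac
}
# Usage on a login node (not run here): CC --version | compiler_family
```

If the result does not change after loading a PrgEnv-llvm module, the wrappers are still bound to another programming environment, as described for Cori above.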
FWIW: I am making progress on ALCF's Theta. -
reporter - changed title to Validate ALCF's PrgEnv-llvm
- edited description
-
reporter - changed title to Support ALCF's PrgEnv-llvm
- changed component to Documentation
The short version: "It works!".
The OMP tests were the last piece in doubt, and they work (once one gets through the contortions necessary to get them to link). Use of OpenMP requires dynamic linking (not the default on Cray), which we believe is also an effective fix for issue 157 and issue 171.
This issue is now a "Documentation" task, still assigned to me.
I will (as time allows) generate a PR, in which we can discuss further details (such as if/where to document the extra bits for OMP compatibility).
I am also planning to contact ALCF support about the OMP "contortions" to see if they can be reduced or eliminated. Though I expect dynamic linking is a requirement, we should not need to provide an RPATH for libffi.so and libomp.so.
-
reporter Current plan is to document support for "ALCF's PrgEnv-llvm, version 4.0 and higher", and to state that OpenMP is not supported in this configuration.
If included, the OpenMP instructions would be different for each version (4.0, 5.0 and 8.0 are installed currently), and even if we picked just one, the instructions are a mess. This OpenMP problem has nothing to do with UPC++, fwiw. I need something like 3 to 6 extra command line options just to properly link an OpenMP "Hello, World" example.
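For illustration only, the two ingredients named in this thread (dynamic linking and an RPATH for the directories holding libomp.so and libffi.so) could be combined as in the sketch below. The actual options used are not recorded here, so this is not the working recipe. The helper name `omp_link_line` and the directory arguments are hypothetical, and the helper only prints a command line rather than running it.

```shell
# Hypothetical helper: print (do not run) a link line for an OpenMP program.
# $1 = source file; remaining args = directories containing libomp.so and
# libffi.so (site- and version-specific, so they must be supplied).
omp_link_line() {
    src="$1"; shift
    # -dynamic asks the Cray wrapper for dynamic linking; -fopenmp enables OpenMP.
    line="CC -dynamic -fopenmp $src -o ${src%.*}"
    for dir in "$@"; do
        # One RPATH entry per library directory.
        line="$line -Wl,-rpath,$dir"
    done
    echo "$line"
}
```

For example, `omp_link_line hello.cpp /path/to/libomp/dir /path/to/libffi/dir` prints a single `CC` command with one `-Wl,-rpath,` option per directory.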
-
reporter Proposed resolution in PR#102
-
reporter - changed status to resolved
Support ALCF's PrgEnv-llvm
This commit updates documentation and system-checks to officially list ALCF's PrgEnv-llvm as supported. This support starts at clang/4.0, which is (conveniently enough) the same as the floor we have already established for clang on x86_64.
Resolves issue #97
→ <<cset c0ca03ef8c78>>
-
reporter PrgEnv-llvm is now (finally) subjected to automated (once per week) compile and run testing on Theta.
We are not attempting to test a "floor", just the currently installed version.
Clarification:
PrgEnv-llvm is not provided by Cray.
It is a site-specific environment module provided by ALCF.
It is essentially PrgEnv-gnu w/ clang and clang++ substituted for gcc and g++.