Multiple test failures with NVHPC 23.3+
Tests of NVHPC 23.3 on all three architectures they support (x86_64, ppc64le, aarch64) have failures which did not occur with their 23.1 release.
There are two failure modes:
- Various assertion failures which indicate corruption of a global pointer. This has been seen from (at least) rput-cover, vis, alloc, global_ptr, non-contig-example, allocator-example and local_team in harness-based testing.
- Internal Compiler Errors. This has been seen with at least test/memory_kinds.cpp, both with and without --with-cuda.
Full harness output for x86_64, ppc64le and aarch64 (in that order) can be found in the following three locations:
- https://gasnet-bugs.lbl.gov/upc_tests/test_logs/MISC/2023-04-03/jlse-skylake.EX-skylake-ibv-nvidia_new/19:53:56
- https://gasnet-bugs.lbl.gov/upc_tests/test_logs/MISC/2023-04-04/jlse-firestone.EX-firestone-smp-nvidia_new/05:07:18
- https://gasnet-bugs.lbl.gov/upc_tests/test_logs/MISC/2023-04-04/jlse-a64fx.EX-a64fx-ibv-nvhpc/21:23:26
So far, only codemode=debug exhibits the failures, and not just because assertions are disabled for codemode=opt. In particular:
- upcxx -codemode=opt -DUPCXXI_GPTR_CHECK_ENABLED=1 ... does NOT fail.
- upcxx -codemode=debug -O1 ... does NOT fail.
Both failure modes appear attributable to a flawed implementation of __attribute__((pure)), support for which is new in this release of NVHPC. Therefore, setting gasnet_cv_gasneti_have_cxx_attr_pure=no at configure time (in the environment or on the command line) is believed to be an effective work-around (as is use of any earlier supported release of NVHPC).
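For readers unfamiliar with the attribute, the following is a minimal sketch of what a compile-time guard around __attribute__((pure)) could look like. It is not the GASNet-EX logic: the names DEMO_ATTR_PURE and demo_count_nonzero are hypothetical, and the version test assumes the __NVCOMPILER, __NVCOMPILER_MAJOR__ and __NVCOMPILER_MINOR__ macros predefined by the NVHPC compilers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard (an illustration, not the actual GASNet-EX code):
// apply the pure attribute only when the compiler is NOT NVHPC 23.3 or newer.
#if defined(__NVCOMPILER) && \
    (__NVCOMPILER_MAJOR__ > 23 || \
     (__NVCOMPILER_MAJOR__ == 23 && __NVCOMPILER_MINOR__ >= 3))
  #define DEMO_ATTR_PURE /* disabled on affected NVHPC releases */
#elif defined(__GNUC__) || defined(__clang__)
  #define DEMO_ATTR_PURE __attribute__((pure))
#else
  #define DEMO_ATTR_PURE /* attribute not available */
#endif

// A pure function has no side effects and its result depends only on its
// arguments and on memory it reads, so the compiler may merge or drop
// repeated calls with the same inputs.
DEMO_ATTR_PURE
std::size_t demo_count_nonzero(const int *data, std::size_t n) {
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (data[i] != 0) ++count;
  }
  return count;
}
```

With the attribute compiled out the function behaves identically; only an optimization hint is lost, which is why disabling it is a low-risk work-around.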
Comments (6)
-
reporter -
reporter Correction: The ICE compiling test/memory_kinds.cpp is reproducible in both codemodes (and both respond to the same work-around).
-
reporter - changed status to open
GASNet-EX commit 588e47720 adjusts logic for use of the pure attribute to exclude NVHPC 23.3 and newer.
This issue remains open pending the following:
- Advance of the GASNet-EX stable branch to include the work-around.
- Identification of an eventual fixed release of NVHPC at which the work-around can be disabled.
-
reporter - changed status to resolved
I am marking this resolved and recording the remaining (non-upc++) tasks elsewhere.
-
reporter -
reporter - changed title to Multiple test failures with NVHPC 23.3+
For the record: the problem is still present in the current NVHPC 24.3 release.