Test failures with oneAPI compiler versions 2022.2.0, 2022.2.1, and 2023.0.0

Paul Hargrove reporter

Update with today's progress.

First, a clarification/correction: While I characterized the results as "numerically incorrect", some might not agree that is the right way to describe what is occurring. The output from rb1d is the number of iterations to converge and the max error when converged. The results for the compilers listed show convergence at a different iteration count (and with a different error) than all other compilers tested (oneAPI or otherwise). I've taken this to indicate a numerically different (presumed "incorrect") solution.

The first output below demonstrates that with the most recent oneAPI compiler release on Dirac, the result is independent of the process count. However, the second shows that one of the problematic compiler versions yields a different result at each of process counts 1, 2, and 4, while 8 and 16 processes match the 2023.1.0 result (4 processes is where the "numerically incorrect results" were observed).

$ ./B-oneapi-2023.1.0/opt/upcxx/bin/upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp -o rb1d-23.1
$ for i in 1 2 4 8 16; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./rb1d-23.1; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825
NP=16    Converged at 5590, err 4.99825

$ ./B-oneapi-2023.0.0/opt/upcxx/bin/upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp -o rb1d-23.0
$ for i in 1 2 4 8 16; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./rb1d-23.0; done
NP=1    Converged at 6800, err 4.99931
NP=2    Converged at 6340, err 4.99885
NP=4    Converged at 5030, err 4.99657
NP=8    Converged at 5590, err 4.99825
NP=16    Converged at 5590, err 4.99825
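
A minimal sketch of automating the comparison above, assuming bash and an executable that prints a single result line as rb1d does. The function name check_np_invariance is hypothetical (it is not part of UPC++); GASNET_PSHM_NODES is used exactly as in the commands above.

```shell
# Hypothetical helper (not part of UPC++): run an smp-conduit executable at
# several process counts and report whether its output is identical for all.
# Usage: check_np_invariance EXE NP [NP...]
check_np_invariance() {
  local exe=$1; shift
  local first= out np seen=0 status=0
  for np in "$@"; do
    # Same launch method as the manual loop: process count via environment
    out=$(env GASNET_PSHM_NODES="$np" "$exe")
    printf 'NP=%s    %s\n' "$np" "$out"
    if [ "$seen" -eq 0 ]; then
      # Remember the first result as the reference
      first=$out; seen=1
    elif [ "$out" != "$first" ]; then
      # Any later result that differs marks the run as NP-dependent
      status=1
    fi
  done
  return "$status"
}
```

For example, "check_np_invariance ./rb1d-23.0 1 2 4 8 16" would print the same lines as the loop above and, given the outputs shown, return nonzero.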

Demonstration that other compiler families (still on Dirac) do not exhibit this behavior (where "intel" below refers to the "Classic" compilers, not oneAPI):

$ module load upcxx/2023.3.0
$ module load PrgEnv/nvidia
$ upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp
$ for i in 1 2 4 8; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./a.out; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825

$ module swap PrgEnv PrgEnv/gnu
$ upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp
$ for i in 1 2 4 8; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./a.out; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825

$ module swap PrgEnv PrgEnv/llvm
$ upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp
$ for i in 1 2 4 8; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./a.out; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825

$ module swap PrgEnv PrgEnv/aocc
$ upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp
$ for i in 1 2 4 8; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./a.out; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825

$ module swap PrgEnv PrgEnv/intel
$ upcxx -network=smp -O ~/upcxx/example/prog-guide/rb1d.cpp
icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
$ for i in 1 2 4 8; do echo -n "NP=$i    "; env GASNET_PSHM_NODES=$i ./a.out; done
NP=1    Converged at 5590, err 4.99825
NP=2    Converged at 5590, err 4.99825
NP=4    Converged at 5590, err 4.99825
NP=8    Converged at 5590, err 4.99825

2023-04-22T05:55:58+00:00

Paul Hargrove reporter

Related FYI: I've filed NERSC ticket INC0204441 to request installation of Intel's 2023.1.0 compilers on Perlmutter, which would enable us to begin CI testing of PrgEnv-intel w/o the need to address this issue.

2023-05-25T03:37:16+00:00

Paul Hargrove reporter

Today I verified that the current 2023.1.0 version of the oneAPI compilers works correctly on Perlmutter. Therefore, I no longer have plans to "fix" this issue. However, this issue remains open pending internal discussions regarding the possibility of rejecting the impacted compiler versions at configure time.

For the benefit of those who read issue trackers from the bottom-up:

Upgrade to the 2023.1.0 (or later) versions of icx and icpx is the recommended work-around.
It is unknown if the incorrect behaviors reported here are a result of a compiler problem or UB in either the tests or UPC++.
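
A rough sketch of what a check for the impacted versions could look like, assuming the compiler version is already available as a MAJOR.MINOR.PATCH string (how configure would obtain it is not covered here). The function name is hypothetical and this is not UPC++'s actual configure logic. Only the three versions named in this issue's title are matched; whether any other versions are affected is not known from this issue.

```shell
# Hypothetical helper (not UPC++ configure code): succeed (return 0) only if
# the given oneAPI version string is one of the three named in this issue.
# Usage: is_impacted_oneapi_version MAJOR.MINOR.PATCH
is_impacted_oneapi_version() {
  case "$1" in
    2022.2.0|2022.2.1|2023.0.0) return 0 ;;  # versions listed in the issue title
    *) return 1 ;;                           # anything else is not flagged
  esac
}
```

A caller could then, for example, print a message recommending 2023.1.0 or later and stop when the function succeeds.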

2023-06-13T00:43:44+00:00

Dan Bonachea

removed milestone

Clear past Milestone for open issues

2024-01-03T22:22:46+00:00
