Test failures with oneAPI compiler versions 2022.2.0, 2022.2.1, and 2023.0.0
First, the good news: the defect described in this issue has not been observed with icpx releases older or newer than those listed in the subject line.
The following tests have been observed to run to completion but print numerically incorrect results in opt codemode when compiled with certain versions of the oneAPI compilers:
- example/prog-guide/rb1d.cpp
- example/prog-guide/rb1d-rpc.cpp
- example/prog-guide/rb1d-rpcinit.cpp
- upcxx-extras::tutorials/2021-11/examples/jac1d.cpp
- upcxx-extras::tutorials/2021-11/solutions/ex2.cpp
So far this occurs with every threadmode and network combination I've tried, including {seq,par} x {smp,ibv,ofi/cxi}. It has not occurred in any debug codemode trial.
Testing of five oneAPI releases (believed to be consecutive) yields:
- 2022.1.0 GOOD
- 2022.2.0 BAD
- 2022.2.1 BAD
- 2023.0.0 BAD
- 2023.1.0 GOOD
Therefore, upgrading to the 2023.1.0 versions of icx and icpx is the recommended workaround.
It is unknown whether these failures result from a compiler defect or from undefined behavior (UB) in these tests. So the first task related to this issue should be ruling out UB.
Comments (4)
reporter Related FYI: I've filed NERSC ticket INC0204441 to request installation of Intel's 2023.1.0 compilers on Perlmutter, which would enable us to begin CI testing of PrgEnv-intel without the need to address this issue.
reporter Today I verified that the current 2023.1.0 version of the oneAPI compilers work correctly on Perlmutter. Therefore, I no longer have plans to "fix" this issue. However, this issue remains open pending internal discussions regarding the possibility of rejecting the impacted compiler versions at configure time.
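One possible shape for the configure-time rejection mentioned above is a small compile-only probe. This is a hypothetical sketch, not part of UPC++'s actual configure: the names `icpx_version` and `is_impacted_icpx` are illustrative, and it assumes that `__INTEL_LLVM_COMPILER` encodes the compiler version as VVVVMMPP (for example 20230100 for 2023.1.0). That encoding should be confirmed against Intel's documentation for each release before relying on it.

```cpp
// Hypothetical configure probe: fails to compile under the icx/icpx
// versions reported in this issue. Other compilers skip the check
// because __INTEL_LLVM_COMPILER is not defined for them.

// ASSUMPTION: __INTEL_LLVM_COMPILER uses the VVVVMMPP encoding,
// e.g. 2022.2.1 -> 20220201. Verify against Intel's documentation.
constexpr long icpx_version(long year, long minor, long patch) {
  return year * 10000L + minor * 100L + patch;
}

// Versions observed in this issue to produce wrong results in opt codemode.
constexpr bool is_impacted_icpx(long v) {
  return v == icpx_version(2022, 2, 0) ||
         v == icpx_version(2022, 2, 1) ||
         v == icpx_version(2023, 0, 0);
}

#if defined(__INTEL_LLVM_COMPILER)
static_assert(!is_impacted_icpx(__INTEL_LLVM_COMPILER),
              "icx/icpx 2022.2.0, 2022.2.1 and 2023.0.0 are known to produce "
              "incorrect results for some UPC++ tests; "
              "upgrade to 2023.1.0 or later");
#endif
```

A configure script could compile this file with the candidate compiler (for example `icpx -c probe.cpp`) and treat a compile failure as a reason to reject that compiler. Using `static_assert` rather than a runtime check keeps the probe usable when cross-compiling, since nothing needs to be executed on the build host.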
For the benefit of those who read issue trackers from the bottom up:
Upgrading to the 2023.1.0 (or later) versions of icx and icpx is the recommended workaround.
It is unknown whether the incorrect behaviors reported here result from a compiler problem or from UB in either the tests or UPC++.
- removed milestone
Clear past Milestone for open issues
Update with today's progress.
First, a clarification/correction: while I characterized the results as "numerically incorrect", some might not agree that this is the right way to describe what is occurring. The output from rb1d is the number of iterations to converge and the max error when converged. With the compilers listed, runs converge at a different iteration count (and with a different error) than with all other compilers tested (oneAPI or otherwise). I've taken this to indicate a numerically different (presumed "incorrect") solution.
The first output below demonstrates that with the most recent oneAPI compiler release on Dirac, the result is independent of the process count. However, the second shows that one of the problematic compiler versions yields results that depend on the process count (1, 2, and 4 processes; 4 processes is where the "numerically incorrect results" were observed).
Demonstration that other compiler families (still on Dirac) do not have this behavior (where "intel" below is the "Classic" compilers, not oneAPI):