Tests randomly failing on sparc
Hi Martin,
When building the m4ri package for Debian, I noticed that some tests fail, but only on sparc architectures. These tests fail randomly: running ./test_multiplication twice, I got:
the first time:
addsqr: m: 4096, k: 0, cutoff: 2048 add,mul = addmul M4RM != addmul ... FAILED
addsqr: m: 1000, k: 0, cutoff: 64 M4RM != add,mul add,mul = addmul ... FAILED
the second time:
addmul: m: 1710, l: 1290, n: 1000, k: 0, cutoff: 256 add,mul = addmul M4RM != addmul ... FAILED
sqr: m: 2048, k: 0, cutoff: 1024 Strassen != M4RM Strassen != Naiv ... FAILED
addsqr: m: 4096, k: 0, cutoff: 2048 add,mul = addmul M4RM != addmul ... FAILED
addsqr: m: 1000, k: 0, cutoff: 64 M4RM != add,mul add,mul = addmul ... FAILED
(the other tests succeeded.)
The other test that fails randomly is test_invert, which sometimes segfaults.
See here for a full log: https://buildd.debian.org/status/fetch.php?pkg=libm4ri&arch=sparc&ver=20130416-3&stamp=1371503160
I do not know exactly where this comes from. Let me know if I can run further tests to help solve this problem.
Thanks in advance for your help.
Cédric
Comments (10)
-
repo owner These do look strange: first "addmul" seems to be wrong, then "add,mul". When it fails, is it always in the same places, i.e. for the same parameters?
If it's not too much trouble, a backtrace for test_colswap would be nice.
-
Hi again,
The failure seems to be related to the fact that the package is built with OpenMP. Tests started failing when this configure option was set, and removing it for the SPARC architecture yields successful builds.
-
repo owner Ah, so it's a race condition. Thanks! I suggest disabling OpenMP (for now); it's not very well maintained and the performance advantage is often small.
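For readers unfamiliar with this class of bug, here is a minimal sketch of how an OpenMP data race can give results that differ from run to run. It is not m4ri code: the functions xor_all_racy and xor_all_safe are hypothetical and only illustrate the general pattern (XOR is used because addition over GF(2) is XOR). The actual race in m4ri may look quite different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, NOT part of m4ri: XOR all words of src together. */

/* Racy variant: when compiled with OpenMP enabled, all threads perform an
 * unsynchronized read-modify-write on the shared variable acc, so updates
 * can be lost and the result may change between runs. */
uint64_t xor_all_racy(const uint64_t *src, size_t n) {
    uint64_t acc = 0;
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        acc ^= src[i]; /* data race on acc */
    }
    return acc;
}

/* Safe variant: the reduction clause gives each thread a private copy of
 * acc and combines the copies with XOR once the loop has finished. */
uint64_t xor_all_safe(const uint64_t *src, size_t n) {
    uint64_t acc = 0;
    #pragma omp parallel for reduction(^:acc)
    for (long i = 0; i < (long)n; i++) {
        acc ^= src[i];
    }
    return acc;
}
```

Without OpenMP enabled at compile time the pragmas are ignored and both variants run serially, which would be consistent with the failures disappearing once OpenMP is switched off.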
-
The failures do not always occur in the same places. They can happen at different times. The examples I gave were the failing instances of the same test (test_multiplication) run twice in a row (sorry, I messed up the layout).
test_colswap seems to pass every time. Do you instead need a backtrace of test_invert, which produced the segfault in the log linked above?
-
Martin, do you recommend disabling it only for sparc, or should it be disabled for all arches?
-
repo owner All architectures, we typically build & test without it (and we should add a warning).
-
repo owner I just checked in a commit which disables OpenMP in ple_russian, which fixes the race condition for me. Can you check whether this works on SPARC?
-
repo owner I'm closing this as resolved for now. Please re-open if necessary.
-
repo owner - changed status to resolved
No movement and we don't have access to SPARC any more.
[just for the record, I am the one who sent this bug report]