Tests randomly failing on sparc

Issue #55 resolved
Former user created an issue

Hi Martin,

When building the m4ri package for Debian, I noticed that some tests fail, but only on sparc architectures. These tests fail randomly: running ./test_multiplication twice, I got:

the first time:

    addsqr: m: 4096, k: 0, cutoff: 2048 add,mul = addmul M4RM != addmul ... FAILED
    addsqr: m: 1000, k: 0, cutoff: 64 M4RM != add,mul add,mul = addmul ... FAILED

the second time:

    addmul: m: 1710, l: 1290, n: 1000, k: 0, cutoff: 256 add,mul = addmul M4RM != addmul ... FAILED
    sqr: m: 2048, k: 0, cutoff: 1024 Strassen != M4RM Strassen != Naiv ... FAILED
    addsqr: m: 4096, k: 0, cutoff: 2048 add,mul = addmul M4RM != addmul ... FAILED
    addsqr: m: 1000, k: 0, cutoff: 64 M4RM != add,mul add,mul = addmul ... FAILED

(the other tests succeeded.)

The other test that fails randomly is test_invert, which sometimes segfaults.

See here for a full log: https://buildd.debian.org/status/fetch.php?pkg=libm4ri&arch=sparc&ver=20130416-3&stamp=1371503160

I do not know exactly where the problem comes from. Let me know if I can run further tests to help solve it.

Thanks in advance for your help.

Cédric

Comments (10)

  1. Martin Albrecht repo owner

    These do indeed look strange: first "addmul" seems to be wrong, then "add,mul". When it fails, is it always in the same places, i.e. for the same parameters?

    If it's not too much trouble, getting a backtrace for test_colswap would be nice.

  2. Cédric Boutillier

    Hi again,

    The failure seems to be related to the fact that the package is built with OpenMP. Tests started failing when this configure option was set, and removing it for the SPARC architecture yields successful builds.

  3. Martin Albrecht repo owner

    Ah, so it's a race condition. Thanks! I suggest disabling OpenMP (for now); it's not very well maintained and the performance advantage is often small.

  4. Cédric Boutillier

    The failures do not always occur in the same places. They can happen at different times. The examples I gave were the failing instances of the same test (test_multiplication) run twice in a row (sorry, I messed up the layout).

    test_colswap seems to pass every time. Do you instead need a backtrace of test_invert, which produced the segfault in the log linked above?

  5. Cédric Boutillier

    Martin, do you recommend disabling it only for sparc, or should it be disabled for all arches?

  6. Martin Albrecht repo owner

    All architectures, we typically build & test without it (and we should add a warning).

  7. Martin Albrecht repo owner

    I just checked in a commit which disables OpenMP in ple_russian, which fixes the race condition for me. Can you check whether this works on SPARC?
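
For readers unfamiliar with why an OpenMP race makes tests fail at random, here is a minimal sketch in C. It is not M4RI code and not the actual bug in ple_russian; the function names and the row-major layout are hypothetical, chosen only to illustrate the pattern. Both functions XOR a set of rows into one accumulator row, which is how addition works over GF(2). The first splits the rows across threads, so several threads perform an unsynchronised read-modify-write on the same word and updates can be lost depending on timing. The second splits the columns, so each word is owned by exactly one thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not part of M4RI. `a` holds `rows` rows of
 * `width` 64-bit words each, stored row-major; `acc` holds `width` words. */

/* Racy when built with -fopenmp: rows are divided among threads, yet
 * every thread updates the same acc[j] without synchronisation. */
void xor_rows_racy(uint64_t *acc, const uint64_t *a, size_t rows, size_t width)
{
    assert(acc != NULL && a != NULL);
    #pragma omp parallel for
    for (long i = 0; i < (long)rows; i++) {
        for (size_t j = 0; j < width; j++) {
            acc[j] ^= a[(size_t)i * width + j]; /* data race on acc[j] */
        }
    }
}

/* Race-free: columns are divided among threads, so each acc[j] is
 * read and written by one thread only. */
void xor_rows_safe(uint64_t *acc, const uint64_t *a, size_t rows, size_t width)
{
    assert(acc != NULL && a != NULL);
    #pragma omp parallel for
    for (long j = 0; j < (long)width; j++) {
        for (size_t i = 0; i < rows; i++) {
            acc[j] ^= a[i * width + (size_t)j];
        }
    }
}
```

Without -fopenmp the pragmas are ignored and both functions give the same result, which matches the observation in this thread that the failures only appear in OpenMP builds. With OpenMP enabled, the racy version may still pass on many runs, so a single passing run does not show that the code is correct.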
