Several tests fail in Jenkins after move to new NCSA build node

Issue #2019 closed
Ian Hinder created an issue

After the move of the Jenkins build system from UCD to NCSA, several tests started failing. See https://build-test.barrywardell.net/job/EinsteinToolkit/936/testReport/ for the details. The thorns with failures are CT_MultiLevel, SphericalHarmonicReconGen and GRHydro.

Comments (9)

  1. Ian Hinder reporter

    The test machine "login.barrywardell.net" is configured in the same way as the Jenkins build node and runs on the same hardware (at least, it reports the same information in /proc/cpuinfo). I am focusing first on the GRHydro_test_shock_weno test. It fails with the standard ubuntu.cfg from simfactory but passes with build.cfg (https://bitbucket.org/ianhinder/cactusjenkins/src/d7021a52bd83448db589b2346c43441682eecabb/build.cfg?at=master), an option list that was previously used for the test system. (I'm not 100% sure when it was changed to ubuntu.cfg; this may coincide with the move to the NCSA machine.) There are several differences between the two option lists, and I am working my way through them.

    The differences that could be responsible are listed below (output of diff build.cfg ubuntu.cfg, with irrelevant parts removed):

    33,36c30,33
    < CFLAGS   = -g3 -std=gnu99
    < CXXFLAGS = -g3 -std=gnu++0x
    < F77FLAGS = -g3 -fcray-pointer -ffixed-line-length-none
    < F90FLAGS = -g3 -fcray-pointer -ffixed-line-length-none -ffree-line-length-none
    ---
    > CFLAGS   = -g3 -march=native -std=gnu99
    > CXXFLAGS = -g3 -march=native -std=gnu++0x
    > F77FLAGS = -g3 -march=native -fcray-pointer -ffixed-line-length-none -fno-range-check
    > F90FLAGS = -g3 -march=native -fcray-pointer -ffixed-line-length-none -fno-range-check
    39,43d35
    < VECTORISE                  = yes
    < VECTORISE_ALIGNED_ARRAYS   = no
    < VECTORISE_INLINE           = no
    < VECTORISE_STREAMING_STORES = no
    < 
    49d40
    < # -check-uninit fails for asm output operands
    61,64c52,55
    < C_OPTIMISE_FLAGS   = -O2 #-ffast-math
    < CXX_OPTIMISE_FLAGS = -O2 #-ffast-math
    < F77_OPTIMISE_FLAGS = -O2 #-ffast-math
    < F90_OPTIMISE_FLAGS = -O2 #-ffast-math
    ---
    > C_OPTIMISE_FLAGS   = -O2 -ffast-math -fno-finite-math-only
    > CXX_OPTIMISE_FLAGS = -O2 -ffast-math -fno-finite-math-only
    > F77_OPTIMISE_FLAGS = -O2 -ffast-math -fno-finite-math-only
    > F90_OPTIMISE_FLAGS = -O2 -ffast-math -fno-finite-math-only
    91,93c82,87
    < 
    < HDF5_ENABLE_CXX     = no
    < HDF5_ENABLE_FORTRAN = no
    ---
    96,100d89
    < MPI_INC_DIRS = /usr/include/mpich2 /usr/include/mpich
    < MPI_LIB_DIRS = /usr/lib
    < MPI_LIBS     = mpich  mpl
    103,119d91
    < 
    < BLAS_DIR    = NO_BUILD
    < LAPACK_DIR  = NO_BUILD
    < LIBJPEG_DIR = NO_BUILD
    < 
    < ZLIB_DIR    = /usr
    < ZLIB_LIB_DIRS = /usr/lib/x86_64-linux-gnu
    < ZLIB_INC_DIRS = /usr/include
    

    Removing -march=native from ubuntu.cfg does not make the test pass.
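When working through differences like these one at a time, it can help to reduce the two option lists to a per-variable list of added and removed flags. The following is only a sketch: the function names are made up for illustration, and it assumes that option lists consist of "NAME = value" lines in which anything after "#" is a comment.

```python
def parse_optionlist(text):
    """Parse 'NAME = value' lines into {name: [flags]}.

    Assumption: text after '#' is a comment and is ignored.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        # Split on the first '=' only, so flags such as -std=gnu99 survive.
        name, value = line.split("=", 1)
        settings[name.strip()] = value.split()
    return settings


def flag_differences(old, new):
    """Return {name: (removed_flags, added_flags)} for settings that differ."""
    diffs = {}
    for name in sorted(set(old) | set(new)):
        old_flags = old.get(name, [])
        new_flags = new.get(name, [])
        removed = [f for f in old_flags if f not in new_flags]
        added = [f for f in new_flags if f not in old_flags]
        if removed or added:
            diffs[name] = (removed, added)
    return diffs


if __name__ == "__main__":
    # Two lines from each option list, copied from the diff above.
    build_cfg = """
CFLAGS   = -g3 -std=gnu99
C_OPTIMISE_FLAGS   = -O2 #-ffast-math
"""
    ubuntu_cfg = """
CFLAGS   = -g3 -march=native -std=gnu99
C_OPTIMISE_FLAGS   = -O2 -ffast-math -fno-finite-math-only
"""
    diffs = flag_differences(parse_optionlist(build_cfg),
                             parse_optionlist(ubuntu_cfg))
    for name, (removed, added) in diffs.items():
        print(name, "removed:", removed, "added:", added)
```

Each added flag can then be dropped from a copy of ubuntu.cfg in turn and the test rerun, which is the same procedure as the -march=native check above.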

  2. Frank Löffler

    Action items from the call on Mar 20th:

    * Roland: Check whether increasing the resolution makes the hydro tests work (they might use too low a resolution anyway).
    * Erik: See how much fast math changes performance and reproducibility on a "modern Haswell" machine.

  3. Ian Hinder reporter

    From the wiki page, a possible conclusion is that newer versions of gcc apply fast-math much more aggressively, which causes the GRHydro.GRHydro_test_shock_weno/1procs test to fail. Older versions with ubuntu.cfg (and hence fast-math) seem to work, whereas newer versions don't; omitting fast-math on the newer version also works. We talked on the call about possibly making the option lists more conservative to avoid all the time spent on these sorts of problems, but we would like to see benchmark results before making such a decision.
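As a reminder of why fast-math can change results at all: in gcc, -ffast-math enables -fassociative-math, which lets the compiler regroup floating-point sums, and floating-point addition is not associative. The snippet below is a minimal illustration with values picked for the purpose. It does not reproduce what gcc does to GRHydro; it only shows that two groupings of the same sum can give different answers.

```python
def grouped_left(a, b, c):
    """Evaluate (a + b) + c, the order written in the source code."""
    return (a + b) + c


def grouped_right(a, b, c):
    """Evaluate a + (b + c), a regrouping that associative-math would permit."""
    return a + (b + c)


if __name__ == "__main__":
    # Illustrative values: near 1e16 the spacing between doubles is 2,
    # so adding 1.0 to -1e16 is lost to rounding.
    a, b, c = 1e16, -1e16, 1.0
    print("(a + b) + c =", grouped_left(a, b, c))
    print("a + (b + c) =", grouped_right(a, b, c))
```

Whether a particular compiler version actually regroups a given expression depends on its optimiser, which would be consistent with older and newer gcc versions behaving differently with the same flags.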

  4. Frank Löffler
    • changed status to open

    This changes weno_eps for the testsuite. That's ok for the release, so (review_ok), but should we maybe change the default as well (after this release), if this is indeed so prone to problems?

  5. Roland Haas
    • changed status to resolved

    Applied as git hash 01b6e12 of einsteinevolve.

    I do not think we can easily change the default since the defaults are for backwards compatibility and not necessarily "good starting point" values. It is also not clear that 1e-26 is actually a bad value for typical problems that are not a shocktube test.
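To illustrate why a very small weno_eps can make a test sensitive to roundoff-level changes, here is a sketch using the textbook fifth-order WENO (Jiang-Shu) nonlinear weights. This is an assumption for illustration: GRHydro's implementation may differ in detail, and the comparison value 1e-6 is picked only for the example and is not the value used in the commit.

```python
# Textbook Jiang-Shu WENO5 weights; illustrative only, not GRHydro's code.
LINEAR_WEIGHTS = (0.1, 0.6, 0.3)


def smoothness_indicators(f):
    """Return the three smoothness indicators for a 5-point stencil."""
    fm2, fm1, f0, fp1, fp2 = f
    beta0 = (13.0 / 12.0) * (fm2 - 2.0 * fm1 + f0) ** 2 \
        + 0.25 * (fm2 - 4.0 * fm1 + 3.0 * f0) ** 2
    beta1 = (13.0 / 12.0) * (fm1 - 2.0 * f0 + fp1) ** 2 \
        + 0.25 * (fm1 - fp1) ** 2
    beta2 = (13.0 / 12.0) * (f0 - 2.0 * fp1 + fp2) ** 2 \
        + 0.25 * (3.0 * f0 - 4.0 * fp1 + fp2) ** 2
    return beta0, beta1, beta2


def weno_weights(f, eps):
    """Nonlinear weights w_k = alpha_k / sum(alpha), alpha_k = d_k / (eps + beta_k)^2."""
    betas = smoothness_indicators(f)
    alphas = [d / (eps + b) ** 2 for d, b in zip(LINEAR_WEIGHTS, betas)]
    total = sum(alphas)
    return [a / total for a in alphas]


def max_weight_change(eps, perturbation=1e-12):
    """Largest change in any weight when one point of a constant state is perturbed."""
    flat = [1.0, 1.0, 1.0, 1.0, 1.0]
    bumped = [1.0, 1.0, 1.0, 1.0, 1.0 + perturbation]
    w_flat = weno_weights(flat, eps)
    w_bumped = weno_weights(bumped, eps)
    return max(abs(a - b) for a, b in zip(w_flat, w_bumped))


if __name__ == "__main__":
    for eps in (1e-26, 1e-6):
        print("eps =", eps, "max weight change =", max_weight_change(eps))
```

In this sketch a perturbation of 1e-12 on a constant state produces smoothness indicators of order 1e-24. These dominate an eps of 1e-26 and shift the weights by an amount of order one, while with the larger eps the weights stay at their linear values. This says nothing about whether 1e-26 is a poor choice for typical problems, only that constant regions of a shocktube test are where such sensitivity would show up.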
