current master with failing tests

Issue #19 resolved
Nico Schlömer created an issue

On the current MOAB master, tests are failing:

96% tests passed, 2 tests failed out of 46

Total Test time (real) =  60.39 sec

The following tests FAILED:
     29 - MOAB_iMeshP_unit_tests (OTHER_FAULT)
     34 - TestParallel (Failed)

Apparently, MOAB doesn't have proper continuous integration. Now that we have fixed builds for Debian, we could easily set those up on travis-ci, for example.

Comments (11)

  1. Vijay M

    Strongly disagree with your statement about proper CI. This has never failed in any of our tests, local (OSX, Linux variants, ALCF) or on buildbot. The only reason the parallel tests can fail is a broken local MPI installation or an improper configuration.

  2. Nico Schlömer reporter

    Didn't mean to offend; obviously you have far better than average nightly coverage.

    What I observe are failing tests on two different machines, one Ubuntu 14.04 (trusty), one Ubuntu 15.04 (vivid). Everything (including MPI) is default.

    The failures can be reproduced with

    CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 \
    cmake \
      -DCMAKE_SHARED_LINKER_FLAGS="$CMAKE_SHARED_LINKER_FLAGS -Wl,--no-undefined" \
      -DCMAKE_INSTALL_PREFIX:PATH=/opt/moab/ \
      -DENABLE_CGM:BOOL=OFF \
      -DENABLE_HDF5:BOOL=OFF \
      -DENABLE_IMESH:BOOL=ON \
      -DENABLE_IGEOM:BOOL=ON \
      -DENABLE_MPI:BOOL=ON \
      -DENABLE_NETCDF:BOOL=OFF \
      -DENABLE_METIS:BOOL=OFF \
      -DENABLE_PARMETIS:BOOL=OFF \
      -DENABLE_ZOLTAN:BOOL=OFF \
      ../../source-upstream/
    

    on both boxes.

    More details:

    ctest -I 29,29 -V
    UpdateCTestConfiguration  from :/home/nschloe/software/moab/build/launchpad/DartConfiguration.tcl
    UpdateCTestConfiguration  from :/home/nschloe/software/moab/build/launchpad/DartConfiguration.tcl
    Test project /home/nschloe/software/moab/build/launchpad
    Constructing a list of tests
    Done constructing a list of tests
    Checking test dependency graph...
    Checking test dependency graph end
    test 29
        Start 29: MOAB_iMeshP_unit_tests
    
    29: Test command: /home/nschloe/software/moab/build/launchpad/bin/MOAB_iMeshP_unit_tests
    29: Test timeout computed to be: 9.99988e+06
    29: [0]MOAB ERROR: --------------------- Error Message ------------------------------------
    29: [0]MOAB ERROR: Expected Keyword!
    29: [0]MOAB ERROR: load_file() line 178 in ReadABAQUS.cpp
    29: [0]MOAB ERROR: --------------------- Error Message ------------------------------------
    29: [0]MOAB ERROR: Failed getting tag handle in delete_nonlocal_entities!
    29: [0]MOAB ERROR: delete_nonlocal_entities() line 605 in ReadParallel.cpp
    29: [0]MOAB ERROR: --------------------- Error Message ------------------------------------
    29: [0]MOAB ERROR: Failed in step PARALLEL DELETE NONLOCAL!
    29: [0]MOAB ERROR: load_file() line 553 in ReadParallel.cpp
    29: [0]MOAB ERROR: load_file() line 252 in ReadParallel.cpp
    29: [0]MOAB ERROR: load_file() line 515 in Core.cpp
    29: Using default writer WriteVtk for file iMeshP_test_file 
    29: Error code  1 at /home/nschloe/software/moab/source-upstream/itaps/imesh/MOAB_iMeshP_unit_tests.cpp:670
    29: Failed to load input mesh.
    29: Cannot run further tests.
    29: ABORTING
    29: [fuji:23267] *** Process received signal ***
    29: [fuji:23267] Signal: Aborted (6)
    29: [fuji:23267] Signal code:  (-6)
    29: [fuji:23267] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x352f0) [0x7f5f7d0d22f0]
    29: [fuji:23267] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37) [0x7f5f7d0d2267]
    29: [fuji:23267] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a) [0x7f5f7d0d3eca]
    29: [fuji:23267] [ 3] /home/nschloe/software/moab/build/launchpad/bin/MOAB_iMeshP_unit_tests(_Z8run_testPFiP22iMesh_Instance_PrivateP30iMeshP_PartitionHandle_PrivateRK7PartMapEPKc+0x335) [0x431dc0]
    29: [fuji:23267] [ 4] /home/nschloe/software/moab/build/launchpad/bin/MOAB_iMeshP_unit_tests(main+0x172) [0x4320ea]
    29: [fuji:23267] [ 5] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7f5f7d0bda40]
    29: [fuji:23267] [ 6] /home/nschloe/software/moab/build/launchpad/bin/MOAB_iMeshP_unit_tests(_start+0x29) [0x430ff9]
    29: [fuji:23267] *** End of error message ***
    
    ctest -I 34,34 -V
    UpdateCTestConfiguration  from :/home/nschloe/software/moab/build/launchpad/DartConfiguration.tcl
    UpdateCTestConfiguration  from :/home/nschloe/software/moab/build/launchpad/DartConfiguration.tcl
    Test project /home/nschloe/software/moab/build/launchpad
    Constructing a list of tests
    Done constructing a list of tests
    Checking test dependency graph...
    Checking test dependency graph end
    test 34
        Start 34: TestParallel
    
    34: Test command: /usr/bin/mpiexec "-np" "2" "/home/nschloe/software/moab/build/launchpad/bin/parallel_unit_tests"
    34: Test timeout computed to be: 9.99988e+06
    34: [0]MOAB ERROR: --------------------- Error Message ------------------------------------
    34: [0]MOAB ERROR: Expected Keyword!
    34: [1]MOAB ERROR: --------------------- Error Message ------------------------------------
    34: [1]MOAB ERROR: Expected Keyword!
    34: [1]MOAB ERROR: load_file() line 178 in ReadABAQUS.cpp
    34: [0]MOAB ERROR: load_file() line 178 in ReadABAQUS.cpp
    34: [0]MOAB ERROR: --------------------- Error Message ------------------------------------
    34: [0]MOAB ERROR: This doesn't appear to be a .cub file!
    34: [0]MOAB ERROR: load_file() line 308 in Tqdcfr.cpp
    [...]
    34: Error code  [0]MOAB ERROR: load_file() line 515 in Core.cpp
    34: Error code  16 at 16 at /home/nschloe/software/moab/source-upstream/test/parallel/parallel_unit_tests.cpp:1698
    34: /home/nschloe/software/moab/source-upstream/test/parallel/parallel_unit_tests.cpp:1698
    34: --------------------------------------------------------------------------
    34: mpiexec noticed that the job aborted, but has no info as to the process
    34: that caused that situation.
    34: --------------------------------------------------------------------------
    1/1 Test #34: TestParallel .....................***Failed    1.25 sec
    

    Both are apparently I/O errors. Missing files? @vijaysm, can you reproduce the error on your machine with the above CMake script?

  3. Vijay M

    This test assumes configuration with HDF5 since it is trying to read H5M files from disk. Parallel I/O is supported only with our HDF interfaces and your CMake isn't configured with this dependency.

    Perhaps we should check for this in the test or not do certain I/O tests when HDF5 is disabled.
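
One runner-side way to act on this suggestion, sketched under assumptions: read the HDF5 setting from the build directory's CMakeCache.txt and exclude the two tests from this report when it is off. This is a workaround at the ctest level, not the in-test check proposed above. The function name `hdf5_test_filter` is hypothetical, and other HDF5-dependent tests may exist that this list does not cover.

```shell
#!/bin/sh
# Sketch only. Assumes the option is stored in CMakeCache.txt as
# ENABLE_HDF5:BOOL=<value>, matching the cmake invocation in this report.

# Print extra ctest arguments for the build directory given as $1.
# (hdf5_test_filter is a hypothetical helper name.)
hdf5_test_filter() {
    cache="$1/CMakeCache.txt"
    if grep -Eqi '^ENABLE_HDF5:BOOL=(ON|TRUE|YES|1)$' "$cache" 2>/dev/null; then
        # HDF5 is enabled: no filter, run every test.
        return 0
    fi
    # HDF5 is disabled (or the cache is missing): exclude by test name.
    printf '%s\n' "-E MOAB_iMeshP_unit_tests|TestParallel"
}

# Usage, from the build directory:
#   ctest $(hdf5_test_filter .)
```

The `-E` option of ctest excludes tests whose names match the given regular expression, so the remaining tests still run on a build without HDF5.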

  4. Nico Schlömer reporter

    Perhaps we should check for this in the test or not do certain I/O tests when HDF5 is disabled.

    Sounds like the appropriate fix to me. Do you want me to go ahead and create a PR?

    (Btw, we cannot at this moment enable all of MPI, HDF5, and netCDF. The reason for this is that netCDF is needed in its serial version, hence depending on serial HDF5. @vijaysm Which two of MPI, HDF5, netCDF would you prefer for Debian?)

  5. Vijay M

    Sounds like the appropriate fix to me. Do you want me to go ahead and create a PR?

    I am working on a cleaner fix. Will submit a PR for this in a couple of hours.

    Which two of MPI, HDF5, netCDF would you prefer for Debian?

    I think having MPI + HDF5 would be a great addition to Debian currently. We can expose most of our features with this combination and NetCDF can be added in later.
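
As a rough sketch, the MPI + HDF5 combination could be requested by reusing the flags from the reproduction command earlier in the thread with the HDF5 switch turned on. The wrapper name `configure_moab_mpi_hdf5` and the `CMAKE` override variable are hypothetical. A parallel HDF5 installation is presumably required, and how MOAB's CMake locates it is not covered in this thread.

```shell
#!/bin/sh
# Sketch only: same compilers and flag names as the report's cmake call,
# with ENABLE_HDF5 switched ON and netCDF left OFF.

# $1 is the MOAB source directory.
# CMAKE is a hypothetical override for the cmake executable (handy for dry runs).
configure_moab_mpi_hdf5() {
    src="$1"
    CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif77 \
    "${CMAKE:-cmake}" \
      -DCMAKE_INSTALL_PREFIX:PATH=/opt/moab/ \
      -DENABLE_MPI:BOOL=ON \
      -DENABLE_HDF5:BOOL=ON \
      -DENABLE_NETCDF:BOOL=OFF \
      "$src"
}

# Usage, from an empty build directory:
#   configure_moab_mpi_hdf5 ../../source-upstream/
```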

  6. Nico Schlömer reporter

    I am working on a cleaner fix. Will submit a PR for this in a couple of hours.

    Great! If this needs reviewing, I'll be happy to help. Once this is in master, we can merge master back into PR #157, I can retest and hopefully get that one going as well.

    After that, I'll take care of the Debian PR.

  7. Iulian Grindeanu

    MPI + HDF5 is the priority, I would say.
    For climate applications, another configuration is important for us: MPI + pnetcdf + netcdf (parallel I/O is then carried out using pnetcdf; pnetcdf needs netcdf too).

  8. Vijay M

    Great! If this needs reviewing, I'll be happy to help. Once this is in master, we can merge master back into PR #157, I can retest and hopefully get that one going as well.

    Thanks. There is a working branch vijaysm/mpi-no-hdf5-testfixes that contains some fixes. Iulian is working on the rest of the tests.

    I also added a buildbot test to run a bare configuration of MOAB with only MPI. The autoconf version succeeded, as it should.

    http://gnep.mcs.anl.gov:8010/builders/moab-bare-par-develop/builds/0

    So my bigger question here is to understand why you are configuring MOAB with CMake instead of Autotools? Support for the former is very recent, and I have been maintaining it on and off, whereas Autotools support is very mature for our stack. Is there a particular advantage to one over the other for Debian configurations?

  9. Nico Schlömer reporter

    So my bigger question here is to understand why you are configuring MOAB with CMake instead of Autotools?

    I know CMake better than autotools and I didn't see a warning message about it being immature in MOAB. Moreover, I appreciate the fact that you can do out-of-source builds. For Debian I don't know if there's a difference in support, but I don't know the Debian autotools build tools equally well.

  10. Vijay M

    I know CMake better than autotools and I didn't see a warning message about it being immature in MOAB.

    Fair enough. CMake was added as an alternative to autotools due to requests for usage on Windows. This was the primary motivator, so initial versions didn't even support MPI. All of these are additions I had to put in so that it's of some practical use. Certainly, this is nowhere near complete or as fully functional as the autotools configuration system.

    Moreover, I appreciate the fact that you can do out-of-source builds.

    We recommend doing out of source builds with autoconf. I personally never do in-source-builds.

    For Debian I don't know if there's a difference in support, but I don't know the Debian autotools build tools equally well.

    Ok. This might be something to consider if we want much fuller support with respect to dependencies, etc.
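
The out-of-source autotools build recommended above follows the usual pattern of running `configure` from a separate build directory. Below is a minimal sketch assuming a generated `configure` script already exists in the source tree (a fresh git checkout typically needs `autoreconf -fi` first). The helper name `out_of_source_configure` is hypothetical, and no MOAB-specific configure options are shown because none are given in this thread.

```shell
#!/bin/sh
# Sketch only: generic autotools out-of-source configure.

# $1 = source directory containing ./configure
# $2 = build directory (created if missing)
# remaining arguments are passed through to configure
out_of_source_configure() {
    src=$(cd "$1" && pwd) || return 1   # absolute path to the source tree
    build="$2"
    shift 2
    mkdir -p "$build" || return 1
    # Run configure from inside the build directory so that all
    # generated files land there and the source tree stays clean.
    (cd "$build" && "$src/configure" "$@")
}

# Usage, with the paths from the report:
#   out_of_source_configure ../../source-upstream . --prefix=/opt/moab/
#   make && make check
```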
