qc0-mclachlan fails with Assertion `not reg.processors' failed

Issue #1087 closed
Ian Hinder created an issue

I am running qc0-mclachlan.par from the ET trunk on Datura. I get this error message during startup:

cactus_sim: /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/CarpetLib/src/region.cc:166: void combine_regions(const std::vector<region_t, std::allocator<region_t>> &, std::vector<region_t, std::allocator<region_t>> &): Assertion `not reg.processors' failed.

I get the same message whether I run on 1 MPI process or 2, and also on another machine. Since qc0-mclachlan is a standard BBH simulation, I assume that a similar error could/does occur for any binary simulation with moving-boxes mesh refinement. Setting priority to Critical as a result.

Comments (7)

  1. Ian Hinder reporter

    Backtrace is

    #0  0x00002b796fcb88a5 in raise () from /lib64/libc.so.6
    #1  0x00002b796fcba085 in abort () from /lib64/libc.so.6
    #2  0x00002b796fcb1a1e in assert_fail_base () from /lib64/libc.so.6
    #3  0x00002b796fcb1ae0 in assert_fail () from /lib64/libc.so.6
    #4  0x000000000180b6b3 in combine_regions (oldregs=Traceback (most recent call last):
    #5  0x00000000016ebc73 in Carpet::SplitRegionsMaps_Automatic (cctkGH=0x1692, superregss=
    #6  0x00000000016f1140 in Carpet::SplitRegionsMaps (cctkGH=0x1692, superregss=Traceback (most recent call last):
    #7  0x0000000000b934a5 in CarpetRegrid2::CarpetRegrid2_RegridMaps (cctkGH_=0x1692, superregsss_=0x1692, regssss_=0x6, force=-1) at /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/CarpetRegrid2/src/regrid.cc:843
    #8  0x00000000006bcb60 in Carpet_RegridMaps (cctkGH=0x1692, superregsss=0x1692, regssss=0x6, force=-1) at /home/ianhin/Cactus/EinsteinToolkit/configs/sim/bindings/Functions/AliasedFunctions.c:1798
    #9  0x00000000016e2a54 in Carpet::Regrid (cctkGH=0x1692, force_recompose=146, do_init=6) at /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/Carpet/src/Recompose.cc:215
    #10 0x00000000016db8d7 in Carpet::CallRegridInitialMeta (cctkGH=0x1692) at /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/Carpet/src/Initialise.cc:1050
    #11 0x00000000016dce2f in Carpet::CallInitial (cctkGH=0x1692) at /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/Carpet/src/Initialise.cc:375
    #12 0x00000000016dd3ae in Carpet::Initialise (fc=0x1692) at /home/ianhin/Cactus/EinsteinToolkit/arrangements/Carpet/Carpet/src/Initialise.cc:126
    #13 0x000000000053bad9 in main (argc=4, argv=0x7fff2ebb9d08) at /home/ianhin/Cactus/EinsteinToolkit/src/main/flesh.cc:80

    The oldregs and superregss seem to cause problems for the debugger, with many traceback errors saying it can't access certain memory.

  2. Ian Hinder reporter

    git bisect tells me that the problem first appears in this commit:

    commit 744af16b61a3bbbcb752af1ed11ed02831049179
    Author: Erik Schnetter <schnetter@gmail.com>
    Date: Mon Sep 10 22:02:40 2012 -0400

    CarpetLib: Ensure that split/combined regions don't have a tree structure attached

    http://www.carpetcode.org/hg/carpet/index.cgi/rev/dc343eecda5f

    I don't know whether this commit detects an already-existing problem or actually introduces a new one.

  3. Tanja Bode

    The Carpet testsuites, among others, have failed with the same error since that patch. It seems Ian's daily trunk testsuite page stopped updating on the day that patch was applied; otherwise this might have been noticed sooner.

  4. Ian Hinder reporter

    The page stopped updating at the same time as the login node that the tests are launched from was replaced with another one, and the cron jobs were not transferred. I reinstated the cron job this morning, and the system is churning through commits now.

    Erik: would reverting the patch be the right solution? This potentially affects production runs as well as test cases.

  5. Erik Schnetter

    I introduced the assert because the process tree is not handled correctly when regions are split.

    While we work on the correct solution, I suggest replacing the assert with a comment.
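The interim workaround suggested above can be sketched as follows. This is a minimal illustration, not the actual CarpetLib code: `process_tree`, the simplified `region_t`, and `combine_regions_sketch` are hypothetical stand-ins, and the real `combine_regions` in `region.cc` merges regions rather than copying them. Only the field name `processors` and the asserted condition are taken from the error message.

```cpp
#include <cassert>
#include <vector>

// Hypothetical placeholder for the process tree attached to a region.
struct process_tree {};

// Hypothetical, heavily simplified region type (assumption: the real
// region_t carries extents, boundaries, and more).
struct region_t {
  int processor = -1;                  // owning process, -1 if unassigned
  process_tree *processors = nullptr;  // optional process tree
};

// Sketch of a combine step with the failing check disabled.
// The body simply copies regions through; it stands in for the real
// merging logic, which is not reproduced here.
void combine_regions_sketch(const std::vector<region_t> &oldregs,
                            std::vector<region_t> &newregs) {
  newregs.clear();
  for (const region_t &reg : oldregs) {
    // Interim workaround: the process tree is not yet handled correctly
    // when regions are split, so the check is kept only as a comment.
    // assert(not reg.processors);
    newregs.push_back(reg);
  }
}
```

With the assertion commented out, a region that still carries a process tree passes through instead of aborting the run. This hides the symptom only; the underlying handling of the process tree still needs the proper fix.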
