Code crash after checkpointing when group finding is enabled
see title
-
repo owner -
Account Deleted reporter - The problematic assert is:
pkdgrav3_mpi: mdl2/mpi/mdl.c:655: mdlCacheReceive: Assertion `c->iType != 0' failed.
I had both lightcone healpix and group finding enabled.
-
Mischa, can you upload the text file containing the slurm output (the text output from pkdgrav3) from just before the assert, so that we can see clearly which phase of the code it reached? It would be good to include a couple of hundred lines before that point.
-
Account Deleted reporter - attached slurm-2121.out
This is the relevant slurm output file. The line I quoted is line 53023.
-
repo owner - So the phases were:
- Last substep ends
- Domain Decomposition & Tree build
- FoF group finding
- Gravity on main step
- Output of group statistics
- P(k) measurement
- Checkpoint written
- Gravity on main step (repeated) -> crashes
The root cause is that during the second gravity calculation the cell cache (CID 1) is not open. When no checkpoint is written, the last two steps are omitted and the code does not crash.
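To make the failure mode concrete, here is a minimal C model of the phase sequence above. It is a sketch under stated assumptions, not pkdgrav3 code: the `Cache` struct, the `Phase` enum and the functions are hypothetical. The only detail taken from the thread is that a cache with `iType == 0` is treated as closed, which is the condition the assert `c->iType != 0` in `mdlCacheReceive` rejects. In the model, tree build opens the cell cache, group statistics closes it, and gravity needs it open; instead of aborting, it returns the index of the phase where the real code would assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cache: iType == 0 means "closed",
   mirroring the failed assert `c->iType != 0`. */
typedef struct {
    int iType;
} Cache;

/* Hypothetical phases, in the spirit of the list in this thread. */
typedef enum {
    PHASE_TREE_BUILD,  /* domain decomposition & tree build: opens the cell cache */
    PHASE_FOF,         /* FoF group finding */
    PHASE_GRAVITY,     /* gravity: requires the cell cache to be open */
    PHASE_GROUP_STATS, /* group statistics: closes the cell cache */
    PHASE_PK,          /* P(k) measurement */
    PHASE_CHECKPOINT   /* checkpoint written */
} Phase;

static bool cell_cache_is_open(const Cache *c) { return c->iType != 0; }

/* Applies one phase; returns false where the real code would trip the assert. */
static bool apply_phase(Cache *c, Phase p) {
    switch (p) {
    case PHASE_TREE_BUILD:  c->iType = 1; return true; /* any nonzero value: open */
    case PHASE_GROUP_STATS: c->iType = 0; return true; /* cache closed */
    case PHASE_GRAVITY:     return cell_cache_is_open(c);
    default:                return true;               /* FoF, P(k), checkpoint */
    }
}

/* Runs a phase sequence; returns the index of the first failing phase,
   or -1 if every phase succeeds. */
static int run_phases(const Phase *seq, size_t n) {
    assert(seq != NULL || n == 0);
    Cache c = {0};
    for (size_t i = 0; i < n; ++i)
        if (!apply_phase(&c, seq[i]))
            return (int)i;
    return -1;
}
```

With a checkpoint, the sequence tree build, FoF, gravity, group statistics, P(k), checkpoint, gravity fails at the final gravity phase (index 6). Without the checkpoint and the repeated gravity, `run_phases` returns -1.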
-
repo owner - changed status to resolved
Group statistics closes the cell cache. Normally the tree is rebuilt afterwards, except in the case listed above. Fixed with: https://bitbucket.org/dpotter/pkdgrav3/commits/1492480ea6a8de6848878fcecf4ccd5cddaf869c
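One way to guard against this class of bug is to check that the cell cache is open before gravity and rebuild the tree if an earlier phase closed it. The sketch below is hypothetical: `Sim`, `build_tree`, `group_stats`, `gravity`, `ensure_cell_cache` and `main_step_with_checkpoint` are illustrative names, and it does not reproduce the change in the linked commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulation state; not pkdgrav3's real data structures. */
typedef struct {
    bool cell_cache_open; /* stands in for the cell cache (CID 1) being open */
    int  tree_builds;     /* counts tree builds, for illustration */
} Sim;

static void build_tree(Sim *s)    { s->tree_builds++; s->cell_cache_open = true; }
static void group_stats(Sim *s)   { s->cell_cache_open = false; } /* closes the cell cache */
static int  gravity(const Sim *s) { return s->cell_cache_open ? 0 : -1; } /* -1: would hit the assert */

/* Guard: if an earlier phase (e.g. group statistics) closed the cell
   cache, rebuild the tree so the cache is open again. */
static void ensure_cell_cache(Sim *s) {
    if (!s->cell_cache_open)
        build_tree(s);
    assert(s->cell_cache_open);
}

/* One main step that writes a checkpoint, following the phase order
   in this issue; `guard` toggles the hypothetical fix. */
static int main_step_with_checkpoint(Sim *s, bool guard) {
    build_tree(s);
    /* FoF group finding would run here */
    if (gravity(s) != 0)
        return -1;
    group_stats(s);
    /* P(k) measurement and checkpoint output would run here */
    if (guard)
        ensure_cell_cache(s);
    return gravity(s); /* the repeated gravity calculation */
}
```

Without the guard, the second gravity call reports the closed cache; with it, the tree is built a second time and gravity proceeds. Rebuilding the tree rather than only reopening the cache mirrors the comment that the tree is normally rebuilt.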
What are the symptoms (e.g., which assert is tripped)? Did you mean that the code crashes when lightcone healpix output is enabled but group finding is not?