thornyflat - production run segmentation fault

Issue #2598 new
Maria created an issue

The Simfactory scripts for ThornyFlat give a segmentation fault in a production run. I am attaching the cfg, ini, run, and sub scripts.

In the simulation directory, the SIMFACTORY directory does not contain the subdirectories exe, cfg, run, and par, and NODES is empty. The only file is properties.ini, which includes:

____

Loading torque version 6.1.3 : dev/torque/6.1.3
Loading openmpi version 4.1.2_gcc112 : parallel/openmpi/4.1.2_gcc112
Loading openblas version 0.3.19_gcc112 : libs/openblas/0.3.19_gcc112
+ set -e
+ cd /scratch/mbh0012/simulations/bnsG2/output-0000-active
+ echo Checking:
+ pwd
+ hostname
+ date
+ cat
+ echo Environment:
+ export GMON_OUT_PREFIX=gmon.out
+ GMON_OUT_PREFIX=gmon.out
+ export CACTUS_NUM_PROCS=2
+ CACTUS_NUM_PROCS=2
+ export CACTUS_NUM_THREADS=20
+ CACTUS_NUM_THREADS=20
+ export OMP_NUM_THREADS=20
+ OMP_NUM_THREADS=20
+ env
+ sort
+ echo Starting:
++ date +%s
+ export CACTUS_STARTTIME=1642991790
+ CACTUS_STARTTIME=1642991790
+ mpiexec -n 2 -npernode 2 /scratch/mbh0012/simulations/bnsG2/SIMFACTORY/exe/cactus_nst -L 3 /scratch/mbh0012/simulations/bnsG2/output-0000/nsnstohmns.par
[trcis001:09698] *** Process received signal ***
[trcis001:09698] Signal: Segmentation fault (11)
[trcis001:09698] Signal code:  (128)
[trcis001:09698] Failing at address: (nil)
[trcis001:09698] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b4fc7e60630]
[trcis001:09698] [ 1] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_get_attribute+0x21)[0x2b4fc6f4b101]
[trcis001:09698] [ 2] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_plm_base_setup_job+0xf0)[0x2b4fc6f83530]
[trcis001:09698] [ 3] /lib64/libevent_core-2.0.so.5(event_base_loop+0x774)[0x2b4fc7a2f3a4]
[trcis001:09698] [ 4] mpiexec[0x40133a]
[trcis001:09698] [ 5] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4fc808f555]
[trcis001:09698] [ 6] mpiexec[0x40114e]
[trcis001:09698] *** End of error message ***
/scratch/mbh0012/simulations/bnsG2/output-0000/SIMFACTORY/RunScript: line 24:  9698 Segmentation fault      (core dumped) mpiexec -n 2 -npernode 2 /scratch/mbh0012/simulations/bnsG2/SIMFACTORY/exe/cactus_nst -L 3 /scratch/mbh0012/simulations/bnsG2/output-0000/nsnstohmns.par
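
One way to read the backtrace above is to list which binary or library each frame belongs to. The Python sketch below does that for the seven frame lines copied from the log. The helper name parse_frames and the regular expression are illustrative only and are not part of Simfactory or Open MPI. On this log, every frame lies in mpiexec or in a library it loaded, and none lies in cactus_nst, which suggests that the crash happens in the launcher during job setup (orte_plm_base_setup_job) rather than inside the Cactus executable.

```python
import re

# Frame lines copied verbatim from the backtrace in the log above.
BACKTRACE = """\
[trcis001:09698] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b4fc7e60630]
[trcis001:09698] [ 1] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_get_attribute+0x21)[0x2b4fc6f4b101]
[trcis001:09698] [ 2] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_plm_base_setup_job+0xf0)[0x2b4fc6f83530]
[trcis001:09698] [ 3] /lib64/libevent_core-2.0.so.5(event_base_loop+0x774)[0x2b4fc7a2f3a4]
[trcis001:09698] [ 4] mpiexec[0x40133a]
[trcis001:09698] [ 5] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4fc808f555]
[trcis001:09698] [ 6] mpiexec[0x40114e]
"""

# Shape of a frame line: "[host:pid] [ n] module(symbol+offset)[address]".
# The "(symbol+offset)" part is optional (it is absent for the mpiexec frames).
FRAME_RE = re.compile(
    r"^\[[^\]]+\]\s+\[\s*(\d+)\]\s+([^\s(\[]+)(?:\(([^)]*)\))?\[0x[0-9a-f]+\]$"
)


def parse_frames(text):
    """Return (frame index, module file name, symbol) for each frame line."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            module = match.group(2).rsplit("/", 1)[-1]  # drop the directory part
            frames.append((int(match.group(1)), module, match.group(3) or ""))
    return frames


frames = parse_frames(BACKTRACE)
modules = sorted({module for _, module, _ in frames})

for index, module, symbol in frames:
    print(index, module, symbol)
print("modules in backtrace:", modules)
```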

Comments (3)

  1. Maria reporter

    With the command:

    ./simfactory/bin/sim create-submit tov --configuration etk --machine=thornyflat --parfile=par/static_tov.par --cores=10

    I get the following warnings:

    Warning: Current Working directory does not match Cactus sourcetree, changing to /users/mbh0012/Cactus

    Warning: Too many threads per process specified: specified num-threads=20 (ppn-used is 40)
    Warning: Total number of threads and number of threads per process are inconsistent: procs=10, num-threads=20 (procs*num-smt must be an integer multiple of num-threads)
    Warning: Total number of threads and number of cores per node are inconsistent: procs=10, ppn-used=40 (procs must be an integer multiple of ppn-used)

  2. Maria reporter

    Again, the simulation gave me a segmentation fault. In essence, the problem seems to be here:

    /scratch/mbh0012/simulations/tov/output-0000/SIMFACTORY/RunScript: line 24: 9372 Segmentation fault (core dumped) mpiexec -n 1 -npernode 2 /scratch/mbh0012/simulations/tov/SIMFACTORY/exe/cactus_etk -L 3 /scratch/mbh0012/simulations/tov/output-0000/static_tov.par

    This can be traced to thornyflat.run, which is attached above. Most likely, there is an error in this line:

    mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ @EXECUTABLE@ -L 3 @PARFILE@

  3. Maria reporter

    The helpdesk thinks this is an internal problem with the Einstein Toolkit, either a bug or bad software design, with the result that proper allocations are not checked. I suspect the problem is in the submit or run files.
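
The arithmetic behind the warnings in comment 1 and the RunScript line in comment 2 can be checked by hand. The Python sketch below is only an illustration: the function names check_layout and expand_run_line are hypothetical, num_smt=1 is an assumption (the warnings do not state its value), and integer division is assumed for the @(...)@ expression because the generated command shows "-npernode 2". It reproduces the two conditions that the warnings quote in parentheses. The condition behind the first warning ("Too many threads per process") is not quoted in the message, so it is not reproduced here.

```python
def check_layout(procs, num_threads, ppn_used, num_smt=1):
    """Return the conditions quoted in the warnings that the layout violates.

    procs, num_threads and ppn_used are the values printed in the warnings;
    num_smt=1 is an assumption, since its value does not appear in the thread.
    """
    problems = []
    if (procs * num_smt) % num_threads != 0:
        problems.append("procs*num-smt must be an integer multiple of num-threads")
    if procs % ppn_used != 0:
        problems.append("procs must be an integer multiple of ppn-used")
    return problems


def expand_run_line(num_procs, ppn_used, num_threads, executable, parfile):
    """Fill in the mpiexec template from thornyflat.run for the given values.

    Assumes @(@PPN_USED@ / @NUM_THREADS@)@ evaluates to an integer.
    """
    npernode = ppn_used // num_threads
    return f"mpiexec -n {num_procs} -npernode {npernode} {executable} -L 3 {parfile}"


# Values from the warnings in comment 1: procs=10, num-threads=20, ppn-used=40.
print(check_layout(procs=10, num_threads=20, ppn_used=40))

# A layout that satisfies both quoted conditions, for comparison.
print(check_layout(procs=40, num_threads=20, ppn_used=40))

# With ppn-used=40 and num-threads=20 the template gives -npernode 2,
# matching the command reported in comment 2 (which ran with -n 1).
print(expand_run_line(
    num_procs=1,
    ppn_used=40,
    num_threads=20,
    executable="/scratch/mbh0012/simulations/tov/SIMFACTORY/exe/cactus_etk",
    parfile="/scratch/mbh0012/simulations/tov/output-0000/static_tov.par",
))
```

With procs=10 both conditions fail, which matches the second and third warnings. A request whose total thread count is a multiple of both num-threads and ppn-used (for example 40 with the values above) would pass these two checks, although that alone does not show whether the segmentation fault in mpiexec would go away.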
