The Simfactory scripts for ThornyFlat give a segmentation fault in a production run. I am attaching the cfg, ini, run, and submit scripts.
In the simulation directory, the SIMFACTORY directory does not contain the subdirectories exe, cfg, run, and par, and NODES is empty. The only file is properties.ini, which includes:
____
Loading torque version 6.1.3 : dev/torque/6.1.3
Loading openmpi version 4.1.2_gcc112 : parallel/openmpi/4.1.2_gcc112
Loading openblas version 0.3.19_gcc112 : libs/openblas/0.3.19_gcc112
+ set -e
+ cd /scratch/mbh0012/simulations/bnsG2/output-0000-active
+ echo Checking:
+ pwd
+ hostname
+ date
+ cat
+ echo Environment:
+ export GMON_OUT_PREFIX=gmon.out
+ GMON_OUT_PREFIX=gmon.out
+ export CACTUS_NUM_PROCS=2
+ CACTUS_NUM_PROCS=2
+ export CACTUS_NUM_THREADS=20
+ CACTUS_NUM_THREADS=20
+ export OMP_NUM_THREADS=20
+ OMP_NUM_THREADS=20
+ env
+ sort
+ echo Starting:
++ date +%s
+ export CACTUS_STARTTIME=1642991790
+ CACTUS_STARTTIME=1642991790
+ mpiexec -n 2 -npernode 2 /scratch/mbh0012/simulations/bnsG2/SIMFACTORY/exe/cactus_nst -L 3 /scratch/mbh0012/simulations/bnsG2/output-0000/nsnstohmns.par
[trcis001:09698] *** Process received signal ***
[trcis001:09698] Signal: Segmentation fault (11)
[trcis001:09698] Signal code: (128)
[trcis001:09698] Failing at address: (nil)
[trcis001:09698] [ 0] /lib64/libpthread.so.0(+0xf630)[0x2b4fc7e60630]
[trcis001:09698] [ 1] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_get_attribute+0x21)[0x2b4fc6f4b101]
[trcis001:09698] [ 2] /shared/software/parallel/openmpi/4.1.2_gcc112/lib/libopen-rte.so.40(orte_plm_base_setup_job+0xf0)[0x2b4fc6f83530]
[trcis001:09698] [ 3] /lib64/libevent_core-2.0.so.5(event_base_loop+0x774)[0x2b4fc7a2f3a4]
[trcis001:09698] [ 4] mpiexec[0x40133a]
[trcis001:09698] [ 5] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b4fc808f555]
[trcis001:09698] [ 6] mpiexec[0x40114e]
[trcis001:09698] *** End of error message ***
/scratch/mbh0012/simulations/bnsG2/output-0000/SIMFACTORY/RunScript: line 24: 9698 Segmentation fault (core dumped) mpiexec -n 2 -npernode 2 /scratch/mbh0012/simulations/bnsG2/SIMFACTORY/exe/cactus_nst -L 3 /scratch/mbh0012/simulations/bnsG2/output-0000/nsnstohmns.par
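Reading the backtrace frame by frame shows where the crash happens. The RunScript message attributes PID 9698 to mpiexec, and every frame lies in mpiexec itself, libopen-rte, libevent, or system libraries, with none in cactus_nst. That suggests the Open MPI launcher failed before the Cactus executable started. Below is a minimal Python sketch that extracts the binary and symbol from each frame of an Open MPI backtrace like the one above. The function name `frame_sources` and its regular expression are illustrative assumptions, not part of Simfactory or Open MPI.

```python
import re

# Matches Open MPI backtrace frames such as:
#   [trcis001:09698] [ 1] /shared/.../libopen-rte.so.40(orte_get_attribute+0x21)[0x2b4fc6f4b101]
#   [trcis001:09698] [ 4] mpiexec[0x40133a]
# Groups: frame number, path of the binary, optional symbol in parentheses.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+(\S+?)(?:\((\S*)\))?\[0x[0-9a-f]+\]")


def frame_sources(log_text):
    """Return (frame number, binary basename, symbol or None) for each frame."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            number, path, symbol = match.groups()
            # Keep only the file name, e.g. "libopen-rte.so.40".
            frames.append((int(number), path.rsplit("/", 1)[-1], symbol or None))
    return frames
```

Applied to frames 0 to 6 above, it yields libpthread.so.0, libopen-rte.so.40 (twice), libevent_core-2.0.so.5, mpiexec, libc.so.6, and mpiexec. No frame comes from cactus_nst.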
Comments (3)
-
reporter -
reporter Again, the simulation gave me a segmentation fault. In essence, the problem seems to be here:
/scratch/mbh0012/simulations/tov/output-0000/SIMFACTORY/RunScript: line 24: 9372 Segmentation fault (core dumped) mpiexec -n 1 -npernode 2 /scratch/mbh0012/simulations/tov/SIMFACTORY/exe/cactus_etk -L 3 /scratch/mbh0012/simulations/tov/output-0000/static_tov.par
This can be traced to thornyflat.run, which is attached above. Most likely, there is an error in this line:
mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ @EXECUTABLE@ -L 3 @PARFILE@
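Substituting the values Simfactory reported makes this template line easier to check. The sketch below is a hypothetical re-implementation, not Simfactory's own code. It assumes that `@(...)@` is evaluated with integer division and takes PPN_USED=40 and NUM_THREADS=20 from the create-submit warnings. With those values the template yields `-npernode 2`, which matches the first run (`-n 2 -npernode 2`). In the tov run, however, it asks for two processes per node while only one process (`-n 1`) was requested. The helper name `expand_run_line` is illustrative.

```python
def expand_run_line(num_procs, ppn_used, num_threads, executable, parfile):
    """Hypothetical expansion of the thornyflat.run template line

        mpiexec -n @NUM_PROCS@ -npernode @(@PPN_USED@ / @NUM_THREADS@)@ @EXECUTABLE@ -L 3 @PARFILE@

    assuming @(...)@ uses integer division. Returns (command, warnings)."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    procs_per_node = ppn_used // num_threads
    if procs_per_node == 0:
        raise ValueError(
            f"num_threads={num_threads} exceeds ppn_used={ppn_used}: -npernode would be 0"
        )
    warnings = []
    # A remainder means some cores on each node would stay idle.
    if ppn_used % num_threads:
        warnings.append(f"ppn_used={ppn_used} is not a multiple of num_threads={num_threads}")
    # Fewer total processes than processes per node is a mismatch worth checking.
    if num_procs < procs_per_node:
        warnings.append(f"-n {num_procs} is smaller than -npernode {procs_per_node}")
    command = f"mpiexec -n {num_procs} -npernode {procs_per_node} {executable} -L 3 {parfile}"
    return command, warnings
```

When checking this line, it may also help to know that the Open MPI 4.x mpirun documentation lists `-npernode` as deprecated in favor of `--map-by ppr:N:node`.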
-
reporter The helpdesk thinks this is an internal problem in the Einstein Toolkit, either a bug or poor software design that results in allocations not being checked properly. I suspect the problem is in the submit or run files.
With the command:
./simfactory/bin/sim create-submit tov --configuration etk --machine=thornyflat --parfile=par/static_tov.par --cores=10
I am getting the warnings:
Warning: Current Working directory does not match Cactus sourcetree, changing to /users/mbh0012/Cactus
…
Warning: Too many threads per process specified: specified num-threads=20 (ppn-used is 40)
Warning: Total number of threads and number of threads per process are inconsistent: procs=10, num-threads=20 (procs*num-smt must be an integer multiple of num-threads)
Warning: Total number of threads and number of cores per node are inconsistent: procs=10, ppn-used=40 (procs must be an integer multiple of ppn-used)
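The last two warnings state their rules directly. The sketch below re-applies those rules to the numbers in the warnings (procs=10, num-threads=20, ppn-used=40). It assumes num-smt=1, since that value is not shown, and it illustrates the quoted rules rather than reproducing Simfactory's actual checking code. It also searches for the smallest total thread count that satisfies both rules. For these settings that count is 40: one full node (40/40) running 40/20 = 2 MPI processes, the same layout as the first run (`-n 2 -npernode 2` with 20 threads each). The function names are hypothetical.

```python
def check_layout(procs, num_threads, ppn_used, num_smt=1):
    """Apply the two consistency rules quoted in the Simfactory warnings.

    procs is the total number of threads (set here via --cores=10).
    Returns a list of violated rules; an empty list means consistent."""
    problems = []
    # "procs*num-smt must be an integer multiple of num-threads"
    if (procs * num_smt) % num_threads != 0:
        problems.append("procs*num-smt is not a multiple of num-threads")
    # "procs must be an integer multiple of ppn-used"
    if procs % ppn_used != 0:
        problems.append("procs is not a multiple of ppn-used")
    return problems


def smallest_consistent_procs(num_threads, ppn_used, num_smt=1, limit=10_000):
    """Smallest procs value passing check_layout, or None if none is <= limit."""
    for procs in range(1, limit + 1):
        if not check_layout(procs, num_threads, ppn_used, num_smt):
            return procs
    return None
```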