Newest revision of SimFactory 2.0 submits three jobs instead of one

Issue #420 resolved
anonymous created an issue

I am submitting a simulation using simfactory with the command:

[snip]
simfactory/bin/sim submit test-whisky-openmp --configuration test-whisky-openmp --parfile=whisky-openmp-test-ali.par --verbose --walltime=48:00:00 --procs=16 --ppn=4 --num-threads=4 --machine=damiana --queue=intel.q
[snip]

sim then prints the following messages:

[snip]
Info: Simfactory command: simfactory/bin/../lib/sim.py "submit" "test-whisky-openmp" "--configuration" "test-whisky-openmp" "--parfile=whisky-openmp-test-ali.par" "--verbose" "--walltime=48:00:00" "--procs=16" "--ppn=4" "--num-threads=4" "--machine=damiana" "--queue=intel.q"
Info: Version 1331M
The Simulation Factory: Manage Cactus simulations

Info: defs: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.ini
Info: defs.local: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.local.ini
Info: Cactus Directory: /home/alibeck/programme/Cactus-Luca/Cactus
Info: simenv.COMMAND: submit
Info: Executing command: submit
Info: Assigned restart_id of: 0002
Info: Found the following restart_ids: [0, 1]
Info: Maximum restart id determined to be: 0001
Assigned restart id: 2
Info: Simulation is inactive: submitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Submit finished, job id is 259460
Info: Restart 2 is active
Info: Assigned restart_id of: 0003
Info: Found the following restart_ids: [0, 1, 2, 2]
Info: Maximum restart id determined to be: 0002
Assigned restart id: 3
Info: Simulation is active: presubmitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Submit finished, job id is 259461
Info: Restart 2 is active
Info: Assigned restart_id of: 0004
Info: Found the following restart_ids: [0, 1, 2, 2, 3]
Info: Maximum restart id determined to be: 0003
Assigned restart id: 4
Info: Simulation is active: presubmitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Submit finished, job id is 259462
[snip]
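The "Job allocation information" lines in this log follow from the three options --procs=16, --ppn=4 and --num-threads=4. A minimal sketch of that arithmetic, assuming nodes = procs / ppn and processes = procs / num-threads; this is a reconstruction that reproduces the logged numbers, not SimFactory's actual code, and the names Allocation and allocate are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    nodes: int
    processes: int
    processes_per_node: int
    threads_per_node: int


def allocate(procs: int, ppn: int, num_threads: int) -> Allocation:
    """Hypothetical reconstruction of the logged allocation (exact division only)."""
    if procs % ppn or procs % num_threads:
        raise ValueError("procs must be divisible by both ppn and num_threads")
    nodes = procs // ppn              # 16 cores / 4 cores per node = 4 nodes
    processes = procs // num_threads  # 16 threads / 4 threads per process = 4 processes
    if processes % nodes:
        raise ValueError("processes do not spread evenly over the nodes")
    return Allocation(
        nodes=nodes,
        processes=processes,
        processes_per_node=processes // nodes,  # 4 / 4 = 1, as in "processes/node=1"
        threads_per_node=procs // nodes,        # 16 / 4 = 4, as in "threads/node=4"
    )


print(allocate(procs=16, ppn=4, num_threads=4))
# Allocation(nodes=4, processes=4, processes_per_node=1, threads_per_node=4)
```

All three submissions in the log report the same allocation, so the extra jobs are not caused by the process/thread layout.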

As a result, three jobs are queued:

[snip]
qstat
job-ID  prior    name        user     state  submit/start at      queue  slots  ja-task-ID
------------------------------------------------------------------------------------------
259460  0.00000  test-whisk  alibeck  qw     04/29/2011 10:26:55         16
259461  0.00000  test-whisk  alibeck  hqw    04/29/2011 10:26:56         16
259462  0.00000  test-whisk  alibeck  hqw    04/29/2011 10:26:56         16
[snip]

What is going wrong here?


Comments (3)

  1. Erik Schnetter

    It seems that this is presubmission. The wall time you requested is longer than the wall time limit, so SimFactory broke your simulation up into three pieces that will execute sequentially. You can see this from the three lines:

    Info: Simulation is inactive: submitting
    Info: Simulation is active: presubmitting
    Info: Simulation is active: presubmitting
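    A rough way to predict how many jobs presubmission will queue, assuming the requested wall time is split into segments no longer than the queue limit, i.e. ceil(requested / limit) jobs. That splitting rule is an assumption, not taken from SimFactory's source, and the limits below are hypothetical, not Damiana's actual queue settings:

```python
def walltime_seconds(walltime: str) -> int:
    """Convert an HH:MM:SS string such as "48:00:00" into seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def num_segments(requested: str, limit: str) -> int:
    """Jobs needed if each job may run at most `limit` (integer ceiling division)."""
    total = walltime_seconds(requested)
    per_job = walltime_seconds(limit)
    return -(-total // per_job)


# Hypothetical queue limits, for illustration only.
for limit in ("24:00:00", "16:00:00", "12:00:00"):
    print(limit, num_segments("48:00:00", limit))
# 24:00:00 2
# 16:00:00 3
# 12:00:00 4
```

    Under this assumption, three jobs for --walltime=48:00:00 would mean a limit of at least 16 hours but less than 24 hours; requesting no more than the limit would give a single job.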

    Notice that your initial submit command did not create a new simulation; instead, it restarted an existing simulation that already had two restarts (restart ids 0 and 1, so the new restart was assigned id 2).
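    The restart ids in the log follow a simple pattern: the new id is one more than the largest id found, and it names the output-NNNN directory. A minimal sketch of that rule, inferred from the log rather than from SimFactory's source; the directory-name matching and the helper names are hypothetical:

```python
import re

# Assumed naming scheme, based on the output-0002 ... output-0004 paths in the log.
OUTPUT_DIR = re.compile(r"^output-(\d{4})$")


def next_restart_id(entries: list[str]) -> int:
    """Return the largest existing restart id plus one (0 if none exist)."""
    ids = [int(m.group(1)) for name in entries if (m := OUTPUT_DIR.match(name))]
    return max(ids, default=-1) + 1


def output_dir_name(restart_id: int) -> str:
    """Format a restart id as a zero-padded output directory name."""
    return f"output-{restart_id:04d}"


existing = ["output-0000", "output-0001"]
rid = next_restart_id(existing)
print(rid, output_dir_name(rid))  # 2 output-0002
```

    Since only the maximum matters, a repeated id such as the [0, 1, 2, 2] in the log does not change the result.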

    I do not know how well-tested presubmission is on Damiana.
