Newest revision of SimFactory 2.0 submits three jobs instead of one
Issue #420
closed
I am submitting a simulation using simfactory with the command:
[snip]
simfactory/bin/sim submit test-whisky-openmp --configuration test-whisky-openmp --parfile=whisky-openmp-test-ali.par --verbose --walltime=48:00:00 --procs=16 --ppn=4 --num-threads=4 --machine=damiana --queue=intel.q
[snip]
sim then prints the following messages:
[snip]
Info: Simfactory command: simfactory/bin/../lib/sim.py "submit" "test-whisky-openmp" "--configuration" "test-whisky-openmp" "--parfile=whisky-openmp-test-ali.par" "--verbose" "--walltime=48:00:00" "--procs=16" "--ppn=4" "--num-threads=4" "--machine=damiana" "--queue=intel.q"
Info: Version 1331M The Simulation Factory: Manage Cactus simulations
Info: defs: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.ini
Info: defs.local: /home/alibeck/programme/Cactus-Luca/Cactus/simfactory/etc/defs.local.ini
Info: Cactus Directory: /home/alibeck/programme/Cactus-Luca/Cactus
Info: simenv.COMMAND: submit
Info: Executing command: submit
Info: Assigned restart_id of: 0002
Info: Found the following restart_ids: [0, 1]
Info: Maximum restart id determined to be: 0001 Assigned restart id: 2
Info: Simulation is inactive: submitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0002/SIMFACTORY/SubmitScript
Submit finished, job id is 259460
Info: Restart 2 is active
Info: Assigned restart_id of: 0003
Info: Found the following restart_ids: [0, 1, 2, 2]
Info: Maximum restart id determined to be: 0002 Assigned restart id: 3
Info: Simulation is active: presubmitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0003/SIMFACTORY/SubmitScript
Submit finished, job id is 259461
Info: Restart 2 is active
Info: Assigned restart_id of: 0004
Info: Found the following restart_ids: [0, 1, 2, 2, 3]
Info: Maximum restart id determined to be: 0003 Assigned restart id: 4
Info: Simulation is active: presubmitting
Info: Job allocation information:
Info: System: nodes=170 cores/node=4 threads/process=4
Info: Requested: nodes=4 cores=16 cores/node=4
Info: Run: processes=4 threads=16 threads/process=4
Info: Distribution: processes/node=1 threads/node=4
Info: Ratio: threads/core=1.000 cores/thread=1.000
Info: writing to internalDir: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY
Info: saving substituted submitscript contents to: /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Executing submit command: qsub /lustre/AEI/alibeck/simulations/test-whisky-openmp/output-0004/SIMFACTORY/SubmitScript
Submit finished, job id is 259462
[snip]
As a result, three jobs are queued:
[snip]
qstat
job-ID  prior    name        user     state  submit/start at      queue  slots  ja-task-ID
259460  0.00000  test-whisk  alibeck  qw     04/29/2011 10:26:55         16
259461  0.00000  test-whisk  alibeck  hqw    04/29/2011 10:26:56         16
259462  0.00000  test-whisk  alibeck  hqw    04/29/2011 10:26:56         16
[snip]
What is going wrong here?
Comments (6)
Is this still a problem?
- changed status to resolved
Job-chaining ("presubmission") works fine on Datura.
- changed status to closed
It seems that this is presubmission. The wall time you requested was longer than the wall time limit, so SimFactory broke up your simulation into three pieces that will execute sequentially. You can see this from the following three lines:
Info: Simulation is inactive: submitting
Info: Simulation is active: presubmitting
Info: Simulation is active: presubmitting
Notice that your initial submit command did not create a new simulation; instead, it restarted an existing simulation that already had two restarts.
I do not know how well-tested presubmission is on Damiana.
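To illustrate the splitting described above, here is a minimal sketch of the arithmetic, not SimFactory's actual code. The function names are hypothetical, and the 16-hour limit is an assumed value chosen only because it reproduces three pieces; the real wall time limit for this queue on Damiana is not stated in this thread.

```python
def parse_walltime(text):
    """Convert an HH:MM:SS wall-time string into seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def count_chunks(requested, limit):
    """Number of sequential jobs needed to cover `requested`
    when a single job may run for at most `limit` (hypothetical helper)."""
    requested_s = parse_walltime(requested)
    limit_s = parse_walltime(limit)
    # Integer ceiling division: round up so that the last,
    # possibly shorter, piece is counted as well.
    return (requested_s + limit_s - 1) // limit_s


# 48:00:00 is the --walltime value from the submit command above;
# 16:00:00 is an ASSUMED per-job limit, not a value taken from Damiana.
print(count_chunks("48:00:00", "16:00:00"))
```

Under that assumption the requested 48 hours needs three jobs, which matches the one "submitting" and two "presubmitting" messages. Any limit of at least 16 hours but below 24 hours would give the same count.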