I submitted a simulation with 'sim submit', but the corresponding qsub failed because of a wrong number of procs/node (philip cluster). I changed the number given on the command line and ran 'sim submit' again, this time successfully. Several things happened which I think could be done better:
- the unsuccessful qsub was not detected during the new submit: it attempted a restart instead of simply cleaning up the unsuccessful submit
- when trying the restart, it went ahead and queued the job, but the job later failed at run time with "cannot rerun a restart that has been finished". This could have been caught earlier, before queueing, which would have avoided the wait time in the queue.
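
A rough sketch of the kind of pre-submit check I have in mind, in Python. The `Restart` record, the `plan_submit` function and the action names are made up for illustration and are not existing code; the only thing taken from the actual run is the error message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Restart:
    """Hypothetical record of the most recent restart of a simulation."""
    job_id: Optional[str]  # None if qsub never returned a job id
    finished: bool         # True once the restart has run to completion


def plan_submit(last: Optional[Restart]) -> str:
    """Decide what a new submit should do, before anything is queued."""
    if last is None:
        # Nothing left over from an earlier submit.
        return "submit"
    if last.job_id is None:
        # The earlier qsub failed: discard its leftovers, then submit.
        return "cleanup-then-submit"
    if last.finished:
        # Report the problem now instead of after waiting in the queue.
        raise RuntimeError("cannot rerun a restart that has been finished")
    # A job id exists and the restart has not finished: queued or running.
    return "already-active"
```

With something like this, the first case above (failed qsub) would give `cleanup-then-submit`, and the second case would fail immediately at submit time.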