simfactory does not abort --testsuite submission process if rsync fails
When setting up testsuite runs, simfactory uses rsync to copy the test suite data into the simulation folder. If this rsync fails (e.g. because a user specified incorrect rsyncopts in defs.local.ini), the submission process does not abort and instead submits an empty test-suite run.
rhaas@kraken-gsi2:~/ET_trunk> sim create-submit 2p6t --procs 12 --num-threads 6 --walltime 4:0:0 --testsuite --allocation TG-ASC120003
Skeleton Created
Job directory: "/lustre/scratch/rhaas/simulations/2p6t"
Option --testsuite given
Executable: "/nics/c/home/rhaas/ET_trunk/exe/cactus_sim"
Option list: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/cfg/OptionList"
Submit script: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/run/SubmitScript"
Run script: "/lustre/scratch/rhaas/simulations/2p6t/SIMFACTORY/run/RunScript"
Assigned restart id: 0
Copying testsuite data
rsync: --times=no: option does not take an argument
rsync error: syntax or usage error (code 1) at main.c(1435) [client=3.0.9]
Executing submit command: /opt/torque/2.5.7/bin/qsub /lustre/scratch/rhaas/simulations/2p6t/output-0000/SIMFACTORY/SubmitScript
Submit finished, job id is 3236567.nid00016
rhaas@kraken-gsi2:~/ET_trunk> qdel 3236567.nid00016
My rsyncopts were:
rsyncopts = --times=no --checksum --include 'configs/*/ThornList' --exclude 'configs/*/*'
which are bad for two reasons:
1.) kraken's rsync does not know --times=no (likely wants --notimes or so)
2.) --exclude 'configs/*/*' excludes cctk_MPI.h which is used by the test suite infrastructure to detect the presence of MPI
Note that some of these options are obviously obsolete now that simfactory defaults to --times=no --checksum anyway.
Still, I think simfactory should always check the exit status of any command it calls.
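A minimal sketch of the behaviour asked for here, assuming Python 3 and the standard subprocess module. The names run_checked and submit_testsuite are hypothetical and are not simfactory functions; the point is only that the submit step is skipped when the copy step returns a non-zero exit status.

```python
import subprocess
import sys


def run_checked(argv, what):
    """Run argv and report whether it exited with status 0.

    Hypothetical helper for illustration; not part of simfactory.
    """
    result = subprocess.run(argv)
    if result.returncode != 0:
        print("Error: %s failed with exit status %d" % (what, result.returncode),
              file=sys.stderr)
        return False
    return True


def submit_testsuite(copy_cmd, submit_cmd):
    """Copy the test data, then submit, but only if the copy succeeded."""
    if not run_checked(copy_cmd, "rsync of test data"):
        # Abort here instead of submitting an empty test-suite run.
        return False
    return run_checked(submit_cmd, "job submission")
```

With a check like this, the failing rsync in the transcript above would stop the process before qsub is ever called.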
Comments (4)
Why are the "rsyncopts" being used for copying test data? I wouldn't have thought they were relevant there. rsyncopts is used for "sim sync", which is a very different type of operation.
reporter: Bumping to major as this prevents me from running the tests on bluewaters. rsync 3.0.9 does not seem to like --times=no and aborts with:
rsync: --times=no: option does not take an argument
Command returned exit status 1
Error: Rsync of test data for simulation failed
Aborting Simfactory.
So actually this ticket is now fixed but I still cannot run the tests.
reporter - changed status to resolved
I just wanted to clarify (as I was confused initially) that Roland is saying that these rsyncopts are bad, not that the simfactory rsync options for tests are bad. SimFactory explicitly includes the required cctki_MPI.h file. The bug is that simfactory uses simlib.ExecuteCommand without checking its return status. For some reason, return codes are being used to indicate errors, even though the language has exceptions for this purpose.
The attached (untested) patch should solve the problem, assuming I didn't make any typos! Does it work?
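Independently of the attached patch (which is not reproduced here), the exception-based alternative mentioned above could look roughly like the following sketch. It assumes Python 3 and the standard subprocess module; execute_command and CommandFailed are hypothetical names and do not reflect the actual signature of simlib.ExecuteCommand.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Raised when an external command exits with a non-zero status."""


def execute_command(argv):
    """Run argv and raise CommandFailed on failure.

    Hypothetical stand-in for illustration; not simlib.ExecuteCommand.
    """
    try:
        subprocess.check_call(argv)
    except subprocess.CalledProcessError as err:
        raise CommandFailed(
            "command %r returned exit status %d" % (argv, err.returncode)
        ) from err
```

A caller that forgets to handle the error then gets a traceback instead of silently continuing on to the submit step.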