Error concerning missing -V option when submitting job on LoneStar

Issue #645 closed
Ian Hinder created an issue

I used the following command to submit an ET testsuite job on LoneStar. The intention is to run on 1 process with 6 threads.

sim --remote lonestar create-submit maxwell_1proc_2 --testsuite --procs 6 --num-threads 6 --walltime 4:00:00 --ppn-used 6

The submission failed with an unexpected error. SimFactory neither reported an error nor returned a nonzero exit code, even though it was unable to determine a job ID. I repeated the submission and the second attempt worked, so the fault appears to be intermittent. The two submit scripts were identical apart from the job name.

The final error message from the log file is:

[LOG:2011-10-22 11:29:05] self.submit(submitScript)::Executing submission command: qsub /scratch/00915/hinder/simulations/maxwell_1proc/output-0000/SIMFACTORY/SubmitScript
[LOG:2011-10-22 11:29:05] self.makeActive()::Simulation maxwell_1proc with restart-id 0 has been made active
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::received raw output: Unable to run job: JSV rejected job.
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::Exiting.
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::-----------------------------------------------------------------
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::-- Welcome to the Lonestar4 Westmere/QDR IB Linux Cluster --
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::-----------------------------------------------------------------
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::--> Checking that you specified -V...
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::--------------------------> Rejecting job <--------------------------
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::-V is now a required option. Please specify it in your submit script.
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::---------------------------------------------------------------------
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::
[LOG:2011-10-22 11:29:06] job_id = self.extractJobId(output)::using submitRegex: Your job (\d+) \(.*?\) has been submitted
[LOG:2011-10-22 11:29:06] self.submit(submitScript)::After searching raw output, it was determined that the job_id is: -1
[LOG:2011-10-22 11:29:06] self.submit(submitScript)::If this is -1, that means the regex did NOT match anything. No job_id means no control.

Full log.txt file is attached. The job was not submitted.
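To illustrate the missing error report, here is a minimal Python sketch of job-ID extraction that fails loudly instead of returning -1. The regex is copied from the log above; the names `extract_job_id` and `SubmissionError` are hypothetical and are not taken from the SimFactory source.

```python
import re

# Regex copied verbatim from the "using submitRegex" line in the log.
SUBMIT_REGEX = r"Your job (\d+) \(.*?\) has been submitted"


class SubmissionError(RuntimeError):
    """Hypothetical exception: the scheduler output contained no job ID."""


def extract_job_id(output, submit_regex=SUBMIT_REGEX):
    """Return the job ID found in the qsub output, or raise SubmissionError."""
    match = re.search(submit_regex, output)
    if match is None:
        # Surface the scheduler's own message rather than silently using -1.
        raise SubmissionError(
            "could not determine job ID; scheduler output was:\n" + output.strip()
        )
    return int(match.group(1))
```

A caller could catch `SubmissionError`, print the message, and exit with a nonzero status, so that a rejected submission is not mistaken for a successful one.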

The odd part is that my submission script does contain -V. The file /scratch/00915/hinder/simulations/maxwell_1proc/output-0000/SIMFACTORY/SubmitScript contains:

#! /bin/bash
#$ -A TG-MCA02N014
#$ -q normal
#$ -r n
#$ -l h_rt=4:00:00
#$ -pe 1way 12
#$ 
#$ -V
#$ -N maxwell_1proc-0
#$ -M ian.hinder@aei.mpg.de
#$ -m abe
#$ -o /scratch/00915/hinder/simulations/maxwell_1proc/output-0000/maxwell_1proc.out
#$ -e /scratch/00915/hinder/simulations/maxwell_1proc/output-0000/maxwell_1proc.err
cd /work/00915/hinder/Cactus/EinsteinToolkit
/work/00915/hinder/Cactus/EinsteinToolkit/simfactory/bin/sim run maxwell_1proc --machine=lonestar --restart-id=0

Any ideas?
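As a way to double-check a generated script before it is passed to qsub, the following Python sketch lists the `#$` directives and reports whether `-V` is present and how many directive lines are empty. The function names are hypothetical. Whether the empty `#$` line in the script above has anything to do with the rejection is not known from this report; the sketch only makes such lines visible.

```python
def sge_directives(script_text):
    """Return the option text of every line that starts with '#$'."""
    directives = []
    for line in script_text.splitlines():
        if line.startswith("#$"):
            # Drop the '#$' prefix and surrounding whitespace.
            directives.append(line[2:].strip())
    return directives


def check_script(script_text):
    """Return (has_v, empty_count) for the '#$' directives in a script."""
    directives = sge_directives(script_text)
    # A directive counts as -V only if its first token is exactly '-V'.
    has_v = any(d.split()[:1] == ["-V"] for d in directives)
    # Count '#$' lines that carry no option at all.
    empty_count = sum(1 for d in directives if not d)
    return has_v, empty_count
```

Applied to the submit script above, the check finds the `-V` directive and one empty `#$` line.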


Comments (5)

  1. Ian Hinder reporter

    I have not used LoneStar since. It was not reproducible when it happened, since it worked the second time.
