FYI: -once flag added to crontab entries for "next" and "just_added" tasks
While investigating issues with the Toolforge grid engine, I noticed that your glamtools tool had a large number of jobs stuck in the "qw" (queue wait) state. Investigation showed that cron was attempting to start a new "next" job every 2 minutes. Because each instance of this job takes more than 2 minutes to complete, the tool's quota for concurrent jobs eventually fills up, and further submissions then sit in "qw". The "qw" backlog in turn grows until the tool's quota for queued jobs is full as well. At that point, enqueuing new jobs starts to fail.
By adding the -once flag to the job, I have in effect made that enqueuing failure happen immediately instead of after the backlog builds up. When no other "next" job is running, a new one will start. As long as one is running, attempts by cron to add more will fail with a notice that the named job is already running.
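As a rough illustration only, the change looks something like the crontab sketch below. The script path is a placeholder and the rest of the jsub arguments are assumed, not copied from the tool's actual crontab; only the -once flag, the 2 minute schedule, and the "next" job name come from this report.

```
# Hypothetical entry: /path/to/next.sh is a placeholder, not the tool's real script.
# Before (shown commented out): a new "next" job was submitted every 2 minutes,
# whether or not an earlier instance had finished.
# */2 * * * * jsub -N next /path/to/next.sh

# After: with -once, jsub declines to submit while a job named "next" is still active.
*/2 * * * * jsub -once -N next /path/to/next.sh
```

The same pattern would apply to the "just_added" entry, with its own job name.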
I hope that this does not harm the operation of this tool. My intent was only to reduce the load on the grid engine and pressure on the scheduler.
I (wikitech:User:BryanDavis) created this report but didn't notice that I was logged out until after submitting.