Quickstatements batch running list keeps getting shorter

Issue #109 new
Arthur Smith created an issue

Last week I noticed there were up to 16 simultaneously running Quickstatements batches. Over the last few days that dropped to 9 - there were usually many waiting in “Running” state, but only 9 that showed at least one “RUN” entry. As of right now, that 9 has dropped to just 8 running batch jobs. It looks like the problem may be stopped or DONE batches that still have an entry labeled “RUN” (but it’s clearly not running - nothing is happening or changing with those batches). The change from 9 to 8 seems to have coincided with one of my batches hitting that state - #12982 - it’s currently stopped, but it still has one entry labeled “RUN”. So I’m guessing something counts the total number of “RUN” entries to decide when to allow another batch job to start, and occasionally those entries get hung up?

Anyway, an interim fix is probably to reset those bogus “RUN” entries somehow. Longer term maybe anything in “RUN” state that’s older than an hour should be ignored/reset automatically?
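To make the guess and the proposed timeout concrete, here is a minimal Python sketch. It is not taken from the Quickstatements code: the `Entry` record, the `last_change` timestamp, the function names, and the choice to put a stale entry back to “INIT” are all assumptions for illustration. The limit of 16 is just the highest number of simultaneous batches observed above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical values and names, for illustration only.
MAX_RUNNING = 16                   # highest simultaneous count observed
STALE_AFTER = timedelta(hours=1)   # proposed cut-off for a stuck "RUN" entry


@dataclass
class Entry:
    batch_id: int
    status: str            # e.g. "INIT", "RUN", "DONE"
    last_change: datetime  # assumed: when the status last changed


def free_slots(entries):
    """Slots left if every "RUN" entry is counted against the limit."""
    run_entries = sum(1 for e in entries if e.status == "RUN")
    return max(0, MAX_RUNNING - run_entries)


def reset_stale_runs(entries, now):
    """Move "RUN" entries unchanged for longer than STALE_AFTER back to "INIT".

    Returns the number of entries that were reset.
    """
    reset = 0
    for e in entries:
        if e.status == "RUN" and now - e.last_change > STALE_AFTER:
            e.status = "INIT"  # assumption: stale work is simply re-queued
            e.last_change = now
            reset += 1
    return reset
```

Under this model, a single “RUN” entry that never changes holds a slot indefinitely, which would match the running list shrinking by one each time a batch gets stuck. Calling `reset_stale_runs` periodically before computing `free_slots` would give those slots back.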

Comments (9)

  1. Arthur Smith reporter

    I restarted #12982 for now, to let it finish the items in “INIT” state. However you’ll notice (if you see this while it’s still running over the next few hours) that now it lists 2 entries as in “RUN” state, rather than just 1.

  2. Arthur Smith reporter

    Batch #12982 is back in “STOP” state, but still shows 1 entry with status “RUN”.

  3. Arthur Smith reporter

    Here’s an image of my current “Last Batches” screen - #12982 hasn’t changed in 5 hours, but it lists a “RUN: 1” item.

  4. Arthur Smith reporter

    The running list dropped to 7 a little earlier today, and I think the batch run backlog is heading towards 24 hours (generally at least 15-20 batches waiting to run). Batch #13180 appears to be the latest culprit - it currently shows the same “RUN: 2” problem (when it’s finished in a few hours it’ll be “DONE” but with a “RUN: 1” entry).

  5. Arthur Smith reporter

    Now the running list is down to just 4 (FOUR!) batches. At least 30 are queued up waiting to run - max running batch ID is 13288 but there’s one queued with ID 13318. I think the latest problem was that somehow one of my jobs acquired 4 extra “RUN”-state entries: #13284

  6. Arthur Smith reporter

    The running list is back up to 7 batches; somehow the extra “RUN” entries on #13284 cleared up.

  7. Arthur Smith reporter

    Now back up to 8 - I’ll lower this from critical, as at this level I think it can roughly keep up with the backlog. But if there are supposed to be up to 16 simultaneous batches, it’s still only running at half capacity.

  8. Arthur Smith reporter

    I think it’s reached beyond critical level this morning - only ONE job is running in the queue at the moment, dozens are listed as “Running” but have not updated in 2 hours or more!

  9. Arthur Smith reporter

    And now nothing is running - at least last I could see. Now I can’t get to the last batches page at all!?
