list-simulations displays job ID when it doesn't make any sense

Issue #508 open
Ian Hinder created an issue

The list-simulations command displays a restart number and job ID even for simulations for which there is no job in the queue. The attached patch omits these in that case.


Comments (7)

  1. Erik Schnetter

    Each active simulation has one restart that is active; this restart id should be output when the simulation is active.

    If a restart is active, the job id should also be output, even if the job is not in the queue any more. For example, several job queuing systems keep job ids around even after a job has finished (still listed with "qstat" but in a "D" state), or create debug/output files containing the job id, or send emails where the job id is in the subject.

    Knowing the job id lets people interact with the queuing system without simfactory; people are used to this, so it's nice to know the job id.

  2. Ian Hinder reporter

    I disagree with what Erik said in comment 2. SimFactory is supposed to create an abstraction over these details. Users who want to break the abstraction can easily go and delve around in the simulation output directory or use qstat to find the job id. Similarly for the restart id.

    As a user, what I want to see from list-simulations is:

    • A list of the simulations on the machine
    • Whether a simulation is active or inactive
    • Whether the simulation is running or not
    • How much walltime it has left (the total walltime of all queued restarts) [this is not currently available; finding it out requires logging in to the machine and running qstat]

    I don't care about the details of individual restarts or job ids. Perhaps those could be output with a --all-details option or something.
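A minimal sketch of the listing proposed above, assuming a hypothetical `SimulationStatus` record and `format_simulation` helper (neither is SimFactory's actual code or data model). The default line shows only name, active state, running state and remaining walltime; the restart id and job id appear only when details are requested and the values exist.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimulationStatus:
    """Hypothetical summary record; not SimFactory's real data model."""
    name: str
    active: bool
    running: bool
    # Total walltime of all queued restarts, if known (assumed to be in hours).
    remaining_walltime_hours: Optional[float] = None
    restart_id: Optional[int] = None
    job_id: Optional[str] = None


def format_simulation(sim: SimulationStatus, all_details: bool = False) -> str:
    """Build one output line for a simulation.

    Restart and job ids are shown only when all_details is set, and only
    if they are actually present, so nothing is printed for a simulation
    that has no job in the queue.
    """
    fields = [
        sim.name,
        "ACTIVE" if sim.active else "INACTIVE",
        "running" if sim.running else "not running",
    ]
    if sim.remaining_walltime_hours is not None:
        fields.append(f"walltime left: {sim.remaining_walltime_hours:.1f}h")
    if all_details:
        if sim.restart_id is not None:
            fields.append(f"restart {sim.restart_id:04d}")
        if sim.job_id is not None:
            fields.append(f"job {sim.job_id}")
    return "  ".join(fields)


if __name__ == "__main__":
    sims = [
        SimulationStatus("bbh", True, True, 12.5, 3, "12345"),
        SimulationStatus("tov", False, False),
    ]
    for sim in sims:
        print(format_simulation(sim))
```

Passing `all_details=True` plays the role of the suggested `--all-details` option; the option name comes from the comment above, while the field layout is only an illustration.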

    Erik: if I haven't convinced you, just close the ticket.

  3. Erik Schnetter

    You have.

    One problem that I often encounter is that queued and held jobs are left behind in the queue. These presumably come from faulty submit scripts or faulty handling of presubmission. This is a real problem: submit scripts will always be a bit dodgy, in particular if machines change, or if someone prepares a submit script for a new machine. Simfactory should tell the user about such jobs; maybe Simfactory should even check whether these jobs "look like" Simfactory jobs, and if so, warn about them.

    Basically, if we don't output the job id any more, then Simfactory needs to be able to do the most important tasks people currently do via qstat.

  4. Ian Hinder reporter

    Good idea! Maybe when simfactory runs qstat, it should do this check (whether there are jobs that look like simfactory jobs) and report as a warning that there are "orphaned" jobs that simfactory is not managing. There could then be a command to "clean up" any orphaned simfactory jobs.
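The orphan check could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `QueueJob` record, the `find_orphaned_jobs` function, and in particular the "looks like a simfactory job" test, which here is just a hypothetical job-name prefix. The real heuristic, and how the queue listing is obtained from qstat, are left open.

```python
from typing import Iterable, List, NamedTuple, Set


class QueueJob(NamedTuple):
    """Hypothetical record for one entry of the queue listing."""
    job_id: str
    name: str
    state: str  # queue state as reported by the queuing system


def find_orphaned_jobs(
    queue_jobs: Iterable[QueueJob],
    managed_job_ids: Set[str],
    name_prefix: str = "sim-",  # assumed naming convention, not a real one
) -> List[QueueJob]:
    """Return jobs that look like simfactory jobs but are not managed.

    A job "looks like" a simfactory job here if its name starts with
    name_prefix; it is "managed" if its id is in managed_job_ids.
    """
    return [
        job
        for job in queue_jobs
        if job.name.startswith(name_prefix) and job.job_id not in managed_job_ids
    ]


if __name__ == "__main__":
    jobs = [
        QueueJob("101", "sim-bbh", "R"),
        QueueJob("102", "sim-tov", "H"),
        QueueJob("103", "other-user-job", "Q"),
    ]
    for job in find_orphaned_jobs(jobs, managed_job_ids={"101"}):
        print(f"Warning: orphaned job {job.job_id} ({job.name}, state {job.state})")
```

A "clean up" command could then act on the returned list, after asking the user for confirmation, since a name-based match can produce false positives.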

  5. Frank Löffler
    • changed status to open

    I assume this now goes beyond the proposed patch - removing the 'review'.
