Allow "local" jobs when submitting to a cluster

Sean Davis created an issue

Sometimes all rules are appropriate for execution in a cluster environment, justifying the overhead and expense of creating jobs, waiting for them in the queue, etc. Often, however, one or more rules are better run locally (soft-linking files, creating simple reports, etc.). It would be convenient to be able to run a mixture of local and cluster jobs and to mark the "local" jobs in the rule itself, perhaps with a boolean defaulting to False. Behavior in a non-cluster environment would be unchanged and unaffected by the "local" setting.

Comments (10)

  1. Johannes Köster

    Hi Sean and Ino. Here come the first fruits of my NIH cluster account. There is now a new global keyword in Snakefiles:

    localrules: all, foo
    rule all:
        input: ...
    rule foo:
    rule bar:

    The keyword lists all rules that shall not be submitted to the cluster. Above, only jobs from the rule bar will be submitted to the cluster. At the moment, this is only used if you don't specify --immediate-submit. It is implemented in the branch cluster-improvements.

    Feel free to test.

  2. inodb

    That's great! Thanks! If you have time left, I could use this with --immediate-submit as well. At the moment I just schedule a really short job, which is not the best solution.

  3. Johannes Köster

    Hi Ino, I have thought about the combination with --immediate-submit. I don't think it is possible without ugly hacks that I want to avoid. At the moment, --immediate-submit means that Snakemake delegates the dependency resolution to you or to the cluster framework. For a local job, though, Snakemake would have to determine the correct point in time to execute it. It can no longer know this, since it has given away control over the dependencies. The bottom line is: with --immediate-submit, your submission script has to decide what is local. However, this is easy now, since I have changed the way job properties are denoted in the jobscript:

    • Properties like threads, rule, input, output, etc. are now encoded as json.
    • The module snakemake.utils now provides a function read_job_properties(jobscript) that returns a dictionary with the entries rule, local, input, output, threads, resources. Hence, your submission wrapper can decide to run a script locally based on the local entry. However, you will of course have to decide for yourself when the right time for this execution is.
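
    A minimal sketch of such a submission wrapper might look like the following. It relies only on read_job_properties and the local entry described above. The submit command (qsub), the choice to run local jobscripts with sh, and the assumption that the jobscript path arrives as the last command-line argument are illustrative guesses, not something Snakemake prescribes. The sketch does not address the timing problem: a local job is run as soon as the wrapper is called.

```python
import subprocess
import sys


def build_command(props, jobscript, submit_cmd=("qsub",)):
    """Return the command used to launch one jobscript.

    `props` is the dictionary returned by read_job_properties. A job whose
    "local" entry is true is run directly with sh; everything else is handed
    to the submit command (qsub here is only an assumed default).
    """
    if props.get("local", False):
        return ["sh", jobscript]
    return [*submit_cmd, jobscript]


def main(argv):
    # Imported here so that build_command can be used without Snakemake.
    from snakemake.utils import read_job_properties

    # Assumption: the jobscript path is the last argument passed to the
    # wrapper; earlier arguments depend on your own --cluster command line.
    jobscript = argv[-1]
    props = read_job_properties(jobscript)
    return subprocess.call(build_command(props, jobscript))


# In a real wrapper script you would finish with:
#     sys.exit(main(sys.argv))
```

    Keeping the decision in a small function of its own (build_command is a hypothetical name) makes it easy to swap in a different scheduler command or to add a waiting strategy for local jobs later.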

  4. inodb

    Hmm, yes I see the problem. Makes sense to leave that to the submission script. Though since everything is scheduled in one go with --immediate-submit, I don't see how you would launch a local job in between other scheduled jobs, unless you wait with scheduling until certain jobs are done (which kinda defeats the purpose of --immediate-submit) or launch some separate process that polls whether a certain scheduled job is done before running the local job (I have my doubts whether this is faster / less computationally intensive than just scheduling a separate short job). The ideal case, I think, would be to have these local jobs run as part of the bigger jobs that come after, unless there is none coming after, in which case the local snakemake process runs it. The job submission script could just ignore the local jobs when scheduling, but I can imagine that a problem occurs with multiple parallel jobs that then all try to create the same output file from the local job that they depend on. Right? If that's the case, it is probably less hassle to just schedule a really short job.

  5. Johannes Köster

    Yes, you precisely got the point. And indeed, the scenario you describe at the bottom would cause problems. Either a lock error or two jobs that try to write the same file, depending on your parameters.
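
    The problematic case can be spotted from the job properties alone. The sketch below assumes that each job is given as a dictionary with the entries rule, local, input and output (as returned by read_job_properties) and that input and output are lists of file paths. The function name and the example jobs are hypothetical.

```python
from collections import defaultdict


def shared_local_outputs(jobs):
    """Map each output of a local job to the cluster rules that need it.

    Only outputs needed by more than one cluster job are returned, since
    those are the files several parallel jobs would try to create.
    """
    local_outputs = {
        path for job in jobs if job["local"] for path in job["output"]
    }
    consumers = defaultdict(list)
    for job in jobs:
        if job["local"]:
            continue
        for path in job["input"]:
            if path in local_outputs:
                consumers[path].append(job["rule"])
    return {path: rules for path, rules in consumers.items() if len(rules) > 1}


# Hypothetical example: two cluster jobs depend on the same local link.
jobs = [
    {"rule": "link", "local": True, "input": ["raw/a.txt"], "output": ["data/a.txt"]},
    {"rule": "count", "local": False, "input": ["data/a.txt"], "output": ["a.count"]},
    {"rule": "plot", "local": False, "input": ["data/a.txt"], "output": ["a.png"]},
]
print(shared_local_outputs(jobs))  # {'data/a.txt': ['count', 'plot']}
```

    Any path reported here would be written by more than one downstream job if the local job were folded into its consumers, which is the lock error or double write described above.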
