Issue #74 resolved

snakemake uses an extremely high amount of memory when it throws an exception

Anonymous created an issue

I have a pipeline that I'm trying to run on ~2000 samples. The pipeline is pretty straightforward and can be reduced to about 5 steps to recreate this issue. The problem is that when one of the steps throws an exception (a missing file or something), snakemake takes an extremely long time to start printing the errors and starts eating up gigabytes of memory. A very small pipeline set to run on 2000 samples will sit without producing any output, working its way up to about 20 GB of memory usage before printing anything useful.

I love using snakemake, but I'm wondering how well it's supposed to scale for a simple pipeline on, say, 2000 individuals?

We've developed a number of pipelines with a handful of individuals, but now, as we try to scale up, we're having more and more trouble getting snakemake to behave. Any help would be appreciated. Thank you!
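For concreteness, a reduced pipeline of the kind described might look like the following Snakefile (sample names, paths, and the shell command are invented for illustration):

```
# Hypothetical minimal Snakefile, scaled to ~2000 samples.
SAMPLES = ["sample_%04d" % i for i in range(2000)]

rule all:
    input:
        expand("results/{sample}.txt", sample=SAMPLES)

rule process:
    input:
        "data/{sample}.txt"
    output:
        "results/{sample}.txt"
    shell:
        "cp {input} {output}"
```

If even one data/{sample}.txt is missing, the error has to be reported against the full 2000-sample DAG, which is where the behaviour described above shows up.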

Comments (11)

  1. Johannes Köster repo owner

    Hi, thanks for reporting. I personally have not yet created such a big workflow. Can you post the output of a dry run?

    Further, since commit f97aab1, Snakemake provides a --profile option (it requires yappi to be installed). Could you profile your workflow with this and post the output? Then it will be easier to find the bottleneck.

    Best, Johannes
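    For reference, the invocations meant here might look like the following; the exact form of the --profile flag in this version is an assumption, so treat this as a sketch:

```shell
# dry run: resolve the full DAG and print the planned jobs without executing them
snakemake -n

# profile the run with yappi and post the resulting output
# (assuming --profile takes an output file in this version of snakemake)
snakemake --profile profile.txt
```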

  2. Johannes Köster repo owner

    I have made several smaller performance improvements in the current master. Can you report whether the problem still persists? Does it appear when running snakemake on a cluster, or locally?

  3. Peter Sudmant

    OK, I'll upgrade and give this a try!

    Thanks so much for your quick response. Sorry I haven't had a chance to dive into this yet; I'll first try to profile, and then try the upgrades.

    Thanks so much!

  4. Peter Sudmant

    This is great - snakemake seems to be using about 25% less memory. That said, spawning a job with 11000 steps needs about 14.8 GB of memory on the head node (i.e., whatever machine is running the snakemake process). I guess this is to be expected? Things certainly do seem a bit snappier, though.

  5. Johannes Köster repo owner

    That's good news. I just merged another branch that should reduce the recursion depth in certain cases. Regarding the memory, I will probably run some profiling, but such a large workflow certainly needs a lot of objects, so it might be that this cannot be reduced much further.

  6. Peter Sudmant

    Is this branch merged into master? I'll give it a try.

    One place where I think the recursion might be getting into trouble is when an exception is thrown somewhere deep down in the tree (on large trees). Then the memory just starts to explode and things start to crawl as the errors bubble back up. Would it be worthwhile to profile this and see if you could exit early in these cases? I'm not sure if that makes sense...
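    The "exit early" idea can be illustrated with a small sketch. This is not Snakemake's actual code; the names and the failing-sample condition are invented. It just contrasts raising on the first failed branch with collecting one error per sample across a large DAG:

```python
# Hypothetical sketch, not Snakemake's implementation: once one branch of
# the DAG fails, "fail early" surfaces that error immediately instead of
# accumulating an exception object per sample.

class MissingInputError(Exception):
    pass

def expand_branch(sample):
    # Stand-in for resolving one sample's sub-tree of jobs.
    if sample == 42:
        raise MissingInputError(f"no input for sample {sample}")
    return f"jobs for sample {sample}"

def build_dag(samples, fail_early=True):
    jobs, errors = [], []
    for s in samples:
        try:
            jobs.append(expand_branch(s))
        except MissingInputError as e:
            if fail_early:
                raise  # stop immediately; no per-sample bookkeeping
            errors.append(e)  # otherwise keep every error object alive
    return jobs, errors
```

    With thousands of samples, the collecting variant keeps an error object (and its traceback context) alive per branch while the rest of the tree is still being expanded, which matches the blow-up described above.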

  7. Johannes Köster repo owner

    I think I fixed the cause of the extreme memory usage in case of an exception. The fix is implemented in the branch dag_fail_early, in commit 1050ab1. It would be great if you could give it a try and report whether it fixes your problem.

    Thanks, Johannes

  8. Johannes Köster repo owner

    Hi, I recently merged that branch into master and it has not caused any problems so far. I am quite sure it should have fixed your problem. Please just reopen the bug if the problem persists with the current master.

    Thanks again for reporting, Johannes
