Allow dynamic wildcards to be constrained

Issue #774 resolved
Mitchell Vollger created an issue

I would like it to be possible to use wildcard constraints with dynamic.

The reason is that without wildcard constraints you cannot prevent dynamic from checking sub-directories for matching strings. This can make the DAG step incredibly slow if you have a sub-directory with tens of thousands of files, because dynamic will try to match them all.

In addition, if dynamic does find a match within those sub-directories, it will delete that file when the dependencies of the dynamic rule are updated. This is the described behavior, but it means you have to make sure that nothing in any sub-directory could possibly match the expression you use with dynamic, which can be difficult.

For example, if you are trying to match dynamic("mydir/{idx}.txt"), you cannot keep any file with the extension .txt in any sub-directory of mydir, or else dynamic will match it as mydir/{any_sub_dir/any_file}.txt.
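
As a rough illustration of why this happens, here is a minimal sketch in plain Python. It assumes an unconstrained wildcard is translated to the regex `.+` (Snakemake's default when no constraint is given); the two patterns and the helper `matched_idx` are hypothetical names made up for this example, not Snakemake internals.

```python
import re

# Assumption: an unconstrained {idx} behaves like ".+", which also matches "/"
# and can therefore span sub-directories.
unconstrained = re.compile(r"mydir/(?P<idx>.+)\.txt")
# A constraint such as "[^/]+" would keep the match within one directory level.
constrained = re.compile(r"mydir/(?P<idx>[^/]+)\.txt")

paths = ["mydir/0.txt", "mydir/1.txt", "mydir/subdir/readme.txt"]


def matched_idx(pattern, candidates):
    """Return the idx value for every path the pattern matches in full."""
    hits = []
    for path in candidates:
        match = pattern.fullmatch(path)
        if match:
            hits.append(match.group("idx"))
    return hits


unconstrained_hits = matched_idx(unconstrained, paths)
constrained_hits = matched_idx(constrained, paths)
print(unconstrained_hits)  # includes "subdir/readme"
print(constrained_hits)    # only the top-level files
```

With the unconstrained pattern, `subdir/readme` is picked up as a value of `idx`, which is exactly the kind of match I would like to be able to rule out.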

Here is a test script that I have been using, that reproduces the behavior I described:

localrules: all, done

rule all:
        input:
                done="done"

rule start:
        input:
                start="start"
        output:
                split = dynamic("lots/{idx}.txt")
        params:
                cluster="",
        run:
                for idx in range(10):
                        shell("touch lots/{}.txt".format(idx))


rule testrule:
        input:
                #x=rules.start.output.split,
                x="lots/{idx}.txt",
        output:
                y="drule/{idx}.txt",
        params:
                cluster="",
        shell:
                """
                touch {output.y}
                """

rule done:
        input:
                dynamic(rules.testrule.output.y),
        output:
                "done"
        shell:
                """
                touch done
                """

If I run this once, it works as expected. However, if I then create a new file in a sub-directory (touch lots/subdir/readme.txt), touch start so that the workflow will re-run (touch start), and run it again, it deletes readme.txt from the sub-directory and then proceeds.

My interpretation of this is that dynamic is matching readme.txt, and then deleting it because its dependency has been updated.

I propose that dynamic be updated so that it can use wildcard constraints, or even just that a glob_pattern flag be added to dynamic so that it uses a glob pattern to match files instead of a regex pattern.
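
To show what glob-style matching would buy here, below is a small sketch using only Python's standard `glob` module on a throwaway directory. The glob_pattern flag itself is only my proposal and does not exist in Snakemake; the directory layout mirrors the test script above.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Recreate the layout from the test script, plus the stray readme.txt.
    os.makedirs(os.path.join(root, "lots", "subdir"))
    for rel in ("lots/0.txt", "lots/1.txt", "lots/subdir/readme.txt"):
        open(os.path.join(root, *rel.split("/")), "w").close()

    # Without recursive=True, "*" never crosses a directory separator,
    # so files inside lots/subdir are not considered at all.
    pattern = os.path.join(root, "lots", "*.txt")
    found = sorted(
        os.path.relpath(path, root).replace(os.sep, "/")
        for path in glob.glob(pattern)
    )

print(found)
```

Only the files directly inside lots/ are returned, so readme.txt in the sub-directory would never be treated as an output of the dynamic rule.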

Thanks!

My snakemake version is current: 4.6.0

Comments (2)

  1. Johannes Köster

    Issues with dynamic are resolved via the introduction of checkpoints, a new, more powerful and more robust concept. See here for documentation. In particular consider the second example, which shows how to implement dynamic-like functionality with checkpoints.

    Dynamic itself will be deprecated in a future release, because it never escaped experimental state and is much less powerful than checkpoints. I hope this does not cause you too much trouble.
