Add complex conditional file dependency

Issue #37 wontfix
Feng Jianxing created an issue

Currently, each rule depends on the existence of all of its input files, which means that the input files have an 'and' relationship. Is it possible to support the following case:

'fileA' and 'fileB' or 'fileC' and 'fileD'

It would be useful if two different types of input files are equivalent, for example '.fastq' and '.fq'.

Furthermore, is it possible to add control flow logic to the rules? For example, by adding another type of rule that supports control flow. I don't know how to combine this nicely with the current philosophy, though.

Comments (14)

  1. Johannes Köster

    Hi, sorry for the delay. I had already composed an answer some days ago, but somehow I seem to have closed my browser before submission. Regarding your requests:

    The first:

    This is already possible by using a function as the input file. The function can decide which input file to return, either from the wildcards object or by looking at which files are present (see the corresponding section in the docs). With the new parser in current master, it is also possible to write something like this and let snakemake figure out that some files are not present:

    for ext in "fastq fq".split():
        rule:
            input:   expand("{{sample}}.{ext}", ext=ext)
            output: ...
            shell:   "somecommand {input} {output}"
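
As a rough illustration of the input-function approach described above, the following Python sketch picks whichever of two equivalent extensions exists on disk. The names `first_existing` and `reads_input` and the `{sample}` wildcard are assumptions made for this example, not part of Snakemake; in a Snakefile the function would be passed as `input: reads_input`.

```python
import os


def first_existing(candidates):
    """Return the first path in `candidates` that exists on disk.

    Falls back to the first candidate if none exists (assumption: the
    workflow engine should then report the missing file itself).
    """
    for path in candidates:
        if os.path.exists(path):
            return path
    return candidates[0]


def reads_input(wildcards):
    """Hypothetical input function treating '.fastq' and '.fq' as equivalent."""
    candidates = [f"{wildcards.sample}.{ext}" for ext in ("fastq", "fq")]
    return first_existing(candidates)
```

Falling back to the first candidate when nothing exists is a design choice: it leaves the missing-file report, or the search for a rule that produces the file, to Snakemake.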
    

    The second:

    To some extent this is possible with the new parser. You can put rules in conditional statements. These are evaluated before the workflow is executed, though, so they cannot be data dependent. For dynamic updates of the DAG during your workflow, you might have a look at the dynamic file support of Snakemake. This was not intended for logic flow, though, so it might be too limited for your purpose. If you have a good example, I will evaluate whether it is possible to support that with Snakemake.

  2. Johannes Köster

    I will close this now. The current functionality should be sufficient for most cases. I should further add that using functions as input files makes it possible to handle alternative file types. Please reopen if you disagree. Having true conditional nodes in the workflow is out of scope for Snakemake. Personally, I would rather implement something like that inside a rule or as a script.

  3. rioualen

    Hello,

    "You can put rules in conditional statements" -> Is it possible to put conditional statements into rules?

    For example, when dealing with single-end or paired-end data, I'm currently using separate rules such as:

    rule bowtie2_pe:
        input:
            forward = "{reads}_R1.fastq", \
            reverse = "{reads}_R2.fastq", \
            index = bowtie2_index
        output:
            sam = "{reads}_bowtie2.sam"
        shell: "bowtie2 -x {input.index} -1 {input.forward} -2 {input.reverse} -S {output.sam} "
    
    rule bowtie2_se:
        input:
            reads = "{reads}.fastq", \
            index = bowtie2_index
        output:
            sam = "{reads}_bowtie2.sam",
        shell: "bowtie2 -x {input.index} -U {input.reads} -S {output.sam}"
    

    Would it be possible to combine those in some way? For example:

    rule bowtie2:
        if (seq_type == "pe"):
            input:
                forward = "{reads}_R1.fastq", \
                reverse = "{reads}_R2.fastq", \
                index = bowtie2_index
        else if (seq_type == "se"):
            input:
                reads = "{reads}.fastq", \
                index = bowtie2_index
        output:
            sam = "{reads}_bowtie2.sam"
        if (seq_type == "pe"):
            shell: "bowtie2 -x {input.index} -1 {input.forward} -2 {input.reverse} -S {output.sam}"
        else if (seq_type == "se"):
            shell: "bowtie2 -x {input.index} -U {input.reads} -S {output.sam}"
    

    I currently have a set of rules where 1 rule = 1 file, and I use other keywords such as params, benchmarks, log... So I have a lot of duplicated code, and I can't think of a good solution to handle that.

    Thank you, and sorry for digging up such an old topic.

  4. Johannes Köster

    Hi, this is already possible via input functions. Basically, you have a function that returns input files based on the wildcard values (see here and here).

  5. rioualen

    Thank you, I'll look into it. But what if the shell command differs slightly as well? For example, if the params change or the algorithm is not the same?

  6. Johannes Köster

    Then you change the shell: into run: and move the if into the run block. From there, you can invoke shell as a function.
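
A minimal sketch of the branching that would live inside the `run:` block, written as plain Python so it can be tried outside a workflow. The helper name `bowtie2_command` and its parameters are hypothetical; inside a rule, the returned string would be handed to Snakemake's `shell()` function.

```python
def bowtie2_command(seq_type, index, reads, sam):
    """Build the bowtie2 command line for paired-end ("pe") or single-end ("se") data.

    `reads` is a list of input paths: two entries for "pe", one for "se".
    """
    if seq_type == "pe":
        forward, reverse = reads
        return f"bowtie2 -x {index} -1 {forward} -2 {reverse} -S {sam}"
    if seq_type == "se":
        (single,) = reads
        return f"bowtie2 -x {index} -U {single} -S {sam}"
    # Fail loudly instead of silently running nothing.
    raise ValueError(f"unknown seq_type: {seq_type!r}")
```

Keeping the command construction in one function means the `run:` block only needs a single call, and the two variants stay side by side.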

  7. rioualen

    I'm working on it, but I can't think of a way to manage wildcards without inferring them from the output files unfortunately. If I have to redefine the wildcards myself it's gonna complicate my code too much I'm afraid.

    Thank you for your time.

    def bowtie2_inputs(wildcards):
        if (seq_type == "pe"):
            return expand("{reads}_{strand}.fastq", strand=["R1", "R2"], reads=?)
        elif (seq_type == "se"):
            return expand("{reads}.fastq", reads=?)
    
    rule bowtie2:
        input:
            reads = bowtie2_inputs, \
            index = bowtie2_index
        output:
            sam = "{reads}_bowtie2.sam"
        run:
            if (seq_type == "pe"):
                os.system("bowtie2 -x {input.index} -1 {input.reads[0]} -2 {input.reads[1]} -S {output.sam}")
            elif (seq_type == "se"):
                os.system("bowtie2 -x {input.index} -U {input.reads} -S {output.sam}")
    

  8. Johannes Köster

    def bowtie2_inputs(wildcards):
        if (seq_type == "pe"):
            return expand("{reads}_{strand}.fastq", strand=["R1", "R2"], reads=wildcards.reads)
        elif (seq_type == "se"):
            return expand("{reads}.fastq", reads=wildcards.reads)
    
    rule bowtie2:
        input:
            bowtie2_inputs,
            index=bowtie2_index
        output:
            sam="{reads}_bowtie2.sam"
        run:
            if seq_type == "pe":
                shell("bowtie2 -x {input.index} -1 {input.forward} -2 {input.reverse} -S {output.sam}")
            elif seq_type == "se":
                shell("bowtie2 -x {input.index} -U {input.reads} -S {output.sam}")
    
  9. Kyle Beauchamp

    I also found this useful and non-obvious. I sometimes wish I had a list of the most "pythonic" ways to do various tricks in snakemake.

  10. Assa Yeroslaviz

    I know this is an old post, but I was wondering if it is still active. I have the same problem, this time with the STAR alignment. I would like to try this solution as well.

    What I'm not sure about is where the {input.forward} and {input.reverse} parameters in the last bowtie2 rule come from.

    Thanks

  11. rioualen

    Hi,

    The code I posted was incomplete; I updated it, so it should work now (check out the input and run statements). Happy coding :)
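
Regarding the question about `{input.forward}` and `{input.reverse}`: such names are only available when the inputs are named in the rule. One possible way to get named inputs from a single function, sketched here under the assumption of a Snakemake version that provides `unpack()`, is to return a dict. The factory `make_bowtie2_inputs` and the keys `r1`, `r2` and `single` are hypothetical names chosen for this example (`r1`/`r2` rather than `forward`/`reverse`, since `reverse` is also a Python list method and might clash with how inputs are exposed).

```python
def make_bowtie2_inputs(seq_type):
    """Return an input function for paired-end ("pe") or single-end ("se") reads.

    The returned function maps the `reads` wildcard to a dict of named inputs.
    """
    if seq_type not in ("pe", "se"):
        raise ValueError(f"unknown seq_type: {seq_type!r}")

    def bowtie2_inputs(wildcards):
        if seq_type == "pe":
            # Two named files, one per mate.
            return {
                "r1": f"{wildcards.reads}_R1.fastq",
                "r2": f"{wildcards.reads}_R2.fastq",
            }
        # Single-end: one named file.
        return {"single": f"{wildcards.reads}.fastq"}

    return bowtie2_inputs
```

In a Snakefile this would presumably be wired up as `input: unpack(make_bowtie2_inputs(seq_type)), index=bowtie2_index`, after which the `run:` block could refer to `{input.r1}` and `{input.r2}` for paired-end data or `{input.single}` for single-end data.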
