Handle irregular file names

Issue #18 resolved
Feng Jianxing created an issue

Snakemake can use wildcards to handle many file names at the same time. What if the file names do not follow a simple pattern? I have many files with irregular names, but each file is processed in a similar way and the output for each file is independent of the others. I don't want to write a rule for each file. Although I could list all files in a single rule, the whole rule would have to be rerun if only one of the input files changes.

The ideal situation is an overall rule that handles all files and a sub-rule that handles each individual file. However, I don't know how to implement this, even with an array. Is there a way around this?

Comments (8)

  1. Johannes Köster

    I would say the best solution is to have a dictionary that translates sample ids to the irregular filenames for you, and to use a function (see the section "functions as input files" in the documentation) to provide the input file like this:

    FILENAME = dict(...)  # map sample id to the irregular filename here

    rule:
      input: lambda wildcards: FILENAME[wildcards.sample]  # use a function as input to delegate to the correct filename
      output: "somefolder/{sample}.csv"
      shell: ...

  2. Feng Jianxing reporter

    Oh, beautiful solution, it works for me.

    I don't quite understand the part 'lambda wildcards: FILENAME[wildcards.sample]'. Isn't 'wildcards' a keyword which denotes that 'sample' should be replaced? Why does 'wildcards' appear as the lambda parameter?

  3. Johannes Köster

    wildcards is an object that contains one attribute for each wildcard, here wildcards.sample. wildcards.sample just gives you the string value of the sample wildcard. Also see the wildcards section in the documentation to see how you can use the wildcards object inside a shell command.
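
To make the mechanics concrete, here is a minimal plain-Python sketch of what happens when the input function is evaluated. It does not use Snakemake itself: `SimpleNamespace` is only a stand-in for the real wildcards object, and the sample ids and file names in `FILENAME` are invented for illustration. Only the lookup pattern comes from the comment above.

```python
from types import SimpleNamespace

# Hypothetical mapping from sample id to irregular file name (illustrative values).
FILENAME = {
    "a": "raw/first run (final).txt",
    "b": "raw/B_2013-rerun.dat",
}

# Same shape as the lambda in the rule: it receives the wildcards object
# and returns the input path for that sample.
input_func = lambda wildcards: FILENAME[wildcards.sample]

# Stand-in for the object built after matching the output pattern
# "somefolder/{sample}.csv" against a requested file "somefolder/b.csv".
wildcards = SimpleNamespace(sample="b")

resolved = input_func(wildcards)
print(resolved)
```

So `wildcards` in the lambda is an ordinary function parameter; the `{sample}` placeholder in the output pattern determines which attribute the object carries.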

  4. Feng Jianxing reporter

    Thanks, I see. It is hard to figure out the ideas behind Snakemake, and therefore hard to extend rules beyond the examples, even after reading your two related papers.

  5. Feng Jianxing reporter

    One more thing: I have used your solution to return a list containing two files. When I update an upstream file and run 'snakemake -n', the 'input:' line of the printed rule shows only the first element of the array.

    Any ideas?

  6. Johannes Köster

    Yes, this was reported by Marcel in issue #16. It is already fixed. Can you fetch the current master and try again?

  7. Johannes Köster

    OK, closing the issue. Thanks, I think it is an important use case, and it is good to have an example in the FAQ now.
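
The two-file variant discussed in this thread, together with the "overall rule" from the original question, might be sketched in plain Python as follows. All names here (`FILENAMES`, `input_files`, `ALL_OUTPUTS`) and the file paths are hypothetical; `SimpleNamespace` again only imitates the wildcards object.

```python
from types import SimpleNamespace

# Hypothetical mapping: each sample id maps to two irregularly named input files.
FILENAMES = {
    "a": ["raw/run1_fwd.txt", "raw/old-name-rev.txt"],
    "b": ["raw/B.forward.dat", "raw/B_reverse_v2.dat"],
}

def input_files(wildcards):
    """Input function: return the list of input files for the requested sample."""
    return FILENAMES[wildcards.sample]

# One target per sample id; an overall rule could list these as its input.
ALL_OUTPUTS = ["somefolder/{}.csv".format(sample) for sample in sorted(FILENAMES)]

# Quick check outside Snakemake, using a stand-in wildcards object.
print(input_files(SimpleNamespace(sample="a")))
print(ALL_OUTPUTS)
```

In a Snakefile, the per-sample rule would use `input: input_files`, and a first rule with `input: ALL_OUTPUTS` would request every sample. Because each output depends only on its own sample's files, changing one input file should trigger a rerun for that sample only.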
