Check timestamps of the files used to create temp files

Issue #1160 new
Former user created an issue

I have a few workflows in which a rule that is quick to run produces a large intermediate file, which is then processed by a slow, resource-heavy step. For example:

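# Default target: listed first so that a bare `snakemake` call builds all samples.
rule all:
    input: ['a.processed.txt', 'b.processed.txt', 'c.processed.txt']

# Quick to run, but writes a large intermediate file.
rule fast_cheap_step:
    input: '{sample}.txt'
    output: temp('{sample}.big_temp_file.txt')

# Slow and resource-heavy; consumes the temporary file.
rule slow_expensive_step:
    input: '{sample}.big_temp_file.txt'
    output: '{sample}.processed.txt'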

Each time I rerun Snakemake, I'd like it to regenerate the intermediate files only as needed, since they are quick to generate and I don't want them taking up lots of space. However, I don't want to run slow_expensive_step again when I don't have to.

In this case, if I run the pipeline to completion, modify 'c.txt', and then run it again, both fast_cheap_step and slow_expensive_step are run for all three samples.

It would be really useful if a rule whose input is a temporary file looked at the timestamps of the files required to generate that temporary file when deciding whether it needs to be rerun.

That is, if c.processed.txt is older than c.txt, it needs to be regenerated; otherwise it does not.
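
To make the requested behaviour concrete, here is a minimal Python sketch of the check. It is not Snakemake's implementation or API; the helper names needs_rerun and stale_samples are hypothetical, and the sketch assumes the file naming used in the rules above ('{sample}.txt' and '{sample}.processed.txt').

```python
import os


def needs_rerun(final_output, source_inputs):
    """Hypothetical helper: decide whether final_output must be rebuilt.

    The missing temp file is skipped entirely; the final output is compared
    directly against the files that the temp file would be generated from.
    """
    if not os.path.exists(final_output):
        return True  # nothing has been produced yet
    out_mtime = os.path.getmtime(final_output)
    # Rerun only if at least one source file is newer than the final output.
    return any(os.path.getmtime(src) > out_mtime for src in source_inputs)


def stale_samples(samples, directory="."):
    """Hypothetical helper: list the samples whose processed file is out of date."""
    return [
        s for s in samples
        if needs_rerun(
            os.path.join(directory, f"{s}.processed.txt"),
            [os.path.join(directory, f"{s}.txt")],
        )
    ]
```

With a check like this, modifying only 'c.txt' after a complete run would mark only sample c as stale, so only c's temp file and processed file would be regenerated.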
