tests

Issue #1 resolved
Ryan Dale created an issue

Hi Johannes -

I really like this idea of a repo of wrappers that can be individually documented and maintained. Before using other people's wrappers in production though, it would be nice to know that they have been tested in some way. Do you have any ideas for testing wrappers on e.g. travis-ci?

I think the biggest challenge is example data. For example, testing a wrapper for an aligner would require a prepared index. Would this be created on-the-fly by a test (downloading the FASTA from e.g. Ensembl), or be downloaded, already prepared, from some central example data location? If the latter, where would it be stored? More generally, what is the minimal set of example data that can be re-used by the maximum number of wrappers?

Once that's solved, tests can be set up to run only on those wrappers that have been updated to avoid exceeding CI resources.
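
For instance, CI could diff the build's commit range to find touched wrapper directories. A rough sketch (TRAVIS_COMMIT_RANGE is the variable Travis exposes for this; the helper itself is hypothetical):

    import os
    import subprocess

    def changed_wrappers(commit_range=None):
        # Travis sets TRAVIS_COMMIT_RANGE for each build; fall back to the
        # last commit when running locally.
        commit_range = commit_range or os.environ.get(
            "TRAVIS_COMMIT_RANGE", "HEAD~1..HEAD")
        out = subprocess.check_output(
            ["git", "diff", "--name-only", commit_range])
        # collapse changed files to their containing wrapper directories
        return sorted({os.path.dirname(p) for p in out.decode().splitlines()
                       if p.startswith("bio/")})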

Anyway, I just wanted to get the conversation started about this to kick around ideas, and of course I would be happy to contribute.

Comments (5)

  1. Johannes Köster

    Yes, we definitely need some kind of testing framework. I think it would be best to start with a tiny snakefile in each wrapper directory, along with really small test data (e.g. the fastq and sam files from the htslib repository). Then, a simple nosetest script like we have in the Snakemake workflow repository can run all these snakefiles automatically.
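
    A minimal sketch of what such a runner could look like, assuming each wrapper directory gains a test/ subdirectory with a Snakefile and tiny data (the layout and names here are assumptions, not an existing convention):

    import os
    import subprocess

    def discover_tests(root="bio"):
        # find every wrapper that ships a test/Snakefile
        for dirpath, dirnames, filenames in os.walk(root):
            if os.path.basename(dirpath) == "test" and "Snakefile" in filenames:
                yield dirpath

    def test_wrappers():
        # nose collects this generator and reports one result per wrapper
        for testdir in discover_tests():
            yield check_wrapper, testdir

    def check_wrapper(testdir):
        # each test snakefile is expected to run to completion on its own
        subprocess.check_call(["snakemake", "-j", "1"], cwd=testdir)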

  2. bow

    Hi everyone,

    I've only been recently trying to dig deeper into snakemake but I just want to say this is also what I am interested in having.

    I'm wondering though: since these are tests for the wrappers and not for the actual tools themselves, is it maybe enough in some cases to just test the resulting shell strings? In other words, given a rule that uses the wrapper in a Snakefile, check which shell string it produces. That would avoid maintaining extra datasets just for testing.

    Of course, testing actual runs of the wrapped tools would be better. However, there may be cases where setting up such a test dataset means too much overhead. Some tools may also require a certain size of input in order to work properly. In those cases, the minimum should then be shell string tests.
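
    For illustration, a shell-string check could run the wrapper script against a fake snakemake object and capture what it passes to shell() instead of executing it. This is only a sketch under assumptions: the wrapper is a Python script that calls snakemake.shell.shell(), and FakeSnakemake and captured are hypothetical helpers:

    from unittest import mock

    class FakeSnakemake:
        # bare-bones stand-in for the object snakemake injects into wrappers
        def __init__(self, input, output, params, threads=1, log=None):
            self.input, self.output = input, output
            self.params, self.threads = params, threads
            self.log = log or []

    captured = []

    def fake_shell(cmd, *args, **kwargs):
        # record the command template instead of running it
        captured.append(cmd)

    with mock.patch("snakemake.shell.shell", new=fake_shell):
        ns = {"snakemake": FakeSnakemake(input=["sample.bam"],
                                         output=["sample.bam.bai"],
                                         params={})}
        exec(open("bio/samtools/index/wrapper.py").read(), ns)

    assert "samtools index" in captured[0]

    Note that this captures the unformatted template (placeholders like {snakemake.input[0]} are not yet expanded), which is often enough to assert that the right program and flags are used; wrappers that touch more of the injected API would need more of it mocked.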

    What do you think?

  3. Ryan Dale reporter

    I agree we definitely don't want to replicate unit tests for each tool, but I think it is important to test the output. Shell string tests will work for a subset of wrappers, but not all wrappers will even use a shell call, and some might use several. Some wrappers may do complex things that need to be tested.

    Tools that require input of a certain size will be a problem . . . maybe we need to rely on external data from SRA for those?

    As sort of a sketch . . . I'm picturing some sort of "driver" Snakefile that sets up minimal data for each test case and then include:s a wrapper's test Snakefile, supplied on the command line via --config wrapper=path/to/wrapper:

    # Include the wrapper's test snakefile. The test snakefile is required to
    # define a dictionary of expected results, something like:
    # {'sample.bam.bai': <md5sum here>}
    import hashlib
    import os

    include: os.path.join(config['wrapper'], 'Snakefile')

    setup = ['R1.fastq.gz', 'R2.fastq.gz', 'sample.bam', 'genome.fa']

    rule all:
        input: setup + list(results.keys())

    # Data downloaded from htslib or maybe the wrappers repo; the actual
    # download steps are omitted from this sketch.
    rule fastqs:
        output: 'R1.fastq.gz', 'R2.fastq.gz'

    rule bams:
        output: 'sample.bam'

    rule fasta:
        output: "genome.fa"

    onsuccess:
        # check output files against the configured md5sums
        for k, v in results.items():
            with open(k, 'rb') as fh:
                assert hashlib.md5(fh.read()).hexdigest() == v


    As long as the data available for testing is documented, wrappers can expect that data to exist and therefore can have minimal boilerplate. A samtools test snakefile in bio/samtools/index could then look like:

    results = {'sample.bam.bai': '<md5sum here>'}
    
    rule samtools_index:
        input: "sample.bam"  # set up by the driver snakefile
        output: "sample.bam.bai"
        wrapper:
            'file://path/to/wrapper'
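
    With a layout like that, running a single wrapper's test might boil down to something like this (paths are illustrative, not an existing entry point):

        snakemake -s driver/Snakefile --config wrapper=bio/samtools/index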
    
  4. Patrik Smeds

    We are evaluating a few different tools for our new pipeline and I have a few questions regarding validation possibilities with snakemake-wrappers.

    Has there been any progress in implementing tests that validate the output with md5sums, or any plans for implementing this?

    I also have a question about the existing tests. Right now, it seems to me that all tests are made against master (for bwa mem, "master/bio/bwa/mem"). Wouldn't that be a problem if I want to use a specific version, for example the latest tagged version, and also test that version rather than master?
    Could the test for bwa mem be changed from

    rule bwa_mem:
        input:
            ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
        output:
            "mapped/{sample}.bam"
        log:
            "logs/bwa_mem/{sample}.log"
        params:
            index="genome",
            extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
            sort="none",             # Can be 'none', 'samtools' or 'picard'.
            sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
            sort_extra=""            # Extra args for samtools/picard.
        threads: 8
        wrapper:
            "master/bio/bwa/mem"
    

    to

    rule bwa_mem:
        input:
            ["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"]
        output:
            "mapped/{sample}.bam"
        log:
            "logs/bwa_mem/{sample}.log"
        params:
            index="genome",
            extra=r"-R '@RG\tID:{sample}\tSM:{sample}'",
            sort="none",             # Can be 'none', 'samtools' or 'picard'.
            sort_order="queryname",  # Can be 'queryname' or 'coordinate'.
            sort_extra=""            # Extra args for samtools/picard.
        threads: 8
        wrapper:
            "file:.."
    
  5. Johannes Köster

    So far, there are no md5sum checks, but they can easily be added by modifying the test.py. For example, the meta.yaml could contain a list of filenames and checksums, which could then be checked in the run function of test.py.
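
    A rough sketch of that idea (the checksums key in meta.yaml and the helper are hypothetical, not an existing part of the repo):

    import hashlib
    import os

    import yaml

    def check_md5sums(wrapper_dir, output_dir):
        # "checksums" is a hypothetical meta.yaml key mapping output
        # filenames to their expected md5 sums
        with open(os.path.join(wrapper_dir, "meta.yaml")) as fh:
            meta = yaml.safe_load(fh)
        for fname, expected in meta.get("checksums", {}).items():
            with open(os.path.join(output_dir, fname), "rb") as out:
                observed = hashlib.md5(out.read()).hexdigest()
            assert observed == expected, "{}: {} != {}".format(
                fname, observed, expected)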

    Regarding master: it is just a placeholder. For the docs, it is replaced by the version to build; for tests, a local prefix is added so that nothing is pulled from git, but rather the currently checked-out version is tested.
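
    For illustration, the test harness can point snakemake's --wrapper-prefix at a local copy whose top-level directory is literally named "master", so that "master/bio/bwa/mem" resolves inside the checkout. A simplified sketch (the copying details are illustrative):

    import os
    import shutil
    import subprocess
    import tempfile

    def run_test(testdir, repo_root="."):
        tmp = tempfile.mkdtemp()
        # the "master" directory name matches the placeholder in wrapper paths
        shutil.copytree(repo_root, os.path.join(tmp, "master"))
        # testdir is expected to contain the test Snakefile
        subprocess.check_call([
            "snakemake", "-j", "1",
            "--wrapper-prefix", "file://{}/".format(tmp),
        ], cwd=testdir)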
