seqsnake

Overview

Repository for organizing sequencing processing workflows using snakemake and conda.

Repo organization

Workflows are kept in the following directory structure:

  • workflows/{name}/{subname}/
      • Snakefile
      • environment.yaml
      • meta.yaml

Config files describing the current samples are kept in the configs directory. Note the test.yaml and test_paired.yaml files, which are for testing workflows.
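To illustrate how a config's sample list drives a workflow, here is a minimal Python sketch of the target expansion that the Snakefile template in this README performs with expand() over config["samples"]. The sample names and the expand_targets helper are hypothetical. The helper only mimics the all-combinations behaviour of snakemake's expand() for illustration, and the real config files may hold more than a list of names.

```python
from itertools import product

# Hypothetical config, standing in for what --configfile would load.
config = {"samples": ["sampleA", "sampleB"]}


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Same pattern and wildcards as `rule all` in the Snakefile template.
targets = expand_targets(
    "trimmed/{sample}_{mate}.fastq.gz",
    sample=config["samples"],
    mate=[1, 2],
)

for path in targets:
    print(path)
```

With these two placeholder samples the sketch prints four paths, one per sample and mate.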


Testing Snakefile

source activate {env_name}

  • The environment lives in ~/miniconda3/envs/{env_name}.
  • If using a workflow from another developer, first create the environment with conda env create -f {/path/to/environment.yaml}, which creates it in ~/miniconda3/envs/{env_name}.

snakemake -s {Snakefile} --configfile {config}


Sending a Job via Slurm

#!/bin/bash

#SBATCH -p general
#SBATCH -N 1
#SBATCH -n 1
#SBATCH --mail-user=robert.amezquita@yale.edu
#SBATCH --mail-type=ALL
#SBATCH -D /ycga-ba/data2/kaech/AdvSeq
#SBATCH -e /home/ra364/slurm/master-%j.err
#SBATCH -o /home/ra364/slurm/master-%j.out
#SBATCH -J atac-pe

cd /ycga-ba/data2/kaech/AdvSeq

snakemake \
-s /home/ra364/repos/seqsnake/dags/atac-pe-process.Snakefile \
--configfile /home/ra364/repos/seqsnake/configs/ATAC-seq_PAIRED.yaml \
--cluster "sbatch -D /ycga-ba/data2/kaech/AdvSeq -o /home/ra364/slurm/worker-%j.out -e /home/ra364/slurm/worker-%j.err -p general -N 1 -c 8 --mem=8000" \
--cores 8 --resources mem=8 \
--jobname "{rulename}.{jobid}" 

Save the script above as script.sbatch, then submit the job with: sbatch script.sbatch
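The worker options in --cluster repeat values that have to agree with each other, so it can help to derive them in one place. The following Python sketch is only an illustration: worker_sbatch_command is a hypothetical helper, not part of snakemake or this repo. It converts a per-CPU memory figure into a single per-node --mem value (in MB), since Slurm documents --mem and --mem-per-cpu as mutually exclusive.

```python
def worker_sbatch_command(workdir, log_dir, partition="general",
                          cpus=8, mem_per_cpu_mb=1000):
    """Build an sbatch string for snakemake --cluster with one memory flag."""
    # Total memory per node, in MB, derived from the per-CPU figure.
    total_mem_mb = cpus * mem_per_cpu_mb
    parts = [
        "sbatch",
        f"-D {workdir}",
        f"-o {log_dir}/worker-%j.out",  # %j is filled in by Slurm (job id)
        f"-e {log_dir}/worker-%j.err",
        f"-p {partition}",
        "-N 1",
        f"-c {cpus}",
        f"--mem={total_mem_mb}",
    ]
    return " ".join(parts)


# Paths taken from the job script in this README.
cluster_cmd = worker_sbatch_command(
    workdir="/ycga-ba/data2/kaech/AdvSeq",
    log_dir="/home/ra364/slurm",
)
print(cluster_cmd)
```

The resulting string can be passed as the quoted argument to --cluster.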


Templates

environment.yaml

conda create -n {environment_name} {package1} {package2} {...}
source activate {environment_name}
conda env export > {path/to/workflow}/environment.yaml
source deactivate

meta.yaml

name: cutadapt-pe
date: 2017-07-30
authors:
  - Robert A. Amezquita
description:
  Trim paired-end reads with cutadapt
input:
  - fastq1 file
  - fastq2 file
output:
  - trimmed fastq1 and fastq2 files
  - text file containing trimming stats
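
A quick way to catch an incomplete meta.yaml is to check it against the fields used in the template. The sketch below assumes the file has already been parsed into a Python dict (for example with a YAML library). Treating every template field as required is an assumption of this sketch, and missing_meta_fields is a hypothetical helper. The draft entry is made up for illustration.

```python
# Field names taken from the meta.yaml template; requiring all of them
# is an assumption of this sketch.
EXPECTED_FIELDS = ("name", "date", "authors", "description", "input", "output")


def missing_meta_fields(meta):
    """Return the expected fields that are absent or empty in `meta`."""
    return [field for field in EXPECTED_FIELDS if not meta.get(field)]


# Hypothetical draft that forgot to list its outputs.
draft = {
    "name": "example-workflow",
    "date": "2017-07-30",
    "authors": ["A. Developer"],
    "description": "Placeholder description",
    "input": ["fastq1 file", "fastq2 file"],
}

print(missing_meta_fields(draft))
```

An empty list means every field from the template is present and non-empty.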

Snakefile

rule all:
    input:
        expand("trimmed/{sample}_{mate}.fastq.gz", sample=config["samples"], mate=[1,2])

rule cutadapt:
    input:
        ["fastq/{sample}_1.fastq.gz", "fastq/{sample}_2.fastq.gz"]
    output:
        fastq1="trimmed/{sample}_1.fastq.gz",
        fastq2="trimmed/{sample}_2.fastq.gz",
        qc="trimmed/{sample}.qc.txt"
    params:
        "-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC ",
        "-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT ",
        "-q 20"
    log:
        "logs/cutadapt/{sample}.log"
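    shell:
        # cutadapt prints its trimming report to stdout, saved as the qc file;
        # stderr goes to the log file declared above.
        "cutadapt"
        " {params}"
        " -o {output.fastq1}"
        " -p {output.fastq2}"
        " {input}"
        " > {output.qc}"
        " 2> {log}"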