ATLAS-Pipeline / Home

###### !! OUR WIKI HAS MOVED !! ######

Starting with commit 9ef31db, please refer to https://atlaswiki.netlify.app/pipeline.html

For older commits, this wiki page stays online.

Welcome to the ATLAS-Pipeline,

a pipeline for ancient and modern low-depth DNA analysis using the tool ATLAS!

The ATLAS-Pipeline easily converts all of your FASTQ files into BAM files in parallel, performs local realignment, recalibrates your base quality scores, and corrects your data for post-mortem damage (PMD), combining standard tools into one pipeline while keeping high flexibility. Furthermore, you can directly call variants, create VCF/GLF files, or estimate the heterozygosity of your samples.

FASTQ files --> BAM files --> VCF/GLF/θ

Requirements

  • ATLAS-Pipeline runs on Linux-based machines. Support for SLURM clusters is included.
  • You need the program ATLAS installed on your local machine.
  • For running Rhea (local InDel realignment), you need a valid GATK license on your machine.

To ensure reproducibility, the ATLAS-Pipeline works best in a conda environment.
A suggested environment setup is provided within the repository (environment_5.yaml). If you prefer to work with locally installed programs, here is a list of the packages and versions used throughout this pipeline:

    - bcftools=1.9
    - bwa=0.7.17
    - fastqc=0.11.8
    - gatk=3.8
    - graphviz=2.40.1
    - picard=2.21.1
    - python=3.6
    - pyyaml=5.1.2
    - rpy2=2.9.4
    - samtools=1.9
    - snakemake=5.4.4
    - trim-galore=0.6.4

How to run the ATLAS-Pipeline

You can clone the repository to the location on your computer where the analysis should be executed by typing

git clone git@bitbucket.org:wegmannlab/atlas-pipeline.git

General command for execution:

bash Atlas-Pipeline.sh -f [configfile.yaml] [options]
Be aware that if you are not running the pipeline on a SLURM cluster system, all log output will be printed to your stdout. To redirect it to a logfile, append &> logs/[logname].txt to your command.
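As a sketch of such a redirected run (myrun is a placeholder log name, and echo stands in for the actual pipeline call):

```shell
# Create the log directory if it does not exist yet
mkdir -p logs

# Stand-in for: bash Atlas-Pipeline.sh -f configfile.yaml
# &> sends both stdout and stderr to the logfile
echo "pipeline output" &> logs/myrun.txt

cat logs/myrun.txt
```

Replace the echo line with your real Atlas-Pipeline.sh invocation; everything the pipeline prints then ends up in logs/myrun.txt instead of your terminal.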

Hint: if you have changed something in your config file and want to rerun one part of the pipeline, delete the outdated output files and the corresponding summaries. For example, to rerun GLF with different parameters, run rm Results/4.Pallas/GLF/* Results/4.Pallas/summaries/*
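A sketch of that cleanup, using the GLF example from the hint (the touched filenames are stand-ins for real pipeline outputs):

```shell
# Recreate the relevant result directories with some stand-in output files,
# as they would look after a GLF run
mkdir -p Results/4.Pallas/GLF Results/4.Pallas/summaries
touch Results/4.Pallas/GLF/sample1.glf.gz          # stand-in GLF output
touch Results/4.Pallas/summaries/glf_summary.txt   # stand-in summary

# Delete the outdated GLF output and its summaries so that the next
# pipeline run regenerates them with the new parameters
rm -f Results/4.Pallas/GLF/* Results/4.Pallas/summaries/*
```

After the cleanup, simply rerun the pipeline with your updated config file; only the deleted outputs are recomputed.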

You can find all available options with bash Atlas-Pipeline.sh -h

Config-File

To run the ATLAS-Pipeline, you need to provide a config file in which you specify all major settings, input files, and thresholds needed for your project. You can find example config files in 'example_files/example.config.*' and on the wiki pages of each module.
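The exact keys depend on the module and are documented in the example files; purely to illustrate the YAML shape, a config could look like the sketch below — every key and value here is a hypothetical placeholder, not the pipeline's actual schema:

```yaml
# Hypothetical sketch -- key names are placeholders; consult
# example_files/example.config.* for the real schema of each module
samples: samples.txt     # list of input fastq files per sample
reference: genome.fa     # reference genome used for alignment
threads: 8               # how many jobs to run in parallel
```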

Overview

The complete ATLAS-Pipeline workflow is split into four major parts. Find out more by following the links to:

  • Gaia -- Genome Wide Alignment Including Adapter-trimming
    from your sequencing results in fastq format to aligned bamfiles

  • Rhea -- Local InDel-Realignment
    locally realign alongside known InDels and a dataset from your population of interest

  • Perses -- Post-Mortem Damage and Error Rate Estimation for Sequence Data
    using ATLAS to merge paired-end reads, split single-end reads, and produce PMD and recal files for further analysis

  • Pallas -- Population ALLele-frequency AnalysiS
    produce vcf- and glf-files, estimate heterozygosity and (for mammals) the sex of your individuals.

Disclaimer

The ATLAS-Pipeline is under active development, and although we have a test suite, we cannot guarantee that our code is bug-free.

Questions?

Please contact ilektra.schulz@unifr.ch
