ATLAS-Pipeline / Gaia
Genome Wide Alignment Including Adapter-trimming
This part of the workflow handles the raw data analysis from unaligned FASTQ files to aligned BAM files.
It includes initial quality checks, adapter trimming, alignment, marking of duplicates and some preliminary filtering.
Before running the pipeline:
- Create a config file. An example can be found at example_files/example_config_Gaia.yaml
- Create a samples file
Configfile
Provide an individual config file in YAML format for each project. This file can be shared with other researchers so that they can perform the exact same analysis independently.
This is a template for a Gaia config file:
    runScript: Gaia

    # 1. samples file
    sample_file: samples_Gaia.tsv

    # 2. programs, references, etc.
    atlas: /path/to/your/atlas/executable/atlas/atlas
    ref: /path/to/your/reference/file/reference.fa

    # 3. how was your bamfile sequenced? -- uncomment only ONE option for your analysis
    #sequence: single
    sequence: paired

    # 4. Thresholds
    mappingqual: 30

    # 5. additional inputs
    CN: Test #sequencing location for header-information

    # 6. does the raw-data contain adapters? Select T/F. If TRUE, adapter-trimming will be performed. If FALSE the fastq-files will be aligned without trimming.
    Adapter: T

    # 7. if adapters should be removed, TrimGalore will run with default parameters, including the removal of standard Illumina adapters.
    # here you can specify different adapter sequences and/or parameters:
    AdapterSequence1: default
    AdapterSequence2: default
    lengthFilter: 30
    qualityFilter: 0

    # 8. Java memory - specify the memory allocated for picard-tools MarkDuplicates
    Xmx: -Xmx120G

    # 9. how many threads to use when multi-thread is possible/advised?
    threads: 10
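Before starting a run it can help to sanity-check the config. The following is a minimal sketch, not part of the pipeline: it assumes the config has already been parsed into a Python dict (for example with PyYAML's `yaml.safe_load`). The function name `check_gaia_config`, the list of required keys and the individual checks are assumptions derived from the template above, not rules enforced by Gaia itself.

```python
# Hypothetical sanity check for a Gaia config that was already parsed into a dict.
# Which keys the pipeline really treats as mandatory is an assumption.
REQUIRED_KEYS = ("runScript", "sample_file", "atlas", "ref", "sequence",
                 "mappingqual", "Adapter", "Xmx", "threads")

# The values from the template, written as the dict a YAML loader would return.
EXAMPLE_CONFIG = {
    "runScript": "Gaia",
    "sample_file": "samples_Gaia.tsv",
    "atlas": "/path/to/your/atlas/executable/atlas/atlas",
    "ref": "/path/to/your/reference/file/reference.fa",
    "sequence": "paired",
    "mappingqual": 30,
    "CN": "Test",
    "Adapter": "T",
    "AdapterSequence1": "default",
    "AdapterSequence2": "default",
    "lengthFilter": 30,
    "qualityFilter": 0,
    "Xmx": "-Xmx120G",
    "threads": 10,
}


def check_gaia_config(cfg):
    """Return a list of problems found in cfg; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in cfg]

    # Only one sequencing mode may be active (see option 3 in the template).
    if "sequence" in cfg and cfg["sequence"] not in ("single", "paired"):
        problems.append("sequence must be 'single' or 'paired'")

    # The template asks for T or F (option 6).
    if "Adapter" in cfg and str(cfg["Adapter"]) not in ("T", "F"):
        problems.append("Adapter must be T or F")

    # Numeric settings should be non-negative integers (assumed).
    for key in ("mappingqual", "threads", "lengthFilter", "qualityFilter"):
        value = cfg.get(key)
        if key in cfg and not (isinstance(value, int)
                               and not isinstance(value, bool)
                               and value >= 0):
            problems.append(f"{key} must be a non-negative integer")

    # The template passes the Java memory setting as a full JVM flag.
    if "Xmx" in cfg and not str(cfg["Xmx"]).startswith("-Xmx"):
        problems.append("Xmx should look like -Xmx120G")

    return problems


if __name__ == "__main__":
    for problem in check_gaia_config(EXAMPLE_CONFIG):
        print(problem)
```

Returning a list of problems instead of raising on the first one makes it possible to report everything that needs fixing in a single pass.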
Samples file
The samples file should contain a tab-separated table with the following columns:
- Sample - The prefix you want your sample to have in the final output.
- Lib - Duplicates are marked among all files of the same sample that share the same Lib identifier.
- File - The prefix of each of your input files. There is no restriction on characters or signs. The suffix must follow the Illumina standard. For paired-end data, enter only one line per file pair and specify the sequencing mode in your config file. The R1 and R2 files must have the same prefix.
- Path - Either the complete or the relative path to each sample. No specific folder structure is needed.
Example: for the following input files
/path/to/sample1/file1_R1_001.fastq.gz
/path/to/sample1/file2_R1_001.fastq.gz
../relative/path/sample2/file1_R1_001.fastq.gz
../relative/path/sample2/file2_R1_001.fastq.gz
/additional/path/sample2/file3_R1_001.fastq.gz
the samples file contains:
Sample | Lib | File | Path |
---|---|---|---|
Sample1 | LibA | file1 | /path/to/sample1/ |
Sample1 | LibB | file2 | /path/to/sample1/ |
Sample2 | LibA | file1 | ../relative/path/sample2/ |
Sample2 | LibA | file2 | ../relative/path/sample2/ |
Sample2 | LibB | file3 | /additional/path/sample2/ |
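The relationship between the table and the input files can be sketched in a few lines of Python. This is an illustration only, not the pipeline's own code. It assumes that the samples file starts with a `Sample`/`Lib`/`File`/`Path` header row and that R2 files use the suffix `_R2_001.fastq.gz`, by analogy with the R1 names in the example. The helper names `read_samples`, `fastq_paths` and `group_by_library` are hypothetical.

```python
import csv
import io
import posixpath
from collections import defaultdict

# Assumed Illumina-style suffixes, following the example file names above.
R1_SUFFIX = "_R1_001.fastq.gz"
R2_SUFFIX = "_R2_001.fastq.gz"

# The example table as tab-separated text (a header row is assumed).
SAMPLES_TSV = (
    "Sample\tLib\tFile\tPath\n"
    "Sample1\tLibA\tfile1\t/path/to/sample1/\n"
    "Sample1\tLibB\tfile2\t/path/to/sample1/\n"
    "Sample2\tLibA\tfile1\t../relative/path/sample2/\n"
    "Sample2\tLibA\tfile2\t../relative/path/sample2/\n"
    "Sample2\tLibB\tfile3\t/additional/path/sample2/\n"
)


def read_samples(handle):
    """Parse a tab-separated samples table into a list of row dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def fastq_paths(row, paired=True):
    """Build the expected FASTQ path(s) for one row: Path + File + suffix."""
    prefix = posixpath.join(row["Path"], row["File"])
    paths = [prefix + R1_SUFFIX]
    if paired:
        # One table row stands for both mates, which share the same prefix.
        paths.append(prefix + R2_SUFFIX)
    return paths


def group_by_library(rows):
    """Group file prefixes by (Sample, Lib), the unit within which duplicates are marked."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["Sample"], row["Lib"])].append(row["File"])
    return dict(groups)


if __name__ == "__main__":
    rows = read_samples(io.StringIO(SAMPLES_TSV))
    for row in rows:
        print(row["Sample"], row["Lib"], fastq_paths(row))
    print(group_by_library(rows))
```

In this example, file1 and file2 of Sample2 share LibA and would therefore be treated as one unit for duplicate marking, while file3 (LibB) is handled separately.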
Results:
The final aligned and filtered BAM files can be found in Results/1.FASTQ/10.MkDup_per_sample/
Also have a look at your FastQC results in Results/1.FASTQ/02.fastqc/. You can open the *.html files with any web browser. Check for adapter contamination or any other potential quality problems. For details, refer to the FastQC manual.