Wiki
Clone wikisequencing / Applications_QC
Here we will describe 3 ways to obtain a full report on the initial quality of raw reads (fastq files) using QCFasta function included with Anduril. QCFasta not only creates a web report but also performs quality trimming of reads and discarding of low quality samples.
For all three cases we will assume that you have a CSV file with a column named Key (which has the ids of your samples) and a column named File (with the paths to the fastq file). If you have paired end reads you will need one csv file for the reads and one for the mates. Example fastq files can be found in the testcases of QCFasta (https://bitbucket.org/anduril-dev/sequencing/src/anduril2/functions/QCFasta/testcases/). The reads should be names "sample1_1.fq" etc. and the mates "sample1_2.fq" etc. or the suffices changed below.
QCFasta can use both compressed or uncompressed fastq files and you can choose if you want the output fastq files to be compressed as well.
So the most simple way to trim your files and create the report is the following:
#!/usr/bin/env anduril //$OPT --wrapper slurm-prefix //$OPT --pipe "tee $ANDURIL_EXECUTION_DIR/_log" //$OPT --pipe "anduril-pager --ls -t --title $ANDURIL_EXECUTION_DIR" //$OPT --threads 4 //$POST anduril-result-browser --unused "$ANDURIL_EXECUTION_DIR" import anduril.builtin._ import anduril.microarray._ import anduril.sequencing._ import anduril.tools._ import org.anduril.runtime._ object qcfasta { val reads = INPUT(path="files/reads.csv") val mates = INPUT(path="files/mates.csv") val allReads = CSV2Array(in = reads.out, keys="column") val allMates = CSV2Array(in = mates.out, keys="column") val qc = QCFasta( reads = allReads.out, mates= allMates.out, tool = "trimmomatic", headcrop = 10, slidingWindow = "5:20", adapterSeq = "TruSeq2-PE.fa", gzip = true, minQuality = 20, trailing = 20, percent = 70, threads = 6, mateKey = "_2", readKey = "_1") }
Once you have run this, you will get a report like (some file). If you have no idea how to set the parameters above and you will like to take a look at the quality before any trimming is done you can do the following:
object qcfasta { val reads = INPUT(path="files/reads.csv") val mates = INPUT(path="files/mates.csv") val allReads = CSV2Array(in = reads.out, keys="column") val allMates = CSV2Array(in = mates.out, keys="column") val folders = NamedMap[BinaryFile]("folders") for ((key,value) <- iterArray(allRead.out)) { withName(key){ val pre = FastQC(read = allReads.out(key), mate = allMates.out(key), clean = false, suffix = "preStats") folders[key] = pre.folder } } //Code will be inserted here later }
val qc = QCFasta( reads = allReads.out, mates = allMates.out, fastQCfolders = folders, tool = "trimmomatic", headcrop = 10, slidingWindow = "5:20", adapterSeq = "TruSeq2-PE.fa", gzip = true, minQuality = 20, trailing = 20, percent = 70, threads = 6, mateKey = "_2", readKey = "_1") }
Updated