ATLAS / Engine Parameters

General parameters

  • bam : input BAM file
  • fasta : input FASTA reference file. This needs to be the reference used to create the BAM file.
  • out : prefix for output files. Default = BAM prefix
  • logFile : write status report to a file, the name of which is specified via this argument. May be used in conjunction with verbose and suppressWarnings.
  • silent : do not print status report on screen.
  • suppressWarnings : do not print warning messages.
  • fixedSeed : set the seed of the random generator.
  • addToSeed : add a value to the seed. By default, the random generator derives its seed from the time of day, so several jobs launched at the same time on a computer cluster may end up with the same seed. With addToSeed you can add a job-specific value, such as the job ID, to the time-based seed.
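
As a rough illustration of how these parameters might be assembled for several cluster jobs, the sketch below builds one argument list per job. It assumes that valued parameters are written as name=value and that switches such as silent are passed as bare words, following the examples on this page. The helper names format_args and job_args and the sample_job prefix are hypothetical and not part of ATLAS.

```python
def format_args(params):
    """Turn a dict of parameters into command-line style tokens.

    Assumption: True means a bare switch (e.g. silent), False/None means
    "leave out", and everything else is written as name=value.
    """
    tokens = []
    for name, value in params.items():
        if value is True:
            tokens.append(name)
        elif value is False or value is None:
            continue
        elif isinstance(value, (list, tuple)):
            tokens.append(f"{name}=" + ",".join(str(v) for v in value))
        else:
            tokens.append(f"{name}={value}")
    return tokens


def job_args(bam, fasta, job_id):
    """Arguments for one cluster job (hypothetical naming scheme).

    addToSeed receives the job ID so that jobs started at the same time
    do not end up with the same time-based seed.
    """
    return format_args({
        "bam": bam,
        "fasta": fasta,
        "out": f"sample_job{job_id}",
        "logFile": f"sample_job{job_id}.log",
        "silent": True,
        "addToSeed": job_id,
    })
```

For example, job_args("s.bam", "ref.fa", 7) yields a token list ending in silent and addToSeed=7.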

Input Filters

| Default Behavior | Switch off |
|---|---|
| keep reads from all read groups | readGroup=readGroupToKeep1,readGroupToKeep2 |
| ignore improper pairs | keepImproperPairs |
| ignore unmapped reads | keepUnmappedReads |
| ignore failed Quality Control (QC) | keepFailedQC |
| ignore secondary alignments | keepSecondary |
| ignore supplementary alignments | keepSupplementary |
| ignore duplicates | keepDuplicates |
| keep alignments with soft clipped bases | filterSoftClips |
| keep forward and reverse alignments | keepOnlyFwd / keepOnlyRev |
| keep first and second mates | keepOnlyFirst / keepOnlySecond |
| do not filter based on fragment length | set minFragmentLength and maxFragmentLength |
| do not filter based on mapping quality (MQ) | set minMQ and maxMQ |
| ignore reads > insert size | keepReadsLongerThanFragment |

The parameter keepAllReads will set the filters such that all reads are kept.
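
Several of the default filters correspond to bits of the SAM flag field. The sketch below shows one way such flag-based defaults and their switches could be expressed. The bit values come from the SAM specification, but the mapping of switches to bits and the function passes_filters are assumptions made for illustration, not ATLAS's actual implementation. Read-group, strand, mate, fragment-length and mapping-quality filters are left out.

```python
# SAM flag bits as defined in the SAM specification.
PAIRED, PROPER_PAIR, UNMAPPED = 0x1, 0x2, 0x4
SECONDARY, QC_FAIL, DUPLICATE, SUPPLEMENTARY = 0x100, 0x200, 0x400, 0x800

# Switch name -> test for the read category that is ignored by default
# (assumed mapping, for illustration only).
DEFAULT_FILTERS = {
    "keepImproperPairs": lambda f: bool(f & PAIRED) and not f & PROPER_PAIR,
    "keepUnmappedReads": lambda f: bool(f & UNMAPPED),
    "keepFailedQC": lambda f: bool(f & QC_FAIL),
    "keepSecondary": lambda f: bool(f & SECONDARY),
    "keepSupplementary": lambda f: bool(f & SUPPLEMENTARY),
    "keepDuplicates": lambda f: bool(f & DUPLICATE),
}


def passes_filters(flag, switches=()):
    """Return True if a read with this SAM flag survives the flag filters.

    `switches` holds the names of the switches that were set.
    """
    if "keepAllReads" in switches:
        return True
    for switch, is_filtered in DEFAULT_FILTERS.items():
        if switch not in switches and is_filtered(flag):
            return False
    return True
```

With no switches set, a duplicate read (flag 0x400) is rejected. Passing {"keepDuplicates"} lets it through.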

Output filters

Quality scores

  • minOutQual: minimum quality score printed. Any base that has a smaller quality will be set to 'N'. Default = 1
  • maxOutQual: maximum quality score printed. Any base that has a larger quality will be set to 'N'. Default = 93
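
The effect of the two thresholds can be sketched as follows. This is a minimal illustration that assumes both bounds are inclusive and that qualities are given as integers. The function mask_by_quality is hypothetical and not part of ATLAS.

```python
def mask_by_quality(bases, quals, min_out_qual=1, max_out_qual=93):
    """Replace bases whose quality lies outside [min, max] with 'N'."""
    if len(bases) != len(quals):
        raise ValueError("bases and quals must have the same length")
    return "".join(
        base if min_out_qual <= qual <= max_out_qual else "N"
        for base, qual in zip(bases, quals)
    )
```

With the defaults, a base of quality 0 or 94 is printed as 'N', while bases of quality 1 to 93 are printed unchanged.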

Parameters available for tasks that parse reads and organize them by genomic windows

Many functionalities require the sequencing data to be organized into non-overlapping windows, which are made up of sites. Each site knows which bases cover it. The following parameters can be set for these functionalities:

  • chr : vector of chromosomes to be read. Example: 1,2,3
  • limitChr : read all chromosomes until the specified chromosome (this parameter is ignored if chr is used)
  • limitWindows : limit the reading of the BAM file to the first N windows on each chromosome
  • skipWindows: skip first N windows on each chromosome. Default = 1000000000
  • window : either (1) the window size in base pairs, used to go through the whole genome (default window size = 1'000'000), or (2) a BED file containing the coordinates of custom windows to be taken into account.
  • regions : BED file (0-based coordinates) listing the positions/regions to be considered (the inverse of masking)
  • mask : input BED file listing the sites that should be masked. The file may be compressed, in which case the filename should contain ".gz", or uncompressed.
  • maxMissing : specify the max percentage of sites with no sequencing depth in a window for the window to still be considered. Default = 1.0
  • maxRefN : specify the max percentage of sites with ref='N' in a window for the window to still be considered. Default = 1.0
  • minDepth : sites with a lower coverage than minDepth will not be considered. Default = 0
  • maxDepth : sites with a higher coverage than maxDepth will not be considered. Default = 1000000
  • minQual : called bases with a quality that is lower will not be taken into account. Default = 1
  • maxQual : called bases with a quality that is higher will not be taken into account. Default = 93
  • trim5 : bases within this distance of the 5' end of the read will be ignored. Default = 0.
  • trim3 : bases within this distance of the 3' end of the read will be ignored. Default = 0.
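
To make the window logic concrete, the sketch below splits a chromosome into non-overlapping windows and applies a maxMissing-style check. It rests on several assumptions: windows are 0-based and half-open, limit_windows counts from the start of the chromosome, and the missing-site threshold is treated as a fraction between 0 and 1. The default values of skip_windows and limit_windows are illustrative only, and both function names are hypothetical.

```python
def windows(chr_length, window=1_000_000, skip_windows=0, limit_windows=None):
    """Yield (start, end) for non-overlapping windows along a chromosome.

    Coordinates are 0-based and half-open; the last window may be shorter.
    """
    for index, start in enumerate(range(0, chr_length, window)):
        if index < skip_windows:
            continue  # skip the first N windows
        if limit_windows is not None and index >= limit_windows:
            break  # only read the first N windows
        yield start, min(start + window, chr_length)


def window_passes(depths, max_missing=1.0):
    """Keep a window if the fraction of sites with depth 0 is small enough."""
    missing = sum(1 for depth in depths if depth == 0) / len(depths)
    return missing <= max_missing
```

For a chromosome of 2'500'000 bases and the default window size, this yields two full windows and one window of 500'000 bases.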

Specify post-mortem damage (PMD) parameters

ATLAS supports the following specifications of PMD patterns:

  • none: no PMD at all, specified as none.
  • Empiric: This is simply a list of PMD rates as a function of position in the read and is specified as Empiric[0.2,0.3,...]. Positions beyond the length of the supplied vector are assumed to have the same PMD rate as the last entry.
  • Skoglund: This implements the exponential function proposed by Skoglund et al. (2014), specified as Skoglund[lambda,c] and corresponding to
\begin{equation*} P(pmd|pos) = \lambda (1-\lambda)^{pos} + c \end{equation*}
  • Exponential: This implements a generalized exponential decay function, specified as Exponential[a,b,c] and corresponding to
\begin{equation*} P(pmd|pos) = \mu + (1-\mu) \left( a e^{-b \cdot pos} + c \right) \end{equation*}
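
The patterns above can be written down directly as functions of the position in the read. The sketch below assumes that pos is 0-based. The symbol mu is not defined in the text above, so it is treated here as an extra argument that defaults to 0, which is purely an assumption. The function names are hypothetical.

```python
import math


def empiric(rates, pos):
    """Empiric[...]: look up the rate; positions past the end reuse the last entry."""
    return rates[min(pos, len(rates) - 1)]


def skoglund(lam, c, pos):
    """Skoglund[lambda,c]: lambda * (1 - lambda)^pos + c."""
    return lam * (1 - lam) ** pos + c


def exponential(a, b, c, pos, mu=0.0):
    """Exponential[a,b,c]: mu + (1 - mu) * (a * exp(-b * pos) + c).

    mu is not part of the Exponential[a,b,c] syntax; 0 is an assumed default.
    """
    return mu + (1 - mu) * (a * math.exp(-b * pos) + c)
```

For instance, empiric([0.3, 0.2, 0.1, 0.05], 10) returns 0.05, the last supplied rate.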

These PMD patterns can be passed to ATLAS in three different ways:

  • pmd : Using this argument implies a single PMD definition for both C→T and G→A transitions from their respective ends (decay functions are the same from both 3'- and 5'-ends). Example: pmd=Empiric[0.3,0.2,0.1,0.05]

  • pmdCT and pmdGA: specify the PMD patterns independently for C→T and G→A transitions. Example: pmdCT=Exponential[0.1,0.1,0.05].

  • pmdFile : specify the post-mortem damage with an input file. This allows PMD patterns to be specified individually for different read groups. The file must contain three columns: the name of the read group, followed by the C→T and G→A patterns. Example:

    ReadGroup1    Exponential[0.177221,0.37078,0.0999026]    none
    ReadGroup2    Empiric[0.4,0.2,0.1,0.05]    Exponential[0.196117,0.357616,0.101829]
    

Please see estimatePMD for more information on how to estimate PMD patterns with ATLAS.
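
For checking a PMD file before running ATLAS, a small parser along the following lines may help. It is a sketch that assumes the columns are separated by whitespace and that each pattern follows the none / Name[v1,v2,...] syntax shown above. The functions parse_pmd and parse_pmd_file are hypothetical helpers, not part of ATLAS.

```python
import re

# Name[comma-separated numbers], for the pattern names described above.
_SPEC = re.compile(r"^(Empiric|Skoglund|Exponential)\[([^\]]*)\]$")


def parse_pmd(spec):
    """Parse one pattern string into (name, list_of_floats)."""
    spec = spec.strip()
    if spec == "none":
        return ("none", [])
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized PMD pattern: {spec!r}")
    name, body = match.groups()
    return (name, [float(value) for value in body.split(",")])


def parse_pmd_file(lines):
    """Map read group -> (C->T pattern, G->A pattern) from three-column lines."""
    patterns = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"expected 3 columns, got {len(fields)}: {line!r}")
        read_group, ct, ga = fields
        patterns[read_group] = (parse_pmd(ct), parse_pmd(ga))
    return patterns
```

Applied to the two example lines above, this returns one entry per read group, each holding the parsed C→T and G→A patterns.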

Specify recalibration parameters

  • BQSRQuality : specify readgroup x quality recalibration table
  • BQSRPosition : specify readgroup x position recalibration table
  • BQSRPositionReverse : specify readgroup x reverse position recalibration table
  • BQSRContext : specify readgroup x context recalibration table
  • recal : specify recalibration parameters based on X-recalibration either with a filename or with [qQuality,qQualitySquared,...,qContext-T]. If you specify the parameters directly you can repeat the same value by using curly brackets {} to specify how many times it should be repeated. Example: recal=[1,0{24}]
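
The curly-bracket shorthand can be illustrated with a short expansion routine. This sketch assumes that value{n} stands for the value repeated n times and that all entries are numeric. The function expand_recal is a hypothetical helper and not part of ATLAS.

```python
import re

# One list entry: a value, optionally followed by {count}.
_ITEM = re.compile(r"^([^{}]+?)(?:\{(\d+)\})?$")


def expand_recal(spec):
    """Expand a string such as "[1,0{24}]" into a flat list of floats."""
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError(f"expected a list in square brackets: {spec!r}")
    values = []
    for item in spec[1:-1].split(","):
        match = _ITEM.match(item.strip())
        if match is None:
            raise ValueError(f"cannot parse entry: {item!r}")
        value, count = match.groups()
        values.extend([float(value)] * (int(count) if count else 1))
    return values
```

Under these assumptions, recal=[1,0{24}] expands to 25 values: a single 1 followed by 24 zeros.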
