This task creates the input file for PSMC (pairwise sequentially Markovian coalescent). PSMC takes an input file in FASTA format in which each letter represents a 100 bp window of the genome: T = homozygous, K = heterozygous, N = unknown. A window with zero heterozygous sites is a "T", and a window with at least one heterozygous site is a "K". For ATLAS to produce such a file, two further things need to be defined: a prior on theta, which is the heterozygosity you expect to see a priori, and a confidence threshold. For each window, ATLAS calculates the posterior probability of it being a "K" or a "T". If neither of these probabilities is higher than the confidence threshold, the window is defined as an "N".
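The calling rule described above can be illustrated with a minimal Python sketch. This is not ATLAS code: the function names `call_window` and `psmc_sequence` are hypothetical, and the per-window posterior probability of heterozygosity is assumed to be already available (ATLAS derives it internally from the data and the theta prior, which is not reproduced here).

```python
def call_window(p_het: float, confidence: float = 0.99) -> str:
    """Return 'K', 'T' or 'N' for a single 100 bp window.

    p_het is the (assumed, precomputed) posterior probability that the
    window contains at least one heterozygous site.
    """
    if not 0.0 <= p_het <= 1.0:
        raise ValueError("p_het must be a probability in [0, 1]")
    p_hom = 1.0 - p_het  # posterior probability of zero heterozygous sites
    if p_het > confidence:
        return "K"  # confidently heterozygous
    if p_hom > confidence:
        return "T"  # confidently homozygous
    return "N"      # neither posterior exceeds the threshold


def psmc_sequence(posteriors, confidence: float = 0.99) -> str:
    """Concatenate the per-window calls into one sequence string."""
    return "".join(call_window(p, confidence) for p in posteriors)


# Four hypothetical windows: clearly homozygous, clearly heterozygous,
# ambiguous, clearly homozygous.
print(psmc_sequence([0.001, 0.999, 0.5, 0.0]))  # prints TKNT
```

With a stricter threshold more windows fall back to "N"; with a looser one, fewer windows are masked but each "T" or "K" call is less certain.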
- PSMC input file
./atlas task=PSMC bam=example.bam fasta=example.fasta pmdFile=example_pmd_input.txt recal=example_recalibrationEM.txt verbose
- theta : Prior for heterozygosity. Default = 0.001
- confidence : Confidence threshold that a posterior probability must exceed for a window to be called a "T" or a "K". Default = 0.99
Engine parameters that are common to all tasks can be found here.