ATLAS / Population Genetic Tools: alleleFreq

Overview

This task estimates population allele frequencies from a multi-sample VCF file with exactly two alleles per site, such as the file created by the ATLAS task major/minor. It offers several algorithms to estimate the genotype frequencies and the alternative allele frequency in a group of samples.

Input

  • VCF file: e.g. created by the ATLAS task major/minor
  • txt file (optional): e.g. samplesPopulations.txt

This user-created text file lists the samples to be used and the population each sample belongs to, one sample per line. Allele counts are estimated separately for each population.

Example:

sample1 1
sample2 1
sample5 2
sample8 2
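A minimal Python sketch of how such a file could be read, assuming one whitespace-separated sample name and population label per line as in the example above. The function name `read_sample_populations` is hypothetical and not part of ATLAS:

```python
from collections import defaultdict


def read_sample_populations(lines):
    """Map each population label to its list of sample names.

    Assumes one 'sampleName populationLabel' pair per line, separated by
    whitespace; blank lines are skipped.
    """
    populations = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 'sample population', got: {line!r}")
        sample, population = fields
        populations[population].append(sample)
    return dict(populations)


example = """sample1 1
sample2 1
sample5 2
sample8 2
"""
print(read_sample_populations(example.splitlines()))
# {'1': ['sample1', 'sample2'], '2': ['sample5', 'sample8']}
```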

Output

A zipped text file with the following columns:

  • chromosome
  • position
  • ref: reference allele / major allele
  • alt: alternative allele / minor allele
  • numDiploid: number of diploid calls
  • numHaploid: number of haploid calls
  • freqAltHW: frequency of alternative allele, estimated with EM algorithm assuming Hardy-Weinberg equilibrium
  • freqGenoRefRef: frequency of diploid homozygous reference genotypes, estimated with EM algorithm
  • freqGenoRefAlt: frequency of diploid heterozygous genotypes, estimated with EM algorithm
  • freqGenoAltAlt: frequency of diploid homozygous alternative genotypes, estimated with EM algorithm
  • freqGenoRef: frequency of haploid reference genotypes, estimated with EM algorithm
  • freqGenoAlt: frequency of haploid alternative genotypes, estimated with EM algorithm
  • freqAltGF: frequency of alternative allele, calculated based on genotype frequencies
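The EM-based columns can be illustrated with a short Python sketch. It assumes each diploid sample contributes linear-scale genotype likelihoods (L_RefRef, L_RefAlt, L_AltAlt) that are strictly positive or at least consistent with the data, for example converted from Phred-scaled PL values, and it ignores haploid calls for brevity. The function names and starting values are hypothetical, and `eps_f` plays the role of epsF. This is a textbook EM, not necessarily the exact ATLAS implementation; the genotype-based allele frequency is presumably computed as f(RefAlt)/2 + f(AltAlt):

```python
def em_allele_freq_hw(likelihoods, eps_f=1e-4, max_iter=1000):
    """EM estimate of the alternative allele frequency assuming Hardy-Weinberg.

    likelihoods: one (L_RefRef, L_RefAlt, L_AltAlt) tuple per diploid sample,
    on a linear scale.
    """
    f = 0.5  # arbitrary starting value
    n = len(likelihoods)
    for _ in range(max_iter):
        priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)  # HWE genotype frequencies
        expected_alt = 0.0
        for lik in likelihoods:
            # E-step: posterior genotype probabilities for this sample
            post = [lk * p for lk, p in zip(lik, priors)]
            total = sum(post)
            expected_alt += (post[1] + 2 * post[2]) / total  # expected alt copies
        f_new = expected_alt / (2 * n)  # M-step: alt alleles / total alleles
        if abs(f_new - f) < eps_f:
            return f_new
        f = f_new
    return f


def em_genotype_freqs(likelihoods, eps_f=1e-4, max_iter=1000):
    """EM estimate of diploid genotype frequencies without assuming HWE."""
    freqs = [1 / 3, 1 / 3, 1 / 3]  # arbitrary starting values
    n = len(likelihoods)
    for _ in range(max_iter):
        sums = [0.0, 0.0, 0.0]
        for lik in likelihoods:
            # E-step: posterior genotype probabilities under current frequencies
            post = [lk * p for lk, p in zip(lik, freqs)]
            total = sum(post)
            for g in range(3):
                sums[g] += post[g] / total
        new = [s / n for s in sums]  # M-step: mean posterior per genotype
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < eps_f
        freqs = new
        if converged:
            break
    return freqs


def alt_freq_from_genotypes(freqs):
    """Alt allele frequency from genotype frequencies: f(RefAlt) / 2 + f(AltAlt)."""
    return freqs[1] / 2 + freqs[2]


lik = [(1, 0, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1)]
print(em_allele_freq_hw(lik))          # 0.5
geno = em_genotype_freqs(lik)
print(geno)                            # [0.25, 0.5, 0.25]
print(alt_freq_from_genotypes(geno))   # 0.5
```

With certain genotypes (likelihood 1 for a single genotype), both estimates reduce to counting: one RefRef, two RefAlt and one AltAlt sample give an alternative allele frequency of 0.5.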

If doBayesian is activated, the following columns are added:

  • freqAltHWBayes: frequency of alternative allele, estimated with MCMC (prior on allele frequencies is a beta distribution)
  • freqAltHWBayes_CI0.05: lower bound (5% posterior quantile) of the 90% credible interval of the Bayesian alternative allele frequency
  • freqAltHWBayes_CI0.95: upper bound (95% posterior quantile) of the 90% credible interval of the Bayesian alternative allele frequency
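A Bayesian estimate of this kind can be sketched as a random-walk Metropolis-Hastings sampler over the alternative allele frequency f, with a Beta(alpha, beta) prior and a Hardy-Weinberg likelihood computed from diploid genotype likelihoods. The function names, the starting value, the proposal and the step-size tuning between burn-in phases are assumptions made for illustration, not the documented ATLAS sampler. Defaults mirror mcmcLength, numBurnins, burninLength, alpha and beta; the sketch returns the posterior mean and the 5% and 95% posterior quantiles, analogous to the three columns above:

```python
import math
import random
import statistics


def log_posterior(f, likelihoods, alpha, beta):
    """Log of the Beta(alpha, beta) prior times the HWE likelihood, up to a constant."""
    if not 0.0 < f < 1.0:
        return -math.inf  # outside the support of the prior
    log_post = (alpha - 1) * math.log(f) + (beta - 1) * math.log(1 - f)
    for l_rr, l_ra, l_aa in likelihoods:
        # P(data_i | f) = sum over genotypes of L_i(g) * HWE frequency of g
        mix = l_rr * (1 - f) ** 2 + l_ra * 2 * f * (1 - f) + l_aa * f ** 2
        if mix <= 0.0:
            return -math.inf
        log_post += math.log(mix)
    return log_post


def mcmc_allele_freq(likelihoods, mcmc_length=100000, num_burnins=3,
                     burnin_length=1000, alpha=0.5, beta=0.5, seed=1):
    """Random-walk Metropolis-Hastings sampler for the alt allele frequency.

    Returns (posterior mean, 5% quantile, 95% quantile) of the sampled chain.
    """
    rng = random.Random(seed)
    f = 0.5      # starting value (arbitrary)
    step = 0.1   # standard deviation of the normal proposal
    current = log_posterior(f, likelihoods, alpha, beta)

    def run(n_iter, keep):
        nonlocal f, current
        accepted = 0
        trace = []
        for _ in range(n_iter):
            proposal = f + rng.gauss(0.0, step)
            candidate = log_posterior(proposal, likelihoods, alpha, beta)
            # Symmetric proposal: accept with probability min(1, exp(candidate - current))
            if candidate - current >= math.log(1.0 - rng.random()):
                f, current = proposal, candidate
                accepted += 1
            if keep:
                trace.append(f)
        return accepted / n_iter, trace

    # Burn-in phases: samples are discarded, step size is tuned between phases
    for _phase in range(num_burnins):
        rate, _ = run(burnin_length, keep=False)
        if rate < 0.2:
            step *= 0.5
        elif rate > 0.5:
            step *= 1.5

    _, trace = run(mcmc_length, keep=True)
    cuts = statistics.quantiles(trace, n=20)  # cut points at 5%, 10%, ..., 95%
    return statistics.fmean(trace), cuts[0], cuts[-1]


# Exact genotypes: five RefRef, three RefAlt, two AltAlt samples
genotypes = [(1, 0, 0)] * 5 + [(0, 1, 0)] * 3 + [(0, 0, 1)] * 2
print(mcmc_allele_freq(genotypes, mcmc_length=20000))
```

For these exact genotypes the posterior under the default Beta(0.5, 0.5) prior is Beta(7.5, 13.5), so the sampled mean should be close to 7.5/21 ≈ 0.36. Tuning the step only during burn-in keeps the retained chain a valid Metropolis-Hastings chain.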

Usage example

./atlas task=alleleFreq vcf=ATLAS_majorMinor.vcf.gz samples=samplesPopulations.txt

Specific Arguments

  • samples: file specifying the samples to be used and their population affiliation
  • writeGenoFreq: also estimate and write genotype frequencies
  • limitLines: number of lines to read from the VCF file
  • minDepth: only store sites with at least this depth
  • minSamplesWithData: only store sites with data for at least this many samples. Default = 1
  • minMAF: only store sites where the initial estimate of the allele frequency is greater than or equal to minMAF. Default = 0.0
  • minVariantQuality: only store sites with at least this variant quality
  • reportFreq: number of lines after which the reading progress is printed to the terminal. Default = 10000
  • epsF: convergence threshold (epsilon) of the EM algorithm used to estimate allele frequencies. Default = 0.0001
  • doBayesian: additionally perform a Bayesian estimation of the allele frequencies with an MCMC
  • mcmcLength: number of MCMC iterations. Default = 100000
  • numBurnins: number of burn-in phases in the MCMC of the Bayesian estimation. Default = 3
  • burninLength: number of iterations per burn-in phase. Default = 1000
  • alpha and beta: parameters of the beta distribution used as a prior on the allele frequencies. Default = 0.5 for both
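How the site filters combine can be sketched in Python. The `SiteSummary` fields and the `keep_site` function are hypothetical; depth is assumed to be the total depth at the site, and the pass-through defaults for minDepth and minVariantQuality are assumptions because their defaults are not documented here. minSamplesWithData and minMAF use the documented defaults, and all thresholds are treated as inclusive:

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Hypothetical per-site summary; field names are illustrative only."""
    depth: int               # assumed: total read depth at the site
    samples_with_data: int   # number of samples with data at the site
    initial_alt_freq: float  # initial estimate of the alt (minor) allele frequency
    variant_quality: float   # variant quality of the site


def keep_site(site, min_depth=0, min_samples_with_data=1, min_maf=0.0,
              min_variant_quality=0.0):
    """Return True if the site passes every filter; all thresholds are inclusive."""
    return (site.depth >= min_depth
            and site.samples_with_data >= min_samples_with_data
            and site.initial_alt_freq >= min_maf
            and site.variant_quality >= min_variant_quality)


site = SiteSummary(depth=12, samples_with_data=4, initial_alt_freq=0.15,
                   variant_quality=40.0)
print(keep_site(site, min_depth=10, min_maf=0.1))   # True
print(keep_site(site, min_maf=0.2))                 # False
```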

Allele age

The estimated allele frequencies can be used to estimate allele age with this R script.
