This task estimates the pairwise genetic distances per genomic window between at least two genomes in GLF format. GLF files can be created with the atlas task glf.
In a first step, the frequencies of the nine genotype configurations aa/aa, aa/ab, ab/aa, aa/bb, ab/ab, ab/ac, aa/bc, ab/cc and ab/cd are estimated. For example, the configuration aa/aa corresponds to a locus where both individuals are homozygous for the same allele, and the configuration ab/cc corresponds to a locus where one individual is heterozygous and the other is homozygous for an allele that the first individual does not carry.
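The text above does not say how the frequencies are estimated, apart from the EM iterations mentioned in the arguments. As an illustration only, the sketch below runs a generic EM for mixture proportions. It is not the ATLAS implementation: the function name, the input layout (one likelihood per site and configuration), the tolerance and the toy data are all assumptions.

```python
# Order of the nine genotype configurations as used in this document.
CONFIGS = ["aa/aa", "aa/ab", "ab/aa", "aa/bb", "ab/ab",
           "ab/ac", "aa/bc", "ab/cc", "ab/cd"]


def em_config_frequencies(site_likelihoods, max_iterations=100, tol=1e-8):
    """Generic EM for mixture proportions (hypothetical helper).

    site_likelihoods[s][k] is the assumed likelihood of the data at site s
    under configuration k. Every site must have a positive likelihood under
    at least one configuration with non-zero frequency.
    """
    n_configs = len(CONFIGS)
    n_sites = len(site_likelihoods)
    freqs = [1.0 / n_configs] * n_configs  # start from uniform frequencies
    for _ in range(max_iterations):
        sums = [0.0] * n_configs
        for lik in site_likelihoods:
            # E-step: posterior probability of each configuration at this site
            weighted = [f * l for f, l in zip(freqs, lik)]
            total = sum(weighted)
            for k in range(n_configs):
                sums[k] += weighted[k] / total
        # M-step: new frequencies are the mean posteriors over all sites
        new_freqs = [s / n_sites for s in sums]
        converged = max(abs(n - f) for n, f in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


# Hypothetical toy data: three sites that only support aa/aa
# and one site that only supports aa/bb.
toy_likelihoods = [[1, 0, 0, 0, 0, 0, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 0, 0, 0, 0, 0]]
toy_frequencies = em_config_frequencies(toy_likelihoods)
```

With real data the per-site likelihoods would come from the genotype likelihoods stored in the glf files; how ATLAS combines them is not covered here.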
In a second step, each genotype configuration frequency is multiplied by its user-specified distance weight, and the products are summed to give the genetic distance. The weights you assign to the genotype configurations depend on the genetic distance you want to use.
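The second step is a plain weighted sum, which can be sketched as follows. The weight vectors are the predefined ones listed below; the function name and the example frequencies are made up for illustration and do not come from ATLAS.

```python
# Predefined weights from this document, in the order
# aa/aa, aa/ab, ab/aa, aa/bb, ab/ab, ab/ac, aa/bc, ab/cc, ab/cd.
SQUARED_DIFF = [0, 1, 1, 4, 0, 1, 4, 4, 4]
PROB_MISMATCH = [0, 0.5, 0.5, 1, 0.5, 0.75, 1, 1, 1]


def genetic_distance(frequencies, weights):
    """Weighted sum of the nine genotype configuration frequencies."""
    if len(frequencies) != 9 or len(weights) != 9:
        raise ValueError("expected 9 frequencies and 9 weights")
    return sum(f * w for f, w in zip(frequencies, weights))


# Hypothetical frequencies for one window (they sum to 1).
example_frequencies = [0.90, 0.03, 0.03, 0.02, 0.02, 0.0, 0.0, 0.0, 0.0]

dist_squared_diff = genetic_distance(example_frequencies, SQUARED_DIFF)
dist_prob_mismatch = genetic_distance(example_frequencies, PROB_MISMATCH)
```

For these made-up frequencies the two distances come out at about 0.14 and 0.06, which shows how much the choice of weights changes the result for the same window.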
Predefined distance weights:
- squaredDiff (default): This distance measure corresponds to the squared number of alleles that differ between the two genotypes. The distance weights in this case are: 0,1,1,4,0,1,4,4,4.
- euclidian: This distance measure corresponds to the square root of squaredDiff. If this measure is used in a metric MDS, the MDS gives the same result as a PCA.
- probMismatch: This distance measure corresponds to the probability that, at a random position, an allele drawn at random from one individual differs from an allele drawn at random from the other individual. The distance weights in this case are: 0,0.5,0.5,1,0.5,0.75,1,1,1.
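The squaredDiff and probMismatch weights listed above can be reproduced by enumerating the alleles of each configuration. The sketch below is an interpretation that happens to match the listed numbers, not code taken from ATLAS; both helper functions are hypothetical.

```python
from collections import Counter
from itertools import product

CONFIGS = ["aa/aa", "aa/ab", "ab/aa", "aa/bb", "ab/ab",
           "ab/ac", "aa/bc", "ab/cc", "ab/cd"]


def prob_mismatch_weight(config):
    """Probability that one random allele from each genotype differs."""
    first, second = config.split("/")
    pairs = list(product(first, second))  # all 4 equally likely allele pairs
    return sum(a != b for a, b in pairs) / len(pairs)


def squared_diff_weight(config):
    """Squared number of alleles of the first genotype missing in the second."""
    first, second = config.split("/")
    # Counter subtraction keeps only positive counts (multiset difference).
    n_differing = sum((Counter(first) - Counter(second)).values())
    return n_differing ** 2


derived_prob_mismatch = [prob_mismatch_weight(c) for c in CONFIGS]
derived_squared_diff = [squared_diff_weight(c) for c in CONFIGS]
```

For ab/ac, for example, three of the four allele pairs differ (a/c, b/a, b/c), which gives the probMismatch weight 0.75, while only one allele differs between the genotypes, which gives the squaredDiff weight 1.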
Input:
- at least two glf files
Output:
One txt file for each pair of glf files. The columns contain the position of the window, the four base frequencies, the nine genotype configuration frequencies and the genetic distance.
Usage example:
./atlas task=geneticDist glf=example1.glf.gz,example2.glf.gz distType=probMismatch
Specific arguments:
- distWeights : A comma-separated vector of 9 weights, assigned to the genotype configurations in the following order: aa/aa,aa/ab,ab/aa,aa/bb,ab/ab,ab/ac,aa/bc,ab/cc,ab/cd. Each weight expresses how distant from each other you consider the two genotypes of that configuration to be.
- distType : Use one of the predefined distance types listed above (squaredDiff, euclidian or probMismatch). If neither distWeights nor distType is given, squaredDiff is used.
- iterations : The maximum number of EM iterations. Default: 100
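When many pairs or custom weights are involved, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper, not part of ATLAS. It only uses the arguments documented on this page (task, glf, distWeights, distType, iterations) and assumes the executable is called as ./atlas. How ATLAS behaves when both distWeights and distType are supplied is not stated here, so the sketch simply passes on whatever it is given.

```python
def build_genetic_dist_command(glf_files, dist_type=None, dist_weights=None,
                               iterations=None, atlas="./atlas"):
    """Return the argument list for one geneticDist run (hypothetical helper)."""
    if len(glf_files) < 2:
        raise ValueError("geneticDist needs at least two glf files")
    args = [atlas, "task=geneticDist", "glf=" + ",".join(glf_files)]
    if dist_weights is not None:
        if len(dist_weights) != 9:
            raise ValueError("distWeights needs exactly 9 values")
        # 'g' formatting prints 1 as "1" and 0.75 as "0.75"
        args.append("distWeights=" + ",".join(format(w, "g") for w in dist_weights))
    if dist_type is not None:
        args.append("distType=" + dist_type)
    if iterations is not None:
        args.append("iterations=" + str(iterations))
    return args


# Same call as the usage example, built from Python.
predefined = build_genetic_dist_command(
    ["example1.glf.gz", "example2.glf.gz"], dist_type="probMismatch")

# Custom weights (here: the probMismatch values) and a higher iteration limit.
custom = build_genetic_dist_command(
    ["example1.glf.gz", "example2.glf.gz"],
    dist_weights=[0, 0.5, 0.5, 1, 0.5, 0.75, 1, 1, 1],
    iterations=200)
```

The returned list can be joined with spaces for display or handed to subprocess.run to execute the command.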