Tiger / Tutorial Individual Replicates Model

Simple

Simulate

This command simulates 10 samples that are all replicates of a single individual, so all samples are assumed to have identical genotypes. It simulates 30 loci, of which one percent are heterozygous (het=0.01). The error parameter sets an error rate of 0.1 for sites where depth=1 and an error rate of 0.2 for sites where depth=2. It also means that only loci with these two depths are simulated. To simulate additional depths, provide additional error values.

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=1 sites=30 error=0.1,0.2 outname=simple

This command produces two files: simple.vcf.gz, the VCF of the simulated replicates, and simple_sampleGroups.txt, which lists the replicate group and error set of each sample (in this case all samples are simulated as belonging to the same replicate group and as having been sequenced as one set).

Infer

This command will infer the probability distributions of all parameters, i.e. the error rates and genotype frequencies.

./tiger task=estimateIndReps vcf=simple.vcf.gz groups=simple_sampleGroups.txt

The genotype frequencies for all samples are in simple_individualGenotypeFrequencies.txt; in this case the values are the same for all samples. The estimated error rates for the two simulated depths are in simple_errorRates.txt.

Adjust PL

This command will adjust the PL values in the VCF file with the error rate estimated from both homozygous and heterozygous sites.

./tiger task=adjustPL vcf=simple.vcf.gz errorRates=simple_errorRates.txt errorModel=1

This command produces the file simple_adjustedPL.vcf.gz, where the PL values have been corrected to reflect the genotyping error. This is the file that should be used in subsequent analyses.
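The three steps of the simple example can be chained in a small script. The following is a minimal sketch, not part of TIGER: the `run` and `simple_pipeline` functions and the `TIGER` and `DRY_RUN` variables are hypothetical helpers introduced here for illustration. The TIGER arguments are copied from the commands above. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: chain simulate -> estimateIndReps -> adjustPL for the simple example.
# TIGER and DRY_RUN are hypothetical helper variables, not TIGER options.
TIGER="${TIGER:-./tiger}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    # Stop at the first failing step so later steps do not read missing files.
    "$@" || exit 1
  fi
}

simple_pipeline() {
  name="$1"   # output prefix, e.g. "simple"
  run "$TIGER" task=simulate model=indReps replicates=10 het=0.01 samples=1 sites=30 error=0.1,0.2 outname="$name"
  run "$TIGER" task=estimateIndReps vcf="$name.vcf.gz" groups="${name}_sampleGroups.txt"
  run "$TIGER" task=adjustPL vcf="$name.vcf.gz" errorRates="${name}_errorRates.txt" errorModel=1
}

simple_pipeline simple
```

Setting `DRY_RUN=0` in the environment makes the script execute the commands instead of printing them.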

Multiple replicate groups

Simulate

This command will simulate data in the same way as above except that it simulates a total of 30 samples, ten replicates for each of three individuals.

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=0.1,0.2 outname=multipleRepGroups
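To check that the simulated VCF holds the expected number of samples, one can count the sample columns in its header. The sketch below relies only on the standard VCF layout, in which the `#CHROM` header line has nine fixed columns followed by one column per sample. `vcf_sample_count` is a hypothetical helper, not a TIGER command.

```shell
# Count the sample columns of a gzipped VCF.
# vcf_sample_count is a hypothetical helper, not part of TIGER.
vcf_sample_count() {
  # The #CHROM line has 9 fixed columns (CHROM ... FORMAT); the rest are samples.
  gzip -dc "$1" | awk -F '\t' '/^#CHROM/ { print NF - 9 }'
}

# Only runs if the simulated file is present in the current directory.
if [ -f multipleRepGroups.vcf.gz ]; then
  # Should print 30 if the file holds ten replicates for each of three individuals.
  vcf_sample_count multipleRepGroups.vcf.gz
fi
```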

Infer

This command will infer the probability distributions of all parameters, i.e. error rates and genotype frequencies. Separate genotype frequencies will be estimated for the three replicate groups. Note the additional parameter "groupCol", as compared to the simple example. This parameter tells TIGER which column of the "groups" file contains the replicate group of each sample.

./tiger task=estimateIndReps vcf=multipleRepGroups.vcf.gz groups=multipleRepGroups_sampleGroups.txt groupCol=2
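Before running the inference, it can help to confirm that the chosen column really contains the replicate groups. The sketch below counts how many lines carry each value in a given column. It assumes the groups file is whitespace-delimited, which is not stated in this tutorial; if the file has a header line, that line is counted as a value of its own. `count_by_column` is a hypothetical helper, not a TIGER command.

```shell
# Count how many lines carry each value in one column of a
# whitespace-delimited file (assumed format, see the note above).
# count_by_column is a hypothetical helper, not part of TIGER.
count_by_column() {
  awk -v col="$2" '{ n[$col]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

# Only runs if the simulated groups file is present.
if [ -f multipleRepGroups_sampleGroups.txt ]; then
  # Column 2 is the one passed to TIGER as groupCol=2.
  count_by_column multipleRepGroups_sampleGroups.txt 2
fi
```

With ten replicates for each of three individuals, one would expect three distinct group values with ten samples each.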

Adjust PL

This command will adjust the PL values of the homozygous individuals in the VCF file using the error rate estimated from the homozygous sites, and those of the heterozygous individuals using the error rate estimated from the heterozygous sites.

./tiger task=adjustPL vcf=multipleRepGroups.vcf.gz errorRates=multipleRepGroups_errorRates.txt errorModel=2

This command produces the file multipleRepGroups_adjustedPL.vcf.gz, where the PL values have been corrected to reflect the genotyping error. This is the file that should be used in subsequent analyses.

Multiple error sets

Simulate

This command will simulate different error sets, where half of the samples will be simulated with error rate=0.1 for depth=1 and error rate=0.2 for depth=2, and the other half of the samples will be simulated with error rate=0.004 for depth=1 and error rate=0.003 for depth=2. We refer to these two groups of samples as "sets".

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=[0.1,0.2],[0.004,0.003] outname=multipleSets
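The bracketed error specification can be read as one bracket per set, with the values inside each bracket assigned to depth=1, depth=2, and so on. The sketch below prints that reading for a given specification. `error_set_map` is a hypothetical helper, not a TIGER command, and the set numbers it prints simply follow the order of the brackets; they are not necessarily the labels TIGER writes to its output files. Note also that some shells treat unquoted square brackets as a filename pattern (zsh, for example, may report "no matches found"), so quoting the argument, as in `'error=[0.1,0.2],[0.004,0.003]'`, is the safer form.

```shell
# Print set, depth and error rate for a bracketed error specification.
# error_set_map is a hypothetical helper, not part of TIGER.
error_set_map() {
  # "[a,b],[c,d]" -> "a,b;c,d" -> one line per set -> one line per depth.
  printf '%s\n' "$1" \
    | sed 's/],\[/;/g; s/[][]//g' \
    | tr ';' '\n' \
    | awk -F ',' '{ for (d = 1; d <= NF; d++) print "set=" NR, "depth=" d, "error=" $d }'
}

# Single quotes keep the shell from interpreting the brackets.
error_set_map '[0.1,0.2],[0.004,0.003]'
# set=1 depth=1 error=0.1
# set=1 depth=2 error=0.2
# set=2 depth=1 error=0.004
# set=2 depth=2 error=0.003
```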

Infer

This command will infer the probability distributions of all parameters, i.e. error rates and genotype frequencies. Separate error rates will be estimated for the different sets. Note the additional parameters "batches" and "batchesCol", as compared to the example above. These parameters tell TIGER where to find the set that each sample belongs to: in the file provided with "batches", in the column given by "batchesCol".

./tiger task=estimateIndReps vcf=multipleSets.vcf.gz groups=multipleSets_sampleGroups.txt groupCol=2 batches=multipleSets_sampleGroups.txt batchesCol=3

Adjust PL

This command will adjust the PL values of the homozygous individuals in the VCF file using the error rate estimated from the homozygous sites, and those of the heterozygous individuals using the error rate estimated from the heterozygous sites.

./tiger task=adjustPL vcf=multipleSets.vcf.gz errorRates=multipleSets_errorRates.txt errorModel=2 batches=multipleSets_sampleGroups.txt batchesCol=3

This command produces the file multipleSets_adjustedPL.vcf.gz, where the PL values have been corrected to reflect the genotyping error. This is the file that should be used in subsequent analyses.
