Tiger / simulate
Overview
Simulates a VCF file according to one of the models described on the infer task page.
Common parameters
- sites: number of sites in the genome to simulate. Default = 10000
- samples: number of samples to simulate. Default = 50
- model: inference model to simulate under; one of indRep, hardyWeinberg, or truthSet
indRep
Parameters
- replicates: specify how many replicates per sample to simulate. Default = 3
- het: specify the fraction of heterozygous sites. Default = 0.1
Files created
- simulations.vcf.gz: Genotype calls for all samples
- simulations_sampleGroups.txt: The association of the samples to a replication group and error rate class. Error rates are estimated separately for each class. Samples in the same replication group are assumed to share the same genotypes.
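To illustrate what "samples in the same replication group share the same genotypes" means, here is a minimal Python sketch of one replication group. It is a toy model under stated assumptions, not Tiger's implementation: genotypes are reduced to homozygous (0) or heterozygous (1), and an error simply flips the call with a single probability. The function name and its arguments are hypothetical.

```python
import random


def simulate_replicates(num_sites, replicates, het=0.1, error=0.1, seed=1):
    """Toy simulation of one replication group (illustration only)."""
    rng = random.Random(seed)
    # True genotype per site: heterozygous (1) with probability het, else homozygous (0)
    truth = [1 if rng.random() < het else 0 for _ in range(num_sites)]
    observed = []
    for _ in range(replicates):
        # Every replicate starts from the same true genotypes;
        # each call is flipped independently with probability error (assumed error model)
        calls = [1 - g if rng.random() < error else g for g in truth]
        observed.append(calls)
    return truth, observed


truth, observed = simulate_replicates(num_sites=30, replicates=10, het=0.01)
mismatches = sum(c != t for calls in observed for c, t in zip(calls, truth))
print(f"{mismatches} of {30 * 10} replicate calls differ from the truth")
```

Because all replicates derive from the same truth, disagreement between replicates is informative about the error rate, which is the idea the indRep model relies on.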
Usage examples:
Different error rates for depth=1 and depth=2, a single individual with 10 replicates:
./tiger task=simulate model=indReps replicates=10 het=0.01 samples=1 sites=30 error=0.1,0.2 outname=simple
Different error rates for depth=1 and depth=2, three replicate groups:
./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=0.1,0.2 outname=multipleRepGroups
Different error rates for depth=1 and depth=2, three replicate groups and two error sets:
./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=[0.1,0.2],[0.004,0.003] outname=multipleSets
For more examples see our Individual Replicates Tutorial
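The error argument in the examples above appears in two forms: a plain comma-separated list (one error set) and bracketed groups (several error sets). As a rough illustration of that reading, the following Python sketch splits such a string into a list of error sets, with one rate per depth class inside each set. The parser is hypothetical and written only from the examples on this page; it is not Tiger's own parsing code.

```python
def parse_error_sets(spec):
    """Split an error string into a list of error sets (lists of floats)."""
    spec = spec.strip()
    if not spec.startswith("["):
        # Plain form, e.g. "0.1,0.2": a single error set
        return [[float(x) for x in spec.split(",")]]
    sets = []
    # Bracketed form, e.g. "[0.1,0.2],[0.004,0.003]": one set per bracket pair
    for chunk in spec.strip("[]").split("],["):
        sets.append([float(x) for x in chunk.split(",")])
    return sets


single = parse_error_sets("0.1,0.2")
double = parse_error_sets("[0.1,0.2],[0.004,0.003]")
print(single)
print(double)
```

Under this reading, the first value of each set applies to depth=1 and the second to depth=2.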
hardyWeinberg
Parameters
- sites: specify how many sites in the genome, default = 10000
- samples: specify how many samples to simulate, default = 50
- populations: the number of simulated populations. Each population contains the number of samples given by samples. Default = 3
- alpha and beta: the shape parameters of the beta distribution from which the simulated allele frequencies of all sites are sampled. Default = 0.5
- numSitesPolymorphic: number of polymorphic sites
- minMAF: minimum minor allele frequency at which a site is considered polymorphic
- error: string of error rates. Default = 0.5. Different error rate classes can be simulated for sites of different depth
- errorHet: string of error rates; must have the same length as error. Default = 0.5
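The roles of alpha, beta and minMAF can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not Tiger's code: each site's alternative-allele frequency is drawn from Beta(alpha, beta), genotypes follow Hardy-Weinberg proportions, and no genotyping error is applied. The function names and the threshold 0.05 are hypothetical.

```python
import random


def simulate_population(num_sites, num_samples, alpha=0.5, beta=0.5, seed=1):
    """Toy Hardy-Weinberg simulation for one population (illustration only)."""
    rng = random.Random(seed)
    freqs, genotypes = [], []
    for _ in range(num_sites):
        # Alternative-allele frequency of this site, drawn from Beta(alpha, beta)
        f = rng.betavariate(alpha, beta)
        # Hardy-Weinberg proportions: hom. reference, heterozygous, hom. alternative
        weights = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
        # Genotypes coded as the number of alternative alleles (0, 1 or 2)
        calls = rng.choices([0, 1, 2], weights=weights, k=num_samples)
        freqs.append(f)
        genotypes.append(calls)
    return freqs, genotypes


def is_polymorphic(freq, min_maf):
    """A site counts as polymorphic if its minor allele frequency reaches min_maf."""
    return min(freq, 1 - freq) >= min_maf


freqs, genotypes = simulate_population(num_sites=30, num_samples=50)
# 0.05 is a hypothetical threshold, not a documented Tiger default
n_poly = sum(is_polymorphic(f, 0.05) for f in freqs)
print(f"{n_poly} of {len(freqs)} sites are polymorphic at minMAF=0.05")
```

With alpha = beta = 0.5 the beta distribution is U-shaped, so many sites have frequencies close to 0 or 1 and fall below a given minor allele frequency threshold.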
Files created
- simulations.vcf.gz: genotype calls for all samples
- simulations_sampleGroups.txt: the association of the samples to a population and error rate class. Error rates are estimated separately for each class. Population parameters (alpha, beta, allele frequencies) are estimated separately for each population.
- simulations_R_input.txt: the simulated observed genotype calls (the same calls as in the VCF) for all samples and loci, encoded as 1 for homozygous reference, 2 for heterozygous and 3 for homozygous alternative allele
- simulations_trueAlleleFrequencies.txt: the simulated true allele frequencies for all loci
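The 1/2/3 genotype encoding described above maps directly to alternative-allele counts (code minus one). As a small sketch, the Python function below turns the encoded calls of one locus into a naive alternative-allele frequency. The layout of simulations_R_input.txt is not described here, so the example works on an in-memory list with made-up values; the function name is hypothetical.

```python
def alt_allele_frequency(encoded_calls):
    """Naive alternative-allele frequency from 1/2/3-encoded genotype calls."""
    for code in encoded_calls:
        if code not in (1, 2, 3):
            raise ValueError(f"unexpected genotype code: {code}")
    # 1 = hom. reference (0 alt alleles), 2 = heterozygous (1), 3 = hom. alternative (2)
    alt_alleles = sum(code - 1 for code in encoded_calls)
    # Each diploid sample contributes two alleles
    return alt_alleles / (2 * len(encoded_calls))


# Made-up calls for one locus across six samples
calls = [1, 1, 2, 3, 2, 1]
print(alt_allele_frequency(calls))
```

Since the observed calls contain simulated genotyping errors, such a naive estimate will generally differ from the values in simulations_trueAlleleFrequencies.txt.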
Usage examples
Different error rates for depth=1 and depth=2, only one error class and one population:
./tiger task=simulate model=hardyWeinberg populations=1 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test error=0.1,0.2
Different error rates for depth=1 and depth=2, two populations:
./tiger task=simulate model=hardyWeinberg populations=2 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test
Different error rates for depth=1 and depth=2, two error classes and two populations:
./tiger task=simulate model=hardyWeinberg populations=2 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test error=[0.1,0.2],[0.004,0.003]
For more examples see our Hardy-Weinberg Tutorial
truthSet
One pair of observed and true samples is simulated for each of the samples requested.
Parameters:
- sites: specify how many sites in the genome, default = 10000
- samples: specify how many samples to simulate, default = 50
- het: specify the fraction of heterozygous sites. Default = 0.1
- meanDepthOfTrue: mean depth of "true" sample
- seqError: sequencing error rate. Default = 0.01, which corresponds to a Phred quality score of 20
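The correspondence between a sequencing error rate and a quality score follows the standard Phred relation Q = -10 * log10(p). The short Python sketch below converts in both directions; the function names are illustrative and not part of Tiger.

```python
import math


def phred_from_error(error_rate):
    """Phred quality score for a given per-base error probability."""
    return -10 * math.log10(error_rate)


def error_from_phred(quality):
    """Per-base error probability for a given Phred quality score."""
    return 10 ** (-quality / 10)


print(phred_from_error(0.01))
print(error_from_phred(30))
```

This helps when choosing a seqError value to match a target base quality, for example a quality score of 30 for an error rate of 0.001.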
Files created:
- simulations_samplePairs.txt: The names of the corresponding samples
Usage examples
./tiger task=simulate model=truthSet samples=10 sites=1000 error=0.1,0.2
For more examples see our Truth Set Tutorial