Tiger / simulate

Overview

Simulates a VCF file according to one of the models described on the infer task page.

Common parameters

  • sites: specify how many sites in the genome, default = 10000
  • samples: specify how many samples to simulate, default = 50
  • model: specify the inference model; can be indRep, hardyWeinberg, or truthSet

indRep

Parameters

  • replicates: specify how many replicates per sample to simulate. Default = 3
  • het: specify the fraction of heterozygous sites. Default = 0.1

Files created

  • simulations.vcf.gz: Genotype calls for all samples
  • simulations_sampleGroups.txt: Assigns each sample to a replicate group and an error rate class. Error rates are estimated separately for each class. Samples in the same replicate group are assumed to share the same genotypes.

Usage examples:

Different error rates for depth=1 and depth=2, a single individual with 10 replicates:

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=1 sites=30 error=0.1,0.2 outname=simple

Different error rates for depth=1 and depth=2, three replicate groups:

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=0.1,0.2 outname=multipleRepGroups

Different error rates for depth=1 and depth=2, three replicate groups and two error sets:

./tiger task=simulate model=indReps replicates=10 het=0.01 samples=3 sites=30 error=[0.1,0.2],[0.004,0.003] outname=multipleSets
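Judging from the examples above, a plain value such as error=0.1,0.2 describes one error set (one rate per depth class), while the bracketed form lists several sets. The Python sketch below only illustrates that reading of the syntax. It is not Tiger's own parser, and the function name parse_error_sets is hypothetical.

```python
def parse_error_sets(value):
    """Split an error= value into a list of error sets (lists of floats).

    Assumed reading of the syntax, based only on the usage examples:
    "0.1,0.2"                  -> one set with two rates
    "[0.1,0.2],[0.004,0.003]"  -> two sets with two rates each
    """
    value = value.strip()
    if not value.startswith("["):
        # Plain comma-separated list: a single error set.
        return [[float(x) for x in value.split(",")]]
    # Bracketed form: drop the outer brackets, then split between sets.
    groups = value.strip("[]").split("],[")
    return [[float(x) for x in group.split(",")] for group in groups]


print(parse_error_sets("0.1,0.2"))
print(parse_error_sets("[0.1,0.2],[0.004,0.003]"))
```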

For more examples see our Individual Replicates Tutorial

hardyWeinberg

Parameters

  • sites: specify how many sites in the genome, default = 10000
  • samples: specify how many samples to simulate, default = 50
  • populations: the number of simulated populations. Each population contains as many samples as given by samples. Default = 3
  • alpha and beta: the simulated allele frequencies for all sites are sampled from a beta distribution defined by alpha and beta. Default = 0.5
  • numSitesPolymorphic: number of polymorphic sites
  • minMAF: minimum minor allele frequency at which a site is considered polymorphic
  • error: comma-separated string of error rates. Default = 0.5. You can simulate different error rate classes for sites of different depths
  • errorHet: comma-separated string of error rates; must have the same length as error. Default = 0.5
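As a rough picture of what alpha, beta and minMAF control, the Python sketch below draws one allele frequency per site from a beta distribution and then samples diploid genotypes under Hardy-Weinberg proportions. It is an illustration only, not Tiger's implementation. The function names, the 0/1/2 alternative-allele counts, and the use of the sample allele frequency for the minMAF check are all assumptions.

```python
import random


def simulate_hwe_genotypes(num_sites, num_samples, alpha=0.5, beta=0.5, seed=None):
    """Return (allele_freqs, genotypes).

    genotypes[site][sample] is the number of alternative alleles (0, 1 or 2).
    """
    rng = random.Random(seed)
    allele_freqs = []
    genotypes = []
    for _ in range(num_sites):
        # Alternative-allele frequency of this site, drawn from Beta(alpha, beta).
        f = rng.betavariate(alpha, beta)
        allele_freqs.append(f)
        # Under Hardy-Weinberg equilibrium the two alleles of a sample are
        # independent draws, giving probabilities (1-f)^2, 2f(1-f) and f^2.
        site = [sum(rng.random() < f for _ in range(2)) for _ in range(num_samples)]
        genotypes.append(site)
    return allele_freqs, genotypes


def is_polymorphic(site_genotypes, min_maf):
    """True if the sample minor allele frequency is at least min_maf."""
    alt_freq = sum(site_genotypes) / (2 * len(site_genotypes))
    return min(alt_freq, 1 - alt_freq) >= min_maf


freqs, genos = simulate_hwe_genotypes(num_sites=30, num_samples=50, seed=1)
# min_maf=0.05 is an arbitrary value chosen for this illustration.
print(sum(is_polymorphic(site, min_maf=0.05) for site in genos))
```

With alpha = beta = 0.5 the beta distribution is U-shaped, so most sites have allele frequencies close to 0 or 1 and only a fraction pass a given minMAF threshold.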

Files created

  • simulations.vcf.gz: genotype calls for all samples
  • simulations_sampleGroups.txt: assigns each sample to a population and an error rate class. Error rates are estimated separately for each class. Population parameters (alpha, beta, allele frequencies) are estimated separately for each population.
  • simulations_R_input.txt: the simulated observed genotype calls (the same calls as in the VCF file) for all samples and loci, encoded as 1 for homozygous reference, 2 for heterozygous, and 3 for homozygous alternative
  • simulations_trueAlleleFrequencies.txt: the simulated true allele frequencies for all loci
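The 1/2/3 genotype coding used in simulations_R_input.txt can be related to VCF GT values with a short Python sketch. It only illustrates the coding for diploid, biallelic calls. The function name encode_genotype is hypothetical, and the actual layout of the file is not assumed here.

```python
def encode_genotype(gt):
    """Map a diploid, biallelic VCF GT value to the 1/2/3 coding.

    1 = homozygous reference, 2 = heterozygous, 3 = homozygous alternative.
    Returns None for missing or unsupported calls such as "./.".
    """
    alleles = gt.replace("|", "/").split("/")  # accept phased and unphased calls
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None
    return alleles.count("1") + 1


print([encode_genotype(gt) for gt in ("0/0", "0/1", "1|0", "1/1", "./.")])
```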

Usage examples

Different error rates for depth=1 and depth=2, only one error class and one population:

./tiger task=simulate model=hardyWeinberg populations=1 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test error=0.1,0.2

Two populations, with error left at its default:

./tiger task=simulate model=hardyWeinberg populations=2 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test

Different error rates for depth=1 and depth=2, two error classes and one population:

./tiger task=simulate model=hardyWeinberg populations=1 samples=50 sites=30 alpha=0.5 beta=0.5 outname=test error=[0.1,0.2],[0.004,0.003]

For more examples see our Hardy-Weinberg Tutorial

truthSet

For each of the numSamples samples, one pair consisting of an observed and a true sample is simulated.

Parameters:

  • sites: specify how many sites in the genome, default = 10000
  • samples: specify how many samples to simulate, default = 50
  • het: specify the fraction of heterozygous sites. Default = 0.1
  • meanDepthOfTrue: mean depth of "true" sample
  • seqError: sequencing error rate. Default = 0.01, which corresponds to a Phred quality score of 20
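The link between seqError and a quality score follows the standard Phred definition, Q = -10 * log10(p). The Python helpers below are a small sketch for converting in both directions. Their names are illustrative and not part of Tiger.

```python
import math


def error_to_phred(p):
    """Phred quality score for an error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


def phred_to_error(q):
    """Error probability for a Phred quality score q."""
    return 10 ** (-q / 10)


print(round(error_to_phred(0.01)))  # 20
print(phred_to_error(30))
```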

Files created:

  • simulations_samplePairs.txt: the names of the corresponding observed and true samples in each pair

Usage examples

./tiger task=simulate model=truthSet samples=10 sites=1000 error=0.1,0.2

For more examples see our Truth Set Tutorial
