Simulation (task simulate)

To simulate allele frequencies, use the task “simulate” followed by the parameter values: the number of populations in each group (pops, one comma-separated entry per group), the number of sites (numSites), the distance between two adjacent sites (distBetweenSites), and the number of haplotypes in a sample (N).

One group

To simulate data for a single group, you only have to set the parameters that determine the selection coefficient and the drift parameter of a group (alpha_max, beta and lnkappa), along with pops, numSites, distBetweenSites and N.

example:

./Flink task=simulate beta=-1.0 alpha_max=1.0 lnkappa=-2.0 pops=3 numSites=1000 distBetweenSites=1 N=10000

You will get an output file called Flink_simulations.txt containing the simulated allele frequencies, one output file per group (S"NumberOfTheGroup"_simulated.txt), and an output file with the ancestral allele frequencies (freq_p.txt).

With the option “data”, you can instead supply an input file that sets the group and population names, the number of loci, and the distances between them.

Example using an input file:

./Flink task=simulate data=inputfile beta=-1.0 alpha_max=1.0 lnkappa=-2.0

For the format of the input file, see the paragraph “Input file” in the chapter Launching Flink.

Higher hierarchy

To simulate allele frequencies for more than one group, you also have to set the parameters of the higher hierarchy: B, A_max and lnK.

example:

./Flink task=simulate B=-1.0 A_max=1.0 lnK=-2.0 beta=-1.0 alpha_max=1.0 lnkappa=-2.0 pops=3,3,3 numSites=1000 distBetweenSites=1 N=10000

In addition to the files generated in the one-group simulation, you will get an extra output file containing the simulated S for the world hierarchy (S_simulated.txt).
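
The pops value holds one population count per group, separated by commas. As a minimal sketch, the list can be assembled in the shell before launching the simulation. The helper name join_pops is an assumption made for illustration and is not part of Flink; the script only prints the command so that it can be checked before it is run.

```shell
#!/bin/sh
# join_pops is a hypothetical helper, not part of Flink.
# It joins its arguments with commas, giving the value expected by pops=.
join_pops() {
    (IFS=,; printf '%s' "$*")
}

# Three groups with three populations each, as in the example above.
pops=$(join_pops 3 3 3)

# Print the command instead of running it, so it can be inspected first.
echo "./Flink task=simulate B=-1.0 A_max=1.0 lnK=-2.0 beta=-1.0 alpha_max=1.0 lnkappa=-2.0 pops=$pops numSites=1000 distBetweenSites=1 N=10000"
```

Passing a single count to join_pops gives the one-group form (pops=3).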

Output files

The simulation generates several output files. The file “Flink_simulations.txt” has the same format as the input file (see Launching Flink). The file “freq_p.txt” contains the ancestral allele frequencies generated during the simulation: the first column holds the name of each frequency, where the first index refers to the population and the second to the site, and the second column holds its value. The S values used in the simulation are written to "S"groupnumber"_simulated.txt" for each group and to "S_simulated.txt" for the higher hierarchy. You can use these files as input for the parameter estimation with the tasks Sg_fromfile and S_fromfile.
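
A quick sanity check of the ancestral frequencies can be sketched with awk. This assumes that the two columns are separated by whitespace and that there is no header line; neither is confirmed here. The labels and values in the stand-in file are made up for illustration, and summarize_freqs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical stand-in for freq_p.txt: the labels and values below are
# made up, and the real file may use other labels or separators.
cat > freq_p_example.txt <<'EOF'
p_1_1 0.25
p_1_2 0.75
p_2_1 0.50
EOF

# summarize_freqs (hypothetical helper) reads column 2 as the frequency,
# then reports the row count, the mean, and how many values lie outside (0, 1).
summarize_freqs() {
    awk 'NF >= 2 {
             n++; sum += $2
             if ($2 <= 0 || $2 >= 1) bad++
         }
         END {
             printf "n=%d mean=%.4f outside=%d\n", n, (n ? sum / n : 0), bad + 0
         }' "$1"
}

summarize_freqs freq_p_example.txt
```

To check a real run, call summarize_freqs freq_p.txt in the directory where the simulation wrote its output.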

Other arguments

You can also specify the following optional arguments.

Argument             Default value         Explanation
s_max                      2               Maximum state of the Markov model
lnMu                    -2.0               Parameter of the generating matrix controlling the probability of moving to a different state, for the higher hierarchy
lnNu                    -1.0               Parameter of the generating matrix controlling the probability of moving from the neutral state to a state of selection, for the higher hierarchy
lnMu_g                  -2.0               Parameter of the generating matrix controlling the probability of moving to a different state
lnNu_g                  -1.0               Parameter of the generating matrix controlling the probability of moving from the neutral state to a state of selection
log_a                    0.0               Shape of the allele frequencies in the ancestral population, assuming a beta distribution (the peak around 1.0)
log_b                    0.0               Shape of the allele frequencies in the ancestral population, assuming a beta distribution (the peak around 0.0)
maxDist              1000000               Maximum distance within which the sites of a chromosome are grouped for linkage
numChr                     1               Number of chromosomes to simulate
numSites                 100               Number of sites to simulate
distBetweenSites        1000               Distance between sites
N                        100               Number of haplotypes in a sample
lnkappa                 -2.0               Logarithm of the positive scaling parameter of the group
lnK                     -2.0               Logarithm of the positive scaling parameter of the world
P                                          Fix all the world ancestral allele frequencies to a value in (0, 1)
p                                          Fix all the ancestral allele frequencies to a value in (0, 1)
S                                          Fix all the ancestral states S to the minimum value (min), the maximum value (max), the neutral state (neutral) or an integer in [-s_max, s_max]. When this argument is set, the parameter lnK is not used
Sg                                         Fix all the group states Sg to the minimum value (min), the maximum value (max), the neutral state (neutral) or an integer in [-s_max, s_max]. When this argument is set, the parameter lnkappa is not used
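
The values accepted by S and Sg in the table above (min, max, neutral, or an integer in [-s_max, s_max]) can be checked before launching a run. The following is a minimal sketch of such a pre-check. The function valid_state is a hypothetical helper and not part of Flink, and it says nothing about how Flink itself handles invalid values.

```shell
#!/bin/sh
# valid_state is a hypothetical pre-check, not part of Flink.
# Usage: valid_state VALUE [S_MAX]   (S_MAX defaults to 2, as in the table)
valid_state() {
    value=$1
    s_max=${2:-2}
    case $value in
        min|max|neutral) return 0 ;;        # keywords named in the table
        ''|-|*[!0-9-]*|?*-*) return 1 ;;    # not an optionally signed integer
    esac
    # Integer case: accept only values inside [-s_max, s_max].
    [ "$value" -ge "-$s_max" ] && [ "$value" -le "$s_max" ]
}

for candidate in neutral -2 3; do
    if valid_state "$candidate"; then
        echo "Sg=$candidate ok"
    else
        echo "Sg=$candidate rejected for s_max=2"
    fi
done
```

A second argument overrides the default, so valid_state 3 3 succeeds when the run also sets s_max=3.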
