
Filtering Data (task filter)

To filter the input data with Flink, specify task=filter. You must also give the input file on the command line through the parameter data, and you can set a minor allele frequency threshold with the parameter maf. Flink writes an output file called “filtered_data.txt” containing the sites whose allele frequency is higher than this threshold. The default value of maf is 0.0. You can also require a minimum number of haplotypes per population with the parameter haploMin: sites for which at least one population was sampled with fewer haplotypes than this minimum are not considered. Its default value is 0.

Example:

./Flink task=filter data=Flink_simulations.txt maf=0.2 haploMin=5
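
To make the maf and haploMin rules concrete, here is a minimal Python sketch of how a single site could be tested. It is not Flink's code. The representation of a site as a list of (allele count, haplotype count) pairs, the function name keep_site, the pooling of the frequency over all populations, and the strict reading of "higher than" are all assumptions made for illustration.

```python
# Hypothetical representation (not Flink's input format): a site is a list of
# (allele_count, haplotype_count) pairs, one pair per population.

def keep_site(site, maf=0.0, haplo_min=0):
    """Return True if the site passes the maf and haploMin rules as read here."""
    # haploMin: discard the site if any population has fewer sampled haplotypes.
    if any(n < haplo_min for _, n in site):
        return False

    total_haplotypes = sum(n for _, n in site)
    if total_haplotypes == 0:
        # Assumption: a site without any sampled haplotype is discarded.
        return False

    # Assumption: the allele frequency is pooled over all populations.
    freq = sum(c for c, _ in site) / total_haplotypes
    minor_freq = min(freq, 1.0 - freq)

    # "Higher than" is read as a strict inequality (assumption).
    return minor_freq > maf


if __name__ == "__main__":
    site = [(3, 10), (5, 10)]  # two populations, 10 haplotypes each
    print(keep_site(site, maf=0.2, haplo_min=5))
```

With maf=0.2 and haploMin=5, as in the example command, this site would be kept under these assumptions, because both populations have at least 5 haplotypes and the pooled minor allele frequency is 8/20.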

Another way to filter the data is to provide the integer countslimit. If the sum of the counts of an allele over all populations is less than or equal to countslimit, the allele is removed. The allele is also removed if the sum of the differences between the haplotype and the allele frequency is less than or equal to countslimit.

Example:

./Flink task=filter data=Flink_simulations.txt countslimit=2
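
The countslimit rule can be sketched in Python as follows. This is one possible reading, not Flink's implementation: the second condition is interpreted here as the sum over populations of (haplotype count minus allele count), which is an assumption, as are the data layout and the function name keep_allele.

```python
# Hypothetical representation (not Flink's input format): a site is a list of
# (allele_count, haplotype_count) pairs, one pair per population.

def keep_allele(site, countslimit):
    """Return True if the allele survives the countslimit rule as read here."""
    # First condition: total count of the allele over all populations.
    allele_total = sum(c for c, _ in site)

    # Second condition (assumed reading): total of haplotypes minus allele
    # counts, that is, how many haplotypes do not carry the allele.
    difference_total = sum(n - c for c, n in site)

    # The allele is removed when either total is <= countslimit.
    return allele_total > countslimit and difference_total > countslimit


if __name__ == "__main__":
    rare = [(1, 10), (1, 10)]  # allele seen twice in total
    print(keep_allele(rare, countslimit=2))
```

Under this reading, countslimit=2 removes alleles seen at most twice in total, and also alleles that are missing from at most two haplotypes in total.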

A third way to filter the data is to provide the integer missedPop. If data are missing in at least missedPop populations, the allele is removed.

A fourth way is to provide the integer minVar. If an allele shows no variation in more than minVar populations, the allele is removed.

A fifth way is to provide the integer minPop. If an allele is present minPop times or fewer, the allele is removed.
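
As an illustration of the missedPop and minVar rules, the following Python sketch applies them to one site. It is not Flink's code and rests on assumptions: a population counts as missing when it has no sampled haplotypes, and a sampled population shows no variation when the allele is either absent or fixed in it. The data layout and the function name are hypothetical. The minPop rule is left out of the sketch.

```python
# Hypothetical representation (not Flink's input format): a site is a list of
# (allele_count, haplotype_count) pairs, one pair per population.

def passes_population_filters(site, missed_pop=None, min_var=None):
    """Return True if the allele survives the missedPop and minVar rules."""
    # Assumption: a population without sampled haplotypes has missing data.
    missing = sum(1 for _, n in site if n == 0)
    if missed_pop is not None and missing >= missed_pop:
        return False

    # Assumption: no variation means the allele is absent (count 0) or fixed
    # (count equals the haplotype count) in a population that has data.
    invariant = sum(1 for c, n in site if n > 0 and (c == 0 or c == n))
    if min_var is not None and invariant > min_var:
        return False

    return True


if __name__ == "__main__":
    site = [(0, 0), (0, 0), (3, 10)]  # two populations without data
    print(passes_population_filters(site, missed_pop=2))
```

Passing None for a parameter skips the corresponding rule in this sketch; how Flink itself treats these parameters when they are not given is not stated above.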
