ATLAS / VCF Tools: VCFToLFMM
Overview
Converts a VCF file to an LFMM file. Various filters (MAF, depth, variant quality, missingness, specific samples, genomic regions, chromosomes, etc.) can be applied.
Input
- VCF file: to be converted
- geno: which LFMM format to use. Either calledGeno, to store the called genotypes (input for LFMM1), or postGeno, to store the mean posterior genotypes (input for LFMM2). Please note: LFMM2 does not accept missing genotypes. Impute your VCF before using postGeno and do not set filters that lead to missing sites.
- txt file (optional): e.g. samplesPopulations.txt
This file is a user-created .txt file containing the samples to be used.
Example:
sample1 sample2 sample5 sample8
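To illustrate the difference between the two geno options above, here is a minimal Python sketch. It assumes genotypes are coded as the number of alternate alleles (0, 1 or 2) and that three posterior probabilities per sample and site are available. The function names and the example values are hypothetical and are not part of ATLAS.

```python
def called_genotype(posteriors):
    """Return the genotype (0, 1 or 2 alternate alleles) with the highest posterior."""
    return max(range(3), key=lambda g: posteriors[g])


def mean_posterior_genotype(posteriors):
    """Return the expected number of alternate alleles: sum over g of g * P(g | data)."""
    total = sum(posteriors)  # normalise in case the values do not sum exactly to 1
    return sum(g * p for g, p in enumerate(posteriors)) / total


# Hypothetical posteriors P(g=0), P(g=1), P(g=2) for one sample at one site.
example = [0.125, 0.5, 0.375]
print(called_genotype(example))
print(mean_posterior_genotype(example))
```

A called genotype is always an integer, whereas the mean posterior genotype is a real number between 0 and 2 that retains the uncertainty of the call.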
Output
- LFMM file with suffix ".lfmm". Contains the genotypes (parameter calledGeno) or the mean posterior genotypes (parameter postGeno) in LFMM format.
- text file with suffix ".lfmm.kept_loci". Contains the names (chr:pos) of loci that passed all filters and are present in the LFMM file.
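The two output files can be checked against each other after conversion. The sketch below is not part of ATLAS. It assumes the usual LFMM layout (whitespace-separated, one row per sample, one column per locus) and that the ".lfmm.kept_loci" file lists the chr:pos names separated by whitespace or newlines. All function names are hypothetical.

```python
def read_lfmm(path):
    """Read an LFMM matrix: one row per sample, one column per locus."""
    with open(path) as handle:
        return [[float(value) for value in line.split()]
                for line in handle if line.strip()]


def read_kept_loci(path):
    """Read locus names (chr:pos), assumed to be separated by whitespace."""
    with open(path) as handle:
        return handle.read().split()


def check_consistency(matrix, loci):
    """Raise ValueError unless every row has exactly one value per kept locus."""
    for index, row in enumerate(matrix):
        if len(row) != len(loci):
            raise ValueError(
                f"row {index} has {len(row)} values but {len(loci)} loci were kept"
            )
    return len(matrix), len(loci)
```

Calling check_consistency(read_lfmm(...), read_kept_loci(...)) with the two output paths returns the number of samples and loci, or raises an error if the files do not match.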
Usage Example
./atlas task=VCFToLFMM geno=calledGeno vcf=example.vcf.gz samples=samplesPopulations.txt
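When the conversion is run from a script, the key=value arguments can be assembled programmatically. The following Python sketch only builds the argument list. The helper name build_atlas_command is hypothetical, and the filter values (minMAF=0.05, minDepth=5) are illustrative choices, not recommendations.

```python
def build_atlas_command(vcf, geno, atlas="./atlas", **options):
    """Assemble an ATLAS VCFToLFMM call as a list of key=value arguments."""
    if geno not in ("calledGeno", "postGeno"):
        raise ValueError("geno must be calledGeno or postGeno")
    command = [atlas, "task=VCFToLFMM", f"geno={geno}", f"vcf={vcf}"]
    # Any further keyword is passed through unchanged as key=value.
    command += [f"{key}={value}" for key, value in options.items()]
    return command


command = build_atlas_command(
    "example.vcf.gz",
    "calledGeno",
    samples="samplesPopulations.txt",
    minMAF=0.05,   # illustrative value
    minDepth=5,    # illustrative value
)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to execute the conversion.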
Specific Arguments
- samples: specify samples to be used
- limitLines: number of lines to be read from the VCF file
- minDepth: only store sites with minimum depth. Default = 1
- minSamplesWithData: only store sites with minimum number of samples. Default = 1
- minMAF: only store sites where the initial estimate of the allele frequency is greater than or equal to minMAF. Default = 0.0
- minVariantQuality: only store sites with minimum variant quality. Default = 0
- keepChromosomes: only loci on these chromosomes are kept. The argument can be a filename (which needs to end with .txt) or a comma-separated list of chromosome names
- window: a BED file with three columns that correspond to the chromosome, start (0-based) and end position of the windows that should be kept. If both keepChromosomes and window are defined, only the overlap of the two is kept
- reportFreq: after how many lines the reading progress is printed to the terminal. Default = 10000
- epsF: epsilon for EM algorithm to estimate allele frequencies. Default = 0.0001
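The interaction of keepChromosomes and window described above can be sketched in Python as follows. This is an illustration, not ATLAS code. It assumes that the window file follows the standard BED convention (0-based start, end not included) and that locus positions are 1-based as in VCF. The function names are hypothetical.

```python
def parse_bed(lines):
    """Parse three-column BED lines into {chromosome: [(start, end), ...]}."""
    windows = {}
    for line in lines:
        if not line.strip():
            continue
        chrom, start, end = line.split()[:3]
        windows.setdefault(chrom, []).append((int(start), int(end)))
    return windows


def locus_is_kept(chrom, pos, keep_chromosomes=None, windows=None):
    """Decide whether a locus at 1-based position pos passes both region filters."""
    if keep_chromosomes is not None and chrom not in keep_chromosomes:
        return False
    if windows is not None:
        # A 0-based half-open window [start, end) covers 1-based positions start+1..end.
        return any(start < pos <= end for start, end in windows.get(chrom, []))
    return True


windows = parse_bed(["chr1\t0\t100", "chr2\t50\t60"])
print(locus_is_kept("chr1", 100, windows=windows))
print(locus_is_kept("chr2", 55, keep_chromosomes={"chr1"}, windows=windows))
```

With both filters set, a locus must lie on a listed chromosome and inside one of the windows, which corresponds to the overlap of the two.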