Kmer Mapping and Imputation Pipeline

From all the k-mers generated by a set of genomes, k-mer mapping selects the subset of k-mers that map to a single reference range and are present in only some of the assembled genomes. As a result, every selected k-mer is useful for discriminating between haplotypes. The process also creates an index that maps each k-mer to the haplotypes in which it occurs. This k-mer index can be saved to a file and reused later for mapping short reads.
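The selection rule described above can be illustrated with a small Python sketch. This is not the PHG implementation: the function names, the nested-dictionary layout of `haplotypes`, and the toy sequences are all hypothetical, and the sketch ignores details such as reverse complements and k-mer hashing. It only shows the two filters: a k-mer must occur in exactly one reference range, and in some but not all of the haplotypes at that range.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_index(haplotypes, k):
    """Build a toy discriminating k-mer index.

    haplotypes: {ref_range: {hap_id: sequence}} (hypothetical layout).
    Returns {kmer: (ref_range, frozenset_of_hap_ids)}.
    """
    # kmer -> ref_range -> set of haplotype ids containing that k-mer
    occurrences = defaultdict(lambda: defaultdict(set))
    for ref_range, haps in haplotypes.items():
        for hap_id, seq in haps.items():
            for kmer in kmers(seq, k):
                occurrences[kmer][ref_range].add(hap_id)

    index = {}
    for kmer, by_range in occurrences.items():
        # Filter 1: keep only k-mers seen in a single reference range.
        if len(by_range) != 1:
            continue
        (ref_range, hap_ids), = by_range.items()
        # Filter 2: drop k-mers shared by every haplotype at that range,
        # since they cannot distinguish one haplotype from another.
        if len(hap_ids) < len(haplotypes[ref_range]):
            index[kmer] = (ref_range, frozenset(hap_ids))
    return index


# Toy data: two reference ranges, two assemblies (A and B).
haplotypes = {
    "range1": {"A": "ACGTAC", "B": "ACGTTC"},
    "range2": {"A": "GGGTAC", "B": "GGGTAC"},
}
index = build_kmer_index(haplotypes, k=4)
for kmer in sorted(index):
    ref_range, hap_ids = index[kmer]
    print(kmer, ref_range, sorted(hap_ids))
```

In this toy data, `GTAC` is discarded because it occurs in both ranges, and `ACGT` is discarded because both haplotypes at `range1` contain it.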

To use the k-mer index for read mapping, the k-mers generated from each read are looked up in the index to find the haplotypes in which they occur. If all the k-mers from a read map to haplotypes from a single reference range, then that read is considered to map to those haplotypes. In brief, a fastq file of reads is used to generate a count of the reads mapping to each haplotype in the assemblies used to create the index. Those counts are then used for imputation.
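The read-mapping rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the plugin's code: the index layout, function names, and toy reads are hypothetical. Two choices here are assumptions that the text above does not specify: k-mers absent from the index are ignored, and a mapped read is credited to the union of the haplotypes hit by its k-mers.

```python
from collections import Counter


def map_read(read, index, k):
    """Return (ref_range, hap_ids) for a read, or None if it does not map.

    index: {kmer: (ref_range, frozenset_of_hap_ids)} (hypothetical layout).
    """
    hits = []
    for i in range(len(read) - k + 1):
        kmer = read[i:i + k]
        if kmer in index:  # assumption: k-mers not in the index are ignored
            hits.append(index[kmer])
    if not hits:
        return None
    ranges = {ref_range for ref_range, _ in hits}
    # The read maps only if every indexed k-mer points to the same range.
    if len(ranges) != 1:
        return None
    # Assumption: credit the union of haplotypes hit by the read's k-mers.
    hap_ids = set().union(*(haps for _, haps in hits))
    return ranges.pop(), hap_ids


def count_reads(reads, index, k):
    """Count mapped reads per (ref_range, hap_id)."""
    counts = Counter()
    for read in reads:
        mapped = map_read(read, index, k)
        if mapped is None:
            continue
        ref_range, hap_ids = mapped
        for hap_id in hap_ids:
            counts[(ref_range, hap_id)] += 1
    return counts


# Toy index and reads, for illustration only.
toy_index = {
    "CGTA": ("range1", frozenset({"A"})),
    "CGTT": ("range1", frozenset({"B"})),
    "GTTC": ("range1", frozenset({"B"})),
    "TTTG": ("range2", frozenset({"A"})),
}
reads = ["ACGTAC", "ACGTTC", "CGTTTG", "GGGGGG"]
counts = count_reads(reads, toy_index, k=4)
for key in sorted(counts):
    print(key, counts[key])
```

Here the third read is rejected because its k-mers hit two different reference ranges, and the fourth is rejected because none of its k-mers are in the index.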

To impute haplotypes with the k-mer pipeline, follow these steps:

  1. Use the KmerHashMapFromGraphPlugin to create and store the k-mer index.
  2. Use the KmerReadMapperPlugin to map reads.
  3. Use the BestHaplotypePathPlugin for imputation.
  4. Use the PathsToVCFPlugin to export a VCF. Instructions for exporting a VCF are at the bottom of the linked imputation page.
