TrioVis / Home

Welcome

TrioVis is a visualisation tool developed to assist filtering on coverage and variant frequency for genomic variants from exome sequencing of parent-child trios. It organises the variant data by grouping each variant based on the laws of Mendelian inheritance. Taking three Variant Call Format (VCF) files as input, the tool provides a user interface to test different coverage thresholds (i.e. different levels of stringency), to find the optimal threshold values, and to gain insights into the global effects of filtering.

Features

TrioVis consists of five sections: the main table, the global variant count bar graphs (top-right), the variant frequency sliders (bottom-left), the coverage sliders (bottom-centre) and the stacked bar graphs (bottom-right) (Fig. 1). All the sections represent the same dataset, but focus on different aspects of the data and their interactions. Father, mother and child are colour-coded in green, orange, and blue, respectively.

The main table is divided into cells based on the pattern of inheritance. Each cell contains a histogram of the total number of reads per sample. The background colour of each cell indicates whether the pattern is consistent (white) or inconsistent (grey) with the laws of Mendelian inheritance. The global variant count bar graphs (top-right) show the total counts of consistent and inconsistent variants. By changing the coverage settings, the user aims to minimise the number of inconsistent calls while keeping the number of consistent calls high.
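The consistency rule behind the white and grey cells can be sketched as follows. This is a minimal illustration, not TrioVis's actual code: it assumes autosomal, biallelic variants, ignores de novo mutations, and uses a hypothetical encoding in which a genotype is the number of alternative alleles (0, 1 or 2). The function names are made up for this example.

```python
from itertools import product

# Hypothetical encoding: genotype = number of alternative alleles.
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative.
# For each parental genotype, the set of allele counts (0 or 1) that the
# parent can pass on to the child.
ALLELES_PASSED = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """Return True if the child's genotype can be formed from one allele per parent."""
    possible = {
        f + m
        for f, m in product(ALLELES_PASSED[father], ALLELES_PASSED[mother])
    }
    return child in possible


def count_calls(trios):
    """Count (consistent, inconsistent) calls in a list of (father, mother, child) tuples."""
    consistent = sum(1 for trio in trios if is_mendelian_consistent(*trio))
    return consistent, len(trios) - consistent
```

For instance, a heterozygous child of two homozygous reference parents, `(0, 0, 1)`, is counted as inconsistent, whereas `(2, 0, 1)` is consistent.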

On the bottom left, the variant frequency sliders set the variant frequency ranges used for genotyping the variants of each sample. By default, any variant with a variant frequency higher than 90 is considered alternative homozygous, and any variant with a variant frequency between 20 and 89 is considered alternative heterozygous. Any variant with a variant frequency below 20 is filtered out. The sliders also visualise the distribution of variants by variant frequency. Next to the variant frequency sliders, the coverage sliders set the coverage threshold for each sample individually. They also visualise the distribution of variants by coverage.
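The default genotyping rule can be sketched as a small function. This is an illustration under stated assumptions, not the tool's implementation: variant frequency is taken to be on a 0 to 100 scale, the boundary value of exactly 90 is assigned to the homozygous class (the text does not say which class it belongs to), and the function name and return labels are hypothetical.

```python
def classify_genotype(variant_frequency, het_min=20, hom_min=90):
    """Classify a variant by its variant frequency (assumed 0-100 scale).

    Returns "hom_alt", "het", or None when the variant is filtered out.
    Assumption: a value exactly equal to hom_min counts as homozygous.
    """
    if variant_frequency >= hom_min:
        return "hom_alt"
    if variant_frequency >= het_min:
        return "het"
    return None  # below the lower slider: filtered out
```

Moving a slider corresponds to calling the function with different `het_min` or `hom_min` values, for example `classify_genotype(25, het_min=30)` filters out a variant that the defaults would call heterozygous.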

The histogram view shows the distribution of consistent and inconsistent variants as stacked bar graphs, for coverage values between 1 and 20 in the selected sample. The view is updated dynamically to show the effect of adjusting the coverage thresholds. With the "migration" assumption active, any variant below the coverage threshold is considered homozygous reference; when this assumption is inactive, any variant below the coverage threshold is considered invalid and discarded from the combined set of variants. Hovering the mouse over a bar highlights the cells in the main table where these variants are represented. Filtered results can be exported using the "export VCF" button and saved as VCF files. The user can also select specific cells in the main table by clicking to highlight them before running the export function.
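The effect of the "migration" assumption on a single trio can be sketched as below. This is only an illustration of the rule as described here: the data layout (a dictionary of sample name to genotype and coverage), the genotype labels and the function names are all hypothetical, and "below the threshold" is read as strictly less than the threshold.

```python
def apply_coverage_threshold(genotype, coverage, threshold, migration=True):
    """Return the genotype after coverage filtering for one sample.

    With migration, a low-coverage call becomes homozygous reference;
    without it, the call is invalid (None).
    """
    if coverage >= threshold:
        return genotype
    return "hom_ref" if migration else None


def filter_trio(calls, thresholds, migration=True):
    """Filter one variant across a trio.

    calls:      {"father": (genotype, coverage), "mother": ..., "child": ...}
    thresholds: {"father": int, "mother": int, "child": int}
    Returns the filtered genotypes, or None if the variant is discarded.
    """
    result = {}
    for sample, (genotype, coverage) in calls.items():
        filtered = apply_coverage_threshold(
            genotype, coverage, thresholds[sample], migration
        )
        if filtered is None:
            return None  # invalid in one sample: drop from the combined set
        result[sample] = filtered
    return result
```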

The optimisation function has been implemented, and a demonstration video has been uploaded. It searches for the best F-score (the harmonic mean of precision and recall) based on the consistent and inconsistent variant counts. Once the calculation is done, TrioVis applies the optimal settings, and the user can calibrate them further according to the stringency required for the analysis.
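For reference, the F-score is conventionally the harmonic mean of precision and recall, with an optional weight beta. The sketch below shows the score and a simple exhaustive search over candidate settings. How TrioVis derives precision and recall from the consistent and inconsistent counts is not described on this page, so `evaluate` is a hypothetical callback that returns a `(precision, recall)` pair for a given setting.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def best_setting(candidates, evaluate):
    """Return the candidate with the highest F-score.

    evaluate(candidate) must return (precision, recall); it is a
    placeholder for whatever the tool computes from its variant counts.
    """
    return max(candidates, key=lambda c: f_score(*evaluate(c)))
```

A brute-force search like this is easy to reason about but scales with the number of candidate settings; the page does not state which search strategy the tool itself uses.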

Download

Sample Data

Sample trio data were generated from the BAM files of the CEU trio from the 1000 Genomes Project. Because the VCF files provided by the 1000 Genomes Project did not include the AD field, the GATK Unified Genotyper was run to produce calls only at variant sites and generate these VCF files. The run did not complete because of runtime issues, so these files contain only the variants identified on chromosomes 1 to 11, but they are sufficient to demonstrate how TrioVis works. As you will see when running TrioVis, the NA12892 sample has lower coverage and fewer variants called, which results in a very high number of variants inconsistent with the Mendelian laws of inheritance.

VCF File Preparation

As mentioned above, the VCF files for TrioVis require the AD field. This can be obtained by running the GATK Unified Genotyper on each sample. Further instructions can be found on the GATK website. The latest version at the time of writing was 2.2-16.
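To check that a VCF file carries the AD field, the FORMAT and sample columns of a data line can be inspected as in the sketch below. AD holds the allelic depths (reference depth first, then alternative depths). The variant frequency formula used here, alternative reads divided by reference plus alternative reads and scaled to 0 to 100, is an assumption for illustration; this page does not state exactly how TrioVis computes it. The function name is hypothetical and only the first alternative allele is considered.

```python
def variant_frequency_from_ad(format_field, sample_field):
    """Compute a variant frequency (0-100) from the AD entry of one sample.

    format_field: the FORMAT column of a VCF line, e.g. "GT:AD:DP"
    sample_field: the matching sample column, e.g. "0/1:12,8:20"
    Returns None when AD is missing or uninformative for this call.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        raise ValueError("FORMAT column has no AD field")
    index = keys.index("AD")
    if index >= len(values):
        return None  # trailing fields may be omitted in a VCF sample column
    depths = values[index].split(",")
    if len(depths) < 2 or "." in depths:
        return None  # missing value
    ref_depth, alt_depth = int(depths[0]), int(depths[1])
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return 100.0 * alt_depth / total
```

For a sample column such as `0/1:12,8:20` with FORMAT `GT:AD:DP`, 8 of 20 reads support the alternative allele.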

Screen cast

For the demonstration, a trio case of exome sequencing data from an Illumina HiSeq 2000 is used. This dataset is not provided; however, a sample dataset was generated from publicly available data from the 1000 Genomes Project (see above).

FAQ / Known issues

  • If the application crashes or slows down while loading VCF files, it may have run out of heap space. In that case, try running the application from the command line with a larger heap. First go to the directory where TrioVis.jar is saved, then execute the following from your command line tool: java -Xmx768m -jar TrioVis.jar

  • The three input files should be of roughly comparable size (i.e. comparable coverage depth). If the difference is very large, it may slow down the application.

  • Linux version and window manager: There was an issue where the file browser window opened behind the application screen. In the current version, the default full screen mode is turned off so that the user can select files from the file browser. Please make sure that you have installed the current version of Java.

Contact

If you have any questions, please send an email to ryo[dot]sakai[at]esat[dot]kuleuven[dot]be
