

Welcome

Pipit is a gene-centric interactive visualisation tool designed for studying structural genomic variation. By focusing on individual genes as the functional unit, it helps researchers study, and generate hypotheses about, the biological impact of different structural variations, such as the deletion of dosage-sensitive genes or the formation of fusion genes. Pipit is a cross-platform Java application that visualises structural variation data from Genome Variation Format (GVF) files.

Features

Pipit is an interactive visualisation tool developed in Processing, an open-source programming language and integrated development environment (IDE) based on Java. The tool is available for, and has been tested on, Linux, Mac OS X and Windows. It takes as input a GVF file, the standard variation file format adopted by the Database of Genomic Variants archive (DGVa); GVF files are available from the DGVa website. Pipit uses gene track, cytoband and Gene Ontology information obtained from the UCSC Table Browser. The current version supports data from human (NCBI build 36 and 37) and mouse (NCBI build 37/mm9 and GRCm build 38/mm10).

Each affected gene is represented as a disk, filled according to which part of its structure is affected by a structural variation (Figure 1). Structural variant types are colour coded and listed in the right panel. Unaffected genes are reduced to a line connecting the affected genes. The promoter length, i.e. the region upstream of each gene that is treated as its promoter, can be set when loading the data.
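The exact rules Pipit uses to fill a disk are not documented on this page. As a minimal sketch, assuming a gene is described by its start, end and strand, that coordinates are 1-based and inclusive (as in GVF), and that the promoter is a fixed-length window immediately upstream of the gene, the parts of a gene hit by a variant could be classified like this. The class, enum and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch: which parts of a gene does a structural variant overlap? */
public class GeneOverlap {

    enum Part { PROMOTER, GENE_BODY }

    /** Closed-interval overlap test on 1-based inclusive coordinates. */
    static boolean overlaps(long aStart, long aEnd, long bStart, long bEnd) {
        return aStart <= bEnd && bStart <= aEnd;
    }

    /**
     * Returns the gene parts hit by a variant. The promoter is assumed to be the
     * promoterLength bases immediately upstream of the gene: before the start on
     * the plus strand, after the end on the minus strand.
     */
    static Set<Part> affectedParts(long geneStart, long geneEnd, char strand,
                                   long promoterLength, long varStart, long varEnd) {
        Set<Part> parts = EnumSet.noneOf(Part.class);
        long promStart;
        long promEnd;
        if (strand == '+') {
            promStart = Math.max(1, geneStart - promoterLength);
            promEnd = geneStart - 1;
        } else {
            promStart = geneEnd + 1;
            promEnd = geneEnd + promoterLength;
        }
        if (promoterLength > 0 && overlaps(promStart, promEnd, varStart, varEnd)) {
            parts.add(Part.PROMOTER);
        }
        if (overlaps(geneStart, geneEnd, varStart, varEnd)) {
            parts.add(Part.GENE_BODY);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Plus-strand gene at 10,000-20,000 with a 2,000 bp promoter (8,000-9,999).
        System.out.println(affectedParts(10_000, 20_000, '+', 2_000, 9_500, 10_500));
        // prints [PROMOTER, GENE_BODY]
        // Same gene on the minus strand: its promoter is now 20,001-22,000.
        System.out.println(affectedParts(10_000, 20_000, '-', 2_000, 9_500, 9_900));
        // prints []
    }
}
```

Making the promoter strand-aware matters because "upstream" flips direction on the minus strand; finer parts such as exons could be tested with the same overlap check.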

There are four layouts for exploring the structural variation data. The default view is the collapsed, ordered gene view. In this view, a coloured disk may represent a single affected gene or a run of consecutive genes affected by the same type of structural variation. In the expanded view, every affected gene is visualised individually. The chromosome position view maps the affected genes to their genomic positions. Lastly, the unit plot view groups affected genes by the type of structural variant event affecting them.
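The grouping behind the collapsed view can be sketched as run-length grouping over the ordered list of affected genes. This is a minimal illustration, not Pipit's code: it assumes each affected gene carries a single variant type and that the input is already ordered along one chromosome; whether unaffected genes in between break a run is not specified here, so the sketch simply ignores them. All names are hypothetical, and the records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the grouping step of a collapsed, ordered gene view. */
public class CollapseRuns {

    /** An affected gene and the (assumed single) structural variant type affecting it. */
    record AffectedGene(String name, String svType) {}

    /** One disk in the collapsed view: a run of consecutive genes sharing a variant type. */
    record Disk(String svType, List<String> genes) {}

    static List<Disk> collapse(List<AffectedGene> orderedGenes) {
        List<Disk> disks = new ArrayList<>();
        for (AffectedGene g : orderedGenes) {
            Disk last = disks.isEmpty() ? null : disks.get(disks.size() - 1);
            if (last != null && last.svType().equals(g.svType())) {
                last.genes().add(g.name()); // same type as the previous gene: extend the run
            } else {
                List<String> genes = new ArrayList<>();
                genes.add(g.name());
                disks.add(new Disk(g.svType(), genes)); // type changed: start a new disk
            }
        }
        return disks;
    }

    public static void main(String[] args) {
        List<AffectedGene> genes = List.of(
                new AffectedGene("A", "deletion"),
                new AffectedGene("B", "deletion"),
                new AffectedGene("C", "insertion"),
                new AffectedGene("D", "deletion"));
        for (Disk d : collapse(genes)) {
            System.out.println(d.svType() + " " + d.genes());
        }
        // deletion [A, B]
        // insertion [C]
        // deletion [D]
    }
}
```

The expanded view then corresponds to skipping this step and drawing one disk per gene.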

When a disk is selected, the underlying genes and structural variation events are shown in the bottom panel, together with the chromosome and its cytobands, and the transcripts with their exonic regions coloured dark grey. The gene name shown in this panel links to the corresponding genomic region in the Ensembl browser. If Gene Ontology (GO) terms are associated with the selected gene, those terms are listed in the panel on the right. In this panel, the coloured square for each structural variant type acts as a toggle to show or hide that type of variant. The text field below it searches for a specific gene among the affected genes. The GO terms associated with the affected genes are also listed; conversely, selecting a GO term highlights the genes associated with it.
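The two-way GO lookup described above (gene to terms when a gene is selected, term to genes when a term is selected) can be sketched with a pair of maps kept in sync. This is an illustrative assumption about the data structure, not a description of Pipit's internals; the class and method names are hypothetical and the identifiers in the example are placeholders, not real annotations.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: bidirectional gene <-> GO term index for a side panel. */
public class GoIndex {
    private final Map<String, Set<String>> termsByGene = new HashMap<>();
    private final Map<String, Set<String>> genesByTerm = new HashMap<>();

    /** Records one gene-to-GO-term association in both directions. */
    void add(String gene, String goTerm) {
        termsByGene.computeIfAbsent(gene, k -> new TreeSet<>()).add(goTerm);
        genesByTerm.computeIfAbsent(goTerm, k -> new TreeSet<>()).add(gene);
    }

    /** GO terms to list when a gene is selected (sorted, read-only). */
    Set<String> termsFor(String gene) {
        return Collections.unmodifiableSet(termsByGene.getOrDefault(gene, Collections.emptySet()));
    }

    /** Genes to highlight when a GO term is selected (sorted, read-only). */
    Set<String> genesFor(String goTerm) {
        return Collections.unmodifiableSet(genesByTerm.getOrDefault(goTerm, Collections.emptySet()));
    }

    /** All GO terms associated with any indexed gene, sorted for display. */
    Set<String> allTerms() {
        return new TreeSet<>(genesByTerm.keySet());
    }

    public static void main(String[] args) {
        GoIndex index = new GoIndex();
        // Placeholder identifiers, not real GO annotations.
        index.add("geneA", "termX");
        index.add("geneA", "termY");
        index.add("geneB", "termX");
        System.out.println(index.termsFor("geneA")); // [termX, termY]
        System.out.println(index.genesFor("termX")); // [geneA, geneB]
    }
}
```

Keeping both directions precomputed makes each click a single map lookup instead of a scan over all affected genes.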

Download

Data File

GVF files

GVF files are available to download via FTP from the DGVa website.
The sample file used in the demonstration is estd118_Keane_et_al_2011_MGSCv37-2011_10_19.129P2_OlaHsd.gvf, which contains mouse data mapped to build 37 (MGSCv37).
The human data set mentioned in the supplementary material is estd180 Pang et al 2010.20120418.NCBI36.gvf.

##gff-version 3
##gvf-version 1.02
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
##genome-build NCBI MGSCv37
##assembly-name MGSCv37
##assembly-accession GCF_000001635.16
##file-date: 2011-10-19
# Study_accession: estd118
# Display_name: Keane_et_al_2011
# PMID:  21921910
# Study_description: http://www.sanger.ac.uk/resources/mouse/genomes/
# Paper_title: Mouse genomic variation and its effect on phenotypes and gene regulation.
# First_author: Keane
# Publication_year: 2011
# Curator_comments:  The data presented here are also relevant to Yalcin B et al., Sequence-based characterization of structural variation in the mouse genome Nature 2011 (7364):326-9.
# Curator_comments:  All variant calls have been mapped against the mouse reference assembly MGSCv37. The mouse taxonomic identifier is 10090. Variant calls have not been merged across the mouse strains.
# Curator_comments:  As the study has analysed related mouse species and sub-species, we have added additional information to the GVF files. The identifiers for the mouse strain, sample (also denoted by the strain name), and submitter variant are available in the GVF attribute column (ninth column). This column also includes the taxonomic identifier for each sub-species to indicate which mouse strain carries the called variant.
Chr1    DGVa    deletion    3021403 3021888 .   .   .   ID=essv2912700;Name=essv2912700;Parent=esv524768;var_origin=Not tested;Start_range=3021403,3021603;End_range=3021688,3021888;samples=129P2/OlaHsd;submitter_variant_id=ssv349;variant_description=ERV deletion;ncbi_tax_id=10090;strain_name=129P2/OlaHsd

Lines starting with "#" or "##" carry metadata. For each variant line, the current version requires the chromosome, source, variant type, start and end positions as the first five tab-delimited columns, as in the first variant line of the sample file shown above.
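A minimal sketch of reading one such line, assuming the GFF3-style layout that GVF follows (nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, attributes, with the attributes written as key=value pairs separated by semicolons). The class and method names are hypothetical, and the sketch does not decode percent-escaped attribute values. It requires Java 16 or later for the record.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: parse the required columns (and attributes) of a GVF variant line. */
public class GvfLineParser {

    record Variant(String chromosome, String source, String type,
                   long start, long end, Map<String, String> attributes) {}

    /** Returns null for metadata lines starting with "#" (this covers "##") and for blank lines. */
    static Variant parse(String line) {
        if (line.isBlank() || line.startsWith("#")) {
            return null;
        }
        String[] cols = line.split("\t", -1);
        if (cols.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 tab-delimited columns: " + line);
        }
        long start = Long.parseLong(cols[3].trim());
        long end = Long.parseLong(cols[4].trim());
        // Optional ninth column: key=value pairs separated by ';' (values kept verbatim).
        Map<String, String> attrs = new LinkedHashMap<>();
        if (cols.length >= 9) {
            for (String pair : cols[8].split(";")) {
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    attrs.put(pair.substring(0, eq), pair.substring(eq + 1));
                }
            }
        }
        return new Variant(cols[0], cols[1], cols[2], start, end, attrs);
    }

    public static void main(String[] args) {
        String line = String.join("\t", "Chr1", "DGVa", "deletion", "3021403", "3021888",
                ".", ".", ".", "ID=essv2912700;variant_description=ERV deletion;strain_name=129P2/OlaHsd");
        Variant v = parse(line);
        System.out.println(v.chromosome() + " " + v.type() + " " + v.start() + "-" + v.end());
        // prints: Chr1 deletion 3021403-3021888
        System.out.println(v.attributes().get("strain_name")); // prints: 129P2/OlaHsd
        System.out.println(parse("##gvf-version 1.02"));       // prints: null
    }
}
```

Splitting strictly on tabs, rather than on any whitespace, keeps attribute values that contain spaces (such as "ERV deletion" or "Not tested") intact.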

Sample CSV files

The sample CSV files can be downloaded from here. They include haploinsufficiency scores for human genes (Huang, N., Lee, I., Marcotte, E. M., and Hurles, M. E. (2010). Characterising and predicting haploinsufficiency in the human genome. PLoS Genetics, 6(10), e1001154) and known oncogenes for mouse.

User defined model organism

To load data for an organism other than human or mouse, you can prepare the files Pipit requires yourself. Click here

DGVa and dbVar

The European Bioinformatics Institute (EBI) and the National Center for Biotechnology Information (NCBI) maintain permanent public repositories, DGVa and dbVar, respectively.

Preparing a GVF file from your experiment

This page from dbVar gives examples of the experimental methods commonly used to identify structural variation. It describes how to process data from three types of experiment:

  1. Probe-based Methods (e.g., BAC array CGH, Oligo array aCGH, SNP arrays, etc.)
  2. Mapping-based Methods (e.g., Paired-end mapping, Optical mapping, etc.)
  3. Sequencing-based Methods (e.g., Sanger sequencing, Next-gen sequencing, Sequence alignment, Read depth analysis, etc.)

Screencast

FAQ / Known issues

  • If the application crashes or slows down while loading a large GVF file, it may have run out of Java heap space. In that case, try launching it from the command line with a larger maximum heap size: change to the directory where Pipit.jar is saved, then run the following command:
 java -Xmx768m -jar Pipit.jar
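To check which maximum heap size a JVM actually received, a small stand-alone program can print it. This is a generic Java sketch, not part of Pipit; the class name is hypothetical. Run it with the same flag, e.g. `java -Xmx768m HeapCheck.java` (single-file launch needs Java 11 or later), and note that the reported value can be slightly below the -Xmx setting on some JVMs.

```java
/** Hypothetical helper: reports the maximum heap size available to this JVM. */
public class HeapCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (Long.MAX_VALUE / 1 MiB if unlimited). */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

If 768 MB is still not enough and the machine has more memory, a larger value such as -Xmx2g can be passed to `java -jar Pipit.jar` in the same way.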

Developer

Here is the developer page.

Contact

If you have any questions, please send an email to ryo[dot]sakai[at]esat[dot]kuleuven[dot]be
