
#Overview

Due to the increasing amount of sequence data, current biological research focuses more and more on exploring the global gene repertoire of related species, referred to as the pangenome, instead of single genomic sequences. PanCake provides pangenome analysis, namely the identification of genomic regions present in all strains of a given genome set (i.e. the core genome) and the identification of regions unique to a single strain (i.e. singleton regions).
The analysis is based exclusively on sequence data and pairwise alignments, which can easily be obtained from common alignment tools such as nucmer (included in the MUMmer utilities) or BLAST, and it is independent of genome annotations.
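As a rough sketch of how such a pairwise alignment might be produced, the following Python snippet builds and optionally runs a nucmer command. The helper names and the file names `strain_a.fasta` and `strain_b.fasta` are hypothetical; the only nucmer behaviour assumed is its `--prefix` option, which names the resulting `.delta` file (`out` by default). `shutil.which` requires Python 3.3 or later.

```python
import shutil
import subprocess


def nucmer_command(reference, query, prefix="out"):
    """Build the argument list for a nucmer run that writes <prefix>.delta."""
    return ["nucmer", "--prefix=" + prefix, reference, query]


def run_nucmer(reference, query, prefix="out"):
    """Run nucmer if it is available on PATH; return the delta file name."""
    if shutil.which("nucmer") is None:
        raise RuntimeError("nucmer not found on PATH")
    subprocess.check_call(nucmer_command(reference, query, prefix))
    return prefix + ".delta"


# Only print the command here; call run_nucmer(...) to actually execute it.
print(" ".join(nucmer_command("strain_a.fasta", "strain_b.fasta")))
```

Keeping command construction separate from execution makes it easy to inspect the call before running it on large genomes.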

The first step in a typical PanCake workflow is the initialization of a PanCake Data Object. This is done by providing sequence data in .fasta format and/or sequence ids. Once the PanCake Data Object is initialized, it is stored in a specially formatted text file (denoted by the suffix .pan by default). Because it exploits sequence similarities and the included pairwise alignment information, this text file is expected to be significantly smaller than the raw sequence files.

At any time, you can perform the following operations on a PanCake Data Object:

  • add further sequences to the data structure

  • cluster sequences into groups (i.e. genomes)

  • include information from pairwise alignments computed by BLAST or nucmer (included in the MUMmer utilities)

  • compute the core regions on all sequences or arbitrary sequence subsets

  • identify singleton regions

  • retrieve chromosome sequences
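To make the notions of core and singleton regions concrete, here is a toy Python sketch. It is not PanCake's algorithm or API: it assumes regions have already been identified and labelled, and that we simply know which genomes contain each one. The function name `classify_regions` and the region and genome labels are made up for illustration.

```python
def classify_regions(membership, genomes=None):
    """Split region labels into core and singleton lists.

    membership: dict mapping a region label to the set of genomes containing it.
    genomes: genomes to consider; defaults to every genome seen in membership.
    """
    if genomes is None:
        genomes = set().union(*membership.values())
    genomes = set(genomes)
    core, singletons = [], []
    for region, present_in in sorted(membership.items()):
        hits = present_in & genomes
        if genomes and hits == genomes:
            # Present in every considered genome: part of the core.
            core.append(region)
        elif len(hits) == 1:
            # Present in exactly one considered genome: a singleton.
            singletons.append(region)
    return core, singletons


membership = {
    "r1": {"A", "B", "C"},
    "r2": {"A"},
    "r3": {"A", "B"},
    "r4": {"C"},
}
print(classify_regions(membership))
print(classify_regions(membership, genomes={"A", "B"}))
```

Restricting `genomes` to a subset mirrors computing the core on an arbitrary subset: in this toy data, `r3` is not core for all three genomes but becomes core when only `A` and `B` are considered.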

Graphical output (using Graphviz):

[Figure: PanCake Graph]

#Installation

The current version of PanCake requires:

  • working installation of Python >= 3.2

  • NumPy (Numerical Python)

  • Biopython
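A quick way to check these prerequisites before installing is sketched below. The function name `check_prerequisites` is hypothetical and not part of PanCake; `numpy` and `Bio` are the import names of NumPy and Biopython. Note that `importlib.util.find_spec` needs Python 3.4 or later, so on Python 3.2 or 3.3 a plain `import numpy` and `import Bio` would be the fallback.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 2), modules=("numpy", "Bio")):
    """Return a dict mapping each requirement to True (met) or False (missing)."""
    status = {"python>=%d.%d" % min_python: sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_prerequisites().items():
        print(requirement, "OK" if ok else "MISSING")
```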

To install, change into a folder of your choice and type

#!text
git clone https://bitbucket.org/CorinnaErnst/pancake

Then change into the pancake folder (via cd pancake) and run either python setup.py install or python3 setup.py install, depending on your system settings. If you lack write permissions, try python setup.py install --user (or python3 setup.py install --user).

Alternatively, you can install PanCake by issuing either easy_install3 pancake or easy_install pancake (depending on your system) on the command line. If you don't have administrator privileges, have a look at the --user argument of easy_install.

Finally, PanCake can be installed manually by downloading the source code archive from PyPI.

You can verify your installation by running python setup.py test (or python3 setup.py test). This runs a small test that downloads three Corynebacterium diphtheriae strains from the NCBI database, builds a PanCake Object from the provided alignment file tests/out.delta, and serializes it into the .pan file test.pan. You may want to use the latter as a reference .pan file for initial trials of PanCake's utilities.

#Documentation

Documentation

Example Workflow

#Publication

Corinna Ernst and Sven Rahmann. "PanCake: A Data Structure for Pangenomes". Proceedings of the GCB 2013.
