Due to the increasing amount of sequence data, current biological research focuses more and more on exploring the global gene repertoire of related species, referred to as the pangenome, rather than single genomic sequences.
PanCake provides analysis of pangenomes, namely the identification of genomic regions present in all strains of a given genome set (i.e. the core genome), or the identification of regions unique to a single strain (i.e. singleton regions).
The analysis is based exclusively on sequence data and pairwise alignments, which can easily be obtained from common alignment tools such as nucmer (part of the MUMmer utilities) or BLAST, and it is independent of genome annotations.
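To make the notions of core genome and singleton regions concrete, here is a minimal Python sketch. It assumes each region has already been reduced to the set of strains that contain it. The region names, strain names, and the helper classify_regions are hypothetical and purely illustrative; this is not PanCake's algorithm, which works directly on sequences and pairwise alignments.

```python
def classify_regions(presence, strains):
    """Split regions into core (found in every strain) and singleton (found in exactly one)."""
    all_strains = set(strains)
    # A region is core if every strain of the genome set contains it.
    core = sorted(r for r, found in presence.items() if all_strains <= set(found))
    # A region is a singleton if exactly one strain contains it.
    singletons = sorted(r for r, found in presence.items() if len(set(found)) == 1)
    return core, singletons


# Hypothetical toy data: region name -> strains containing that region.
strains = ["strain_A", "strain_B", "strain_C"]
presence = {
    "region_1": {"strain_A", "strain_B", "strain_C"},
    "region_2": {"strain_A"},
    "region_3": {"strain_B", "strain_C"},
}

core, singletons = classify_regions(presence, strains)
print("core regions:", core)
print("singleton regions:", singletons)
```

In this toy data, region_3 is neither core nor singleton, since it occurs in two of the three strains.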
The first step in a typical PanCake workflow is the initialization of a PanCake Data Object. This is done by providing sequence data in .fasta format and/or sequence ids. Once the PanCake Data Object is initialized, it is stored in a specially formatted text file (by default denoted by the suffix .pan). Depending on sequence similarities and the pairwise alignment information included, this text file is expected to be significantly smaller than the raw sequence files.
At any time, you can do the following with a PanCake Data Object:
add further sequences to the data structure
cluster sequences into groups (i.e. genomes)
compute the core regions on all sequences or arbitrary sequence subsets
identify singleton regions
retrieve chromosome sequences
generate graphical output (using Graphviz)
The current version of PanCake requires:
a working installation of Python >= 3.2
Numpy (Numerical Python)
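As a quick way to confirm these prerequisites before installing, the following minimal sketch checks the interpreter version and whether NumPy can be imported. The helper check_prerequisites is a hypothetical name introduced here for illustration and is not part of PanCake.

```python
import sys


def check_prerequisites(min_python=(3, 2)):
    """Report whether the interpreter and NumPy meet the stated requirements."""
    python_ok = sys.version_info[:2] >= min_python
    try:
        import numpy  # only checks that NumPy is importable
        numpy_version = numpy.__version__
    except ImportError:
        numpy_version = None
    return {"python_ok": python_ok, "numpy_version": numpy_version}


status = check_prerequisites()
print("Python >= 3.2:", status["python_ok"])
print("NumPy version:", status["numpy_version"] or "not installed")
```

Run it with the same interpreter (python or python3) that you intend to use for the installation.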
To install, change into a folder of your choice and type
git clone https://bitbucket.org/CorinnaErnst/pancake
Then change into the pancake folder (via
cd pancake) and type
python setup.py install or
python3 setup.py install, depending on your system settings.
If you lack write permissions, try
python setup.py install --user (or
python3 setup.py install --user).
Alternatively, you can install PanCake by issuing either
easy_install3 pancake or
easy_install pancake (depending on your system) on the command line. If you do not have administrator privileges, have a look at the --user argument of easy_install.
Finally, PanCake can be installed manually by downloading the source code archive from PyPI.
You can verify your installation by running
python setup.py test (or
python3 setup.py test).
This will also run a small test that downloads three Corynebacterium diphtheriae strains from the NCBI database, builds a PanCake Data Object from the provided alignment file
tests/out.delta, and serializes it into a .pan file.
You may want to use the latter as a reference .pan file for initial trials of PanCake's utilities.