PhyloPhlAn Tutorial

PhyloPhlAn is a computational pipeline for reconstructing highly accurate and resolved phylogenetic trees based on whole-genome sequence information. The pipeline scales to thousands of genomes and uses the 400 most conserved proteins to extract the phylogenetic signal. PhyloPhlAn also implements taxonomic curation, estimation, and insertion operations.

For additional information, please refer to the manuscript: PhyloPhlAn is a new method for improved phylogenetic and taxonomic placement of microbes. Nicola Segata, Daniela Börnigen, Xochitl C. Morgan, and Curtis Huttenhower. Nature Communications 4, 2013.

We provide support for PhyloPhlAn users through a dedicated Google group. Feel free to post questions directly to the group or by emailing phylophlan-users@googlegroups.com.




1. Install

PhyloPhlAn can be run from a Docker image. If you are using bioBakery (Vagrant VM), you do not need to install PhyloPhlAn, because the tool and its dependencies are already installed (at $HOME/phylophlan/). You will, however, need to install the USEARCH dependency (v5.2.32), which requires a license; follow the bioBakery instructions for installing dependencies that require licenses.

Install with Docker:

    $ docker run -it biobakery/phylophlan bash

To install from source, refer to the PhyloPhlAn user manual for the prerequisites and installation instructions.

2. Phylogenetic tree building with any set of genomes

PhyloPhlAn can build a phylogenetic tree from any set of private or public genomes. To build a tree from your genome data, follow the instructions below. For installation and dependencies, please refer to the PhyloPhlAn documentation.

2.1 Running PhyloPhlAn

  • Create a folder under the phylophlan/input directory.
  • Place all the genome files in the folder you created. For each genome file, also include a multifasta format file containing peptide sequences (extension ".faa").

For the purpose of this tutorial we will use example_corynebacteria as the folder containing the genome files.
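
Before starting a run, it can help to confirm that the input folder holds the expected files. The sketch below is not part of PhyloPhlAn: `list_input_genomes` is a hypothetical helper that lists the ".faa" files in a folder and checks that each one starts with a FASTA header line.

```python
from pathlib import Path


def list_input_genomes(input_dir):
    """Return the sorted genome names (file names without '.faa') in input_dir.

    Raises ValueError if a '.faa' file does not start with a FASTA
    header line ('>'). Hypothetical helper, not a PhyloPhlAn function.
    """
    genomes = []
    for faa in sorted(Path(input_dir).glob("*.faa")):
        with faa.open() as handle:
            first_line = handle.readline()
        if not first_line.startswith(">"):
            raise ValueError(f"{faa.name} does not look like a FASTA file")
        genomes.append(faa.stem)
    return genomes
```

Run from the phylophlan directory, `list_input_genomes("input/example_corynebacteria")` would return one name per genome file in the tutorial folder.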

  • Once the folder has been created, run the following command from the phylophlan directory:

    $ ./phylophlan.py -u example_corynebacteria
    
  • When the run completes, the results are written to the output directory, in a folder with the same name as the input folder given in the command above (in our case output/example_corynebacteria).

  • The folder will contain two files: (i) a Newick tree file and (ii) a PhyloXML file.
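
For a quick look at the Newick file without a tree viewer, the leaf labels can be listed with a few lines of Python. This is a minimal sketch, not part of PhyloPhlAn: `newick_leaf_names` is a hypothetical helper, and it assumes unquoted labels that contain no parentheses, commas, colons, or semicolons.

```python
import re


def newick_leaf_names(newick_text):
    """Return the leaf labels found in a Newick string, in order.

    A leaf label directly follows '(' or ','. Labels that follow ')'
    belong to internal nodes and are skipped. Quoted labels are not
    handled (assumption stated in the text above).
    """
    candidates = re.findall(r"[(,]\s*([^(),:;]+)", newick_text)
    # Whitespace between brackets can produce empty candidates; drop them.
    return [name.strip() for name in candidates if name.strip()]


# e.g. newick_leaf_names("((A:0.1,B:0.2):0.3,C:0.4);") returns ["A", "B", "C"]
```

Passing the contents of the Newick file from the output folder to this function gives the list of genomes placed in the tree, which can be compared against the files in the input folder.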

2.2 Visualization with GraPhlAn (Galaxy module)

You may use any tree visualization software to inspect the files. We provide support for GraPhlAn to visualize the output tree. Please follow the instructions below to view the resulting tree.

https://bitbucket.org/repo/49y6o9/images/3266810747-Screenshot%20from%202017-09-01%2018-41-15.png
  • Click on the GraPhlAn -> Annotate tree link on the left pane.
    • Select the file you just uploaded from the Input File drop-down menu.
    • Select the clades you want to view from the Select clade(s) list.
    • Specify the clade edge or fill colors (optional).
    • Specify the Annotation Label in the text field (Example: * or *:*)
    • From the Annotation Label Clade Selector drop-down menu, select the level of taxonomy that you want annotated. (Example: Clade and its leaf nodes)
https://bitbucket.org/repo/49y6o9/images/534266384-Screenshot%20from%202017-09-01%2018-43-05.png
  • Click on the GraPhlAn -> Plot tree link from the left pane, select the annotated data as source, and press Execute.
https://bitbucket.org/repo/49y6o9/images/2933091345-Screenshot%20from%202017-09-01%2018-46-55.png

The resulting image is shown below:

https://bitbucket.org/repo/49y6o9/images/1255569268-Screenshot%20from%202017-09-01%2018-48-39.png

For more information on additional features and further manipulations provided by GraPhlAn, please refer to the GraPhlAn documentation or GraPhlAn tutorial.


3. Inserting new genomes into the tree of life

To insert new genomes into the existing tree in the PhyloPhlAn repository, follow the instructions below.

  • Create a folder under the phylophlan/input directory.
  • Place all the genome files that need to be inserted in the folder you created (files must be in multifasta format; extension ".faa").

For the purpose of this tutorial we will use example_insertion as the folder containing the genome files.

  • Once the folder has been created, run the following command from the phylophlan directory:

    $ ./phylophlan.py -i example_insertion
    
  • When the run completes, the results are written to the output directory, in a folder with the same name as the input folder given in the command above (in our case output/example_insertion).

  • The folder will contain the resulting Newick tree file (example_insertion.tree.int.nwk), which holds the existing PhyloPhlAn tree with the new genomes added. The file can be inspected using any tree visualization software. We provide a GraPhlAn visual of the resulting tree below:

https://bitbucket.org/repo/49y6o9/images/1452281618-Screenshot%20from%202017-09-01%2018-49-10.png
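
To confirm that every new genome made it into the tree, the input folder can be compared against the labels in the Newick file. The sketch below is not part of PhyloPhlAn: `genomes_missing_from_tree` is a hypothetical helper, and it assumes that each inserted genome is labeled in the tree with its ".faa" file name minus the extension.

```python
import re
from pathlib import Path


def genomes_missing_from_tree(input_dir, newick_path):
    """Return the '.faa' genome names in input_dir that do not appear in the tree.

    Assumes tree labels equal the input file names without '.faa'
    (an assumption, not confirmed by the tutorial).
    """
    tree_text = Path(newick_path).read_text()
    # Splitting on Newick structural characters yields labels and branch lengths.
    tokens = {token.strip() for token in re.split(r"[(),:;]", tree_text)}
    names = sorted(path.stem for path in Path(input_dir).glob("*.faa"))
    return [name for name in names if name not in tokens]
```

Run from the phylophlan directory, `genomes_missing_from_tree("input/example_insertion", "output/example_insertion/example_insertion.tree.int.nwk")` returns an empty list when all input genomes are present under that naming assumption.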

For further details and additional options, please refer to the PhyloPhlAn documentation.


Notes

For more information on PhyloPhlAn, please refer to the following wiki page:
