MetaPhlAn Tutorial

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

MetaPhlAn is available as a Galaxy module and as a Bitbucket repository. For additional information, please refer to the MetaPhlAn paper.

We provide support for MetaPhlAn users through a Google group dedicated to them. Post your questions to the group directly or email metaphlan-users@googlegroups.com.



Overview

The following figure shows the workflow of MetaPhlAn.

https://bitbucket.org/repo/49y6o9/images/2448742986-overview_metaphlan.png

1. MetaPhlAn (Galaxy Module)

MetaPhlAn accepts metagenomic shotgun sequencing data as input for metagenome profiling (input formats include .fasta, .fastq, and .tar.bz2, among others).

Follow the instructions below to run MetaPhlAn on a sample dataset using the Galaxy module.

  • Go to the Huttenhower Galaxy server.
  • Click the Get Data link in the left pane.
    • Click the Execute button to upload the file.
  • Click the MetaPhlAn link in the left pane, select the uploaded dataset from the Input metagenome drop-down menu, and press Execute. (You may change the sensitivity settings to suit your needs.)
  • Once the job completes, the result icon in the right pane turns green. The result is a microbial abundance table: the first column lists the microbial species and the second column their abundances. You can then download it by clicking the save button in the right pane.

2. MetaPhlAn (bitbucket)

Please refer to the MetaPhlAn documentation for the pre-requisites/dependencies and installation instructions.

2.1 MetaPhlAn: Input

MetaPhlAn accepts metagenomic shotgun sequencing data as input for metagenome profiling (input formats include .fasta, .fastq, and .tar.bz2, among others).

For the purpose of this tutorial we will use the following 8 samples as inputs (downloaded from the Human Microbiome Project website).

Input Samples: Buccal mucosa (SRS063417, SRS022158, SRS052620, SRS019379), Posterior fornix (SRS016297, SRS014575, SRS019024, SRS058186)

  • Create an input directory under the metaphlan repository as: /metaphlan/input, and place your input files under it.
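
Before starting a long run, it can help to confirm that every expected archive is in place. The sketch below is a hypothetical helper, not part of MetaPhlAn. It assumes each sample is stored as input/<sample ID>.tar.bz2, which is the naming that the run command in Section 2.2 reads from; the function name missing_inputs is made up for illustration.

```python
from pathlib import Path


def missing_inputs(samples, input_dir="input"):
    """Return the sample IDs that have no <input_dir>/<sample>.tar.bz2 archive.

    Hypothetical helper (not part of MetaPhlAn); the file naming is an
    assumption based on the run command used later in this tutorial.
    """
    input_path = Path(input_dir)
    return [s for s in samples if not (input_path / f"{s}.tar.bz2").is_file()]


# Two of the tutorial's sample IDs, checked relative to the current directory.
absent = missing_inputs(["SRS014575", "SRS016297"])
if absent:
    print("Missing archives:", ", ".join(absent))
else:
    print("All input archives found.")
```

Run it from the metaphlan directory so that the relative path input/ resolves correctly.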

2.2 Running MetaPhlAn

  • Create a directory under metaphlan, /metaphlan/profiled_samples, to hold the MetaPhlAn output.

  • Run the following command from the terminal to store the list of sample names in a shell variable (note that there must be no space around the = sign):

    $ samples="SRS014575 SRS016297 SRS019024 SRS051868 SRS019379 SRS052620 SRS022158 SRS063417"
    
  • Run the following command to run metaphlan over all the input samples (this might take a while):

    $ for s in ${samples}
    > do
    > tar xjf input/${s}.tar.bz2 --to-stdout | ./metaphlan.py --bowtie2db bowtie2db/mpa --bt2_ps very-sensitive --input_type multifastq > profiled_samples/mp_${s}.txt
    > done
    
  • The saved output files will appear in the directory: /metaphlan/profiled_samples.

  • The output microbial abundance tables (tab-delimited) contain the microbial species (Column 1) and their associated relative abundances (Column 2) per sample.

metaphlan2_out.png
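
Because each output file is a plain two-column, tab-delimited table, it is easy to load for a quick look. The following is a minimal sketch, assuming Column 1 holds the clade name and Column 2 a numeric relative abundance, and that any line starting with # is a comment or header (an assumption; your files may contain none). read_abundance_table is a hypothetical name, and the two example rows are invented for illustration rather than taken from real output.

```python
import csv
import io


def read_abundance_table(handle):
    """Return {clade: relative_abundance} from a two-column, tab-delimited table."""
    abundances = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank or short lines and lines starting with '#'
        # (treating '#' as a comment marker is an assumption).
        if len(row) < 2 or row[0].startswith("#"):
            continue
        abundances[row[0]] = float(row[1])
    return abundances


# Invented two-row excerpt, used only to show the expected layout.
example = io.StringIO("k__Bacteria\t100.0\nk__Bacteria|p__Firmicutes\t62.5\n")
table = read_abundance_table(example)

# List clades from most to least abundant.
for clade, value in sorted(table.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{value:8.2f}  {clade}")
```

To read a real result, pass an open file handle instead of the in-memory example, for instance one opened on profiled_samples/mp_SRS014575.txt.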

3. MetaPhlAn Visualization

3.1 Built-in Heatmap visualization (Bitbucket)

To visualize the MetaPhlAn results as a heatmap, follow the instructions below. The heatmap can be plotted for any subset of the microbial abundance tables or for all of them. For the purpose of this tutorial we will plot the heatmap for all of the samples.

  • Create an output directory at /metaphlan/output/.

  • Run the following command to merge all the microbial abundance tables in the profiled_samples directory:

    $ python utils/merge_metaphlan_tables.py profiled_samples/*.txt > output/merged_abundance_table.txt
    
  • Create an output_images directory as /metaphlan/output_images/ to store all the output images.

  • Run the following command to generate the heatmap:

    $ python plotting_scripts/metaphlan_hclust_heatmap.py -c bbcry --top 25 --minv 0.1 -s log --in output/merged_abundance_table.txt --out output_images/abundance_heatmap.png
    

The resulting heatmap is shown below:

abundance_heatmap.png
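
Conceptually, the merge step earlier in this section produces one row per clade and one column per sample. The sketch below illustrates that idea only. It is not the implementation of utils/merge_metaphlan_tables.py, and it assumes that a clade absent from a sample is reported with abundance 0. The function name and the toy values are hypothetical.

```python
def merge_tables(per_sample):
    """Combine {sample: {clade: abundance}} into one clade-by-sample table.

    Returns (samples, rows): a sorted list of sample names and a dict that
    maps each clade to its abundances in that sample order. Clades missing
    from a sample are filled with 0.0 (an assumption made for illustration).
    """
    samples = sorted(per_sample)
    clades = sorted({clade for table in per_sample.values() for clade in table})
    rows = {
        clade: [per_sample[sample].get(clade, 0.0) for sample in samples]
        for clade in clades
    }
    return samples, rows


# Toy input with invented values, not real profiling results.
toy = {
    "sample_A": {"clade_1": 60.0, "clade_2": 40.0},
    "sample_B": {"clade_1": 100.0},
}
samples, rows = merge_tables(toy)

# Print the merged table as tab-delimited text.
print("\t".join(["ID"] + samples))
for clade, values in rows.items():
    print("\t".join([clade] + [str(v) for v in values]))
```

Filling gaps with zeros keeps every row the same length, which is what a heatmap or clustering step generally needs.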

3.2 Using GraPhlAn

You may use GraPhlAn either from its Bitbucket repository (see Section 3.2.1) or as a Galaxy module (see Section 3.2.2). For information on dependencies and installation for the GraPhlAn Bitbucket repository, please refer to the GraPhlAn tutorial.

GraPhlAn requires two inputs: (i) a tree structure to represent and (ii) graphical annotation options for the tree. MetaPhlAn includes the functionality to generate these files. Follow the instructions below to generate the GraPhlAn input files.

  • Create a temporary directory (e.g. /metaphlan/tmp) to store these files.

  • Run the following command from the terminal (current directory: metaphlan) to generate the two input files for GraPhlAn (Tree: merged_abundance.tree.txt, Annotation: merged_abundance.annot.txt):

    $ python plotting_scripts/metaphlan2graphlan.py output/merged_abundance_table.txt --tree_file tmp/merged_abundance.tree.txt --annot_file tmp/merged_abundance.annot.txt
    

Once generated, you can use these files to visualize the results using either the GraPhlAn bitbucket repository (Section 3.2.1) or the GraPhlAn Galaxy module (Section 3.2.2).

3.2.1 Using the GraPhlAn Bitbucket repository

To visualize using the GraPhlAn Bitbucket repository, make sure that the PATH environment variable includes the graphlan repository (for more information, please see the GraPhlAn documentation).

  • Run the following commands to (i) create a PhyloXML file from the two inputs (merged_abundance.tree.txt, merged_abundance.annot.txt) and (ii) generate the cladogram:

    $ graphlan_annotate.py --annot tmp/merged_abundance.annot.txt tmp/merged_abundance.tree.txt tmp/merged_abundance.xml
    $ graphlan.py --dpi 200 tmp/merged_abundance.xml output_images/merged_abundance.png
    

The generated cladogram is shown below:

merged_abundance.png
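
If you need to render several merged tables, the two commands above can be scripted. The following is a hedged sketch using Python's subprocess module. The helper names are hypothetical, the flags are exactly those used in the commands above, and both GraPhlAn scripts are assumed to be on PATH.

```python
import subprocess


def graphlan_commands(tree, annot, xml, image, dpi=200):
    """Build the two argument lists used above: annotate, then render."""
    annotate = ["graphlan_annotate.py", "--annot", annot, tree, xml]
    render = ["graphlan.py", "--dpi", str(dpi), xml, image]
    return annotate, render


def run_graphlan(tree, annot, xml, image, dpi=200):
    """Run both steps in order, stopping with an exception if one fails."""
    for command in graphlan_commands(tree, annot, xml, image, dpi):
        subprocess.run(command, check=True)


# Print the commands for this tutorial's files without executing them.
for command in graphlan_commands(
    "tmp/merged_abundance.tree.txt",
    "tmp/merged_abundance.annot.txt",
    "tmp/merged_abundance.xml",
    "output_images/merged_abundance.png",
):
    print(" ".join(command))
```

Call run_graphlan with the same four paths to execute the steps instead of printing them.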

3.2.2 Using the GraPhlAn Galaxy module

You can also use the GraPhlAn Galaxy module to visualize the results. Follow the instructions below.

  • Go to the Huttenhower Galaxy server.
  • Click Upload File under the Get Data link in the left pane.
    • Select the input tree file (merged_abundance.tree.txt in this tutorial).
    • Select the File Format as circl
    • Click the Execute button to upload the tree, as shown below:
graphlan_load.png
  • Click Annotate tree under the GraPhlAn link in the left pane.
    • From the Input Tree, select the tree you uploaded (merged_abundance.tree.txt in this tutorial).
    • From the Select Clade(s) list, select the clades you want to be displayed on the figure.
https://bitbucket.org/repo/49y6o9/images/634261406-graphlan_annotate_metaphlan_galaxy.png
    • In the text field Annotation Label, enter *
    • From the Annotation Label Clade Selector drop-down menu, select the Clade and its leaf nodes option
    • Click on the Execute button.
1391247145-Screenshot from 2014-09-12 11-27-22.png
  • Click the Get Data link under the LOAD DATA MODULE in the left pane, and upload the annotation file (merged_abundance.annot.txt in this tutorial), as shown below:
upload_annotation_grpahlmetaphlan.png
  • Click on the Add rings to the tree link under the GraPhlAn module on the left pane.
    • Select the annotated tree (produced from the Annotate tree step) from the Input Tree drop-down menu, and from the Ring Input File drop-down menu, select the annotation file (merged_abundance.annot.txt) that you just uploaded, as shown below:
addrings_metaphlan_galaxy_graphlan.png
  • Click the Plot Tree link in the left pane, select the result produced by the previous step, and click the Execute button, as shown below:
execute_graphlan_galaxy_metaphlan.png

The resulting image should be the same as the cladogram shown in Section 3.2.1.


Notes

For further analysis, please refer to the tutorials for LEfSe and MaAsLin.

For more information on MetaPhlAn, please refer to the following wiki pages:
