

An integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data

Wiki Content


The metaBIT pipeline provides tools for visualising microbial profiles (barplots, heatmaps) and performing a range of statistical analyses (diversity indices, hierarchical clustering and principal coordinate analysis). It takes as input fastq files containing trimmed reads from high-throughput shotgun sequencing (flowchart of metaBIT).


Louvel, G., Der Sarkissian, C., Hanghøj, K. and Orlando, L. (2016), metaBIT, an integrative and automated metagenomic pipeline for analysing microbial profiles from high-throughput sequencing shotgun data. Molecular Ecology Resources. doi: 10.1111/1755-0998.12546

What It Does

metaBIT is a metagenomic computational pipeline that identifies microbial taxa and their relative abundances from shotgun high-throughput DNA sequencing data using the program MetaPhlAn (Segata et al. 2012).

With metaBIT, the user can visualise the resulting profiles through heatmaps and barplots, and compute summary statistics characteristic of each profile (e.g., diversity indices). The metaBIT pipeline supports comparison between several microbial profiles by computing inter-profile distances and performing, e.g., hierarchical clustering, Principal Coordinates Analysis, and biomarker identification using LEfSe (Linear Discriminant Analysis Effect Size; Segata et al. 2011).

What it does not do

The metaBIT pipeline does not remove adapters from shotgun sequencing reads. DNA reads must be provided as fastq files after adapter trimming, using for example AdapterRemoval (Lindgreen 2012), as implemented in PALEOMIX (Schubert et al. 2014).


metaBIT requirements

Installing R and Python dependencies for metaBIT

R script (REF) to install any required R dependencies for metaBIT that are not already installed


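# Install any packages in `pkg` that are missing, then load them all.
ipak <- function(pkg){
    new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
    if (length(new.pkg)) 
        install.packages(new.pkg, dependencies = TRUE)
    sapply(pkg, require, character.only = TRUE)
}

packages <- c('optparse', 'ggplot2', 'reshape2', 'ape', 'vegan', 'survival', 'mvtnorm', 'modeltools', 'coin', 'MASS')
ipak(packages)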

Bash command to install the required Python (2.7) dependencies for metaBIT


$ pip install pysam numpy matplotlib rpy2 --user

metaBIT pipeline installation

  • Install all required dependencies as listed above.

  • Download metaBIT. You can use the command-lines below:

    $ git clone
  • You can optionally create a symlink to add metaBIT to your path (check with echo $PATH). For example, if ~/bin is in your path:

    $ cd ~/bin
    $ ln -s -T path/to/metabit/ metaBIT

Configuring and testing the pipeline

To use MetaPhlAn, Picard Tools and LEfSe, their paths need to be provided as options on the metaBIT command line:

$ metaBIT --metaphlan-path /path/to/metaphlan --lefse-path /path/to/lefse --jar-root /path/to/picard makefile.yaml

Alternatively, the paths can be saved once and for all in a configuration file written to ~/.pypeline/metabit.ini using the --write-config option:

$ metaBIT --metaphlan-path /path/to/metaphlan --lefse-path /path/to/lefse --jar-root /path/to/picard --write-config

In this latter case, the configuration file will be automatically parsed when metaBIT is executed, as shown below.

$ metaBIT makefile.yaml

Please note that the configuration file can also store other useful metaBIT options (see the help menu for the option list). In particular, if you are using a personal computer with little RAM, you should set the option --jre-option=-Xmx2g (choose an appropriate value for -Xmx) to reduce the amount of memory used by Picard Tools MarkDuplicates.

The paths to the programs Bowtie2, samtools and ktImportText (from KronaTools) should be added to the user's PATH (e.g. export PATH=$PATH:/path/to/KronaTools-2.5/bin).
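Before launching the pipeline, it can help to confirm that each of these programs actually resolves on PATH. The snippet below is only a sketch: check_tools is a hypothetical helper, not part of metaBIT, and the executable names bowtie2 and samtools are assumed to match your installation.

```shell
#!/bin/sh
# Hypothetical helper (not part of metaBIT): report which of the given
# programs can be found on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to your installation.
check_tools bowtie2 samtools ktImportText \
    || echo "Add the missing programs to PATH before running metaBIT."
```

If a program is reported as missing, extend PATH as in the export example above and run the check again.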

Get help

$ metaBIT -h

Makefile in YAML-format

metaBIT requires one positional argument: a makefile in YAML format. See the makefile documentation for a detailed description of the makefile parsed by metaBIT.

Test metaBIT with companion fast example

Your installation can be tested by running the example with the data provided in the folder example, assuming all required paths are saved in the config file as shown above:

$ cd example/
$ metaBIT fastexample.yaml

This should report no errors. Results will be saved in a folder named "out_yourmakefile" in the current working directory, unless another --destination has been specified.

Go to Tutorial for a thorough walk-through of metaBIT.

Results generated by metaBIT

Assuming the results have been saved in the directory "out_makefile", you will find the following folders:

  • one folder for each sample, containing one folder per library.

  • each library folder contains intermediate files from the pipeline processing (example for single-end):

    1. the output of Bowtie2.
    2. the sorted output of Bowtie2.
    3. the sorted output of Bowtie2, devoid of PCR duplicates.
    4. taxa.tsv: the output from MetaPhlAn.
  • in the main folder, a file named all_taxa.tsv is the merger of all the MetaPhlAn outputs (i.e. individual taxa.tsv files provided in each library folder).

  • a folder named krona containing every Krona input file and the corresponding results as an HTML file.

  • a folder named lefse containing results from LEfSe and all intermediate files

  • a folder named statax containing all statistical outputs (clustering, PCoA...) and plots (barplot, heatmap).
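The layout described above can be explored from the shell. The sketch below assumes the sample/library nesting given in the list; list_profiles is a hypothetical helper (not part of metaBIT) and out_makefile is the example directory name used above.

```shell
#!/bin/sh
# Hypothetical helper (not part of metaBIT): print the per-library
# MetaPhlAn profiles found under a metaBIT output directory.
# Assumed layout: <outdir>/<sample>/<library>/taxa.tsv
list_profiles() {
    outdir=$1
    for f in "$outdir"/*/*/taxa.tsv; do
        # Skip the unexpanded pattern when nothing matches.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

list_profiles out_makefile

# The merged table sits in the main folder.
if [ -f out_makefile/all_taxa.tsv ]; then
    echo "merged profile: out_makefile/all_taxa.tsv"
fi
```

Nothing is printed if the directory does not exist, so the snippet can be run safely before the pipeline has produced any output.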

Virtual machine image

A virtual machine image for the metaBIT pipeline, including all required dependencies, is available. Link to the most recent version (2 May 2016).

To run the virtual machine image file, VirtualBox and the VirtualBox Extension Pack must be installed.