
MetaMLST

Multilocus Sequence Typing from metagenomic data

MetaMLST is a computational tool for strain-level identification from metagenomic data. It exploits the Multilocus Sequence Typing (MLST) approach and performs an in-silico reconstruction of the MLST-specific loci.

Requirements


To use MetaMLST you will need the following packages and tools:

  • Bowtie2 >= v. 2.2.6
  • Samtools >= 1.3.1 (or >= 0.1.19 in legacy mode)
  • Biopython >= 1.63
  • Pysam >= 0.11.1 (recommended, but not essential)
  • Python >= 2.7
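
A quick way to confirm that the requirements above are reachable is sketched below. It is only an illustration, written for Python 3: the executable names (bowtie2, samtools) and module names (Bio for Biopython, pysam) are the usual ones but are assumptions here, and the sketch does not check version numbers.

```python
import importlib.util
import shutil

# Executable and module names are assumptions based on the usual package names.
REQUIRED_TOOLS = ["bowtie2", "samtools"]
REQUIRED_MODULES = ["Bio"]      # Biopython
OPTIONAL_MODULES = ["pysam"]    # recommended, but not essential


def find_missing(tools=REQUIRED_TOOLS, modules=REQUIRED_MODULES):
    """Return (missing_tools, missing_modules) without importing anything."""
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [m for m in modules
                       if importlib.util.find_spec(m) is None]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = find_missing()
    print("Missing executables:", tools or "none")
    print("Missing Python modules:", modules or "none")
    _, optional = find_missing(tools=[], modules=OPTIONAL_MODULES)
    print("Missing optional modules:", optional or "none")
```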

Input & Output


MetaMLST takes as input shotgun metagenomic NGS reads in FASTQ format (e.g. Illumina HiSeq). MetaMLST only works with shotgun sequencing data and is not applicable to 16S rRNA sequencing datasets.

MetaMLST outputs (by default in ./out):

  • A visual list of detected MLST-trackable microbial species.
  • A tab-separated file containing the typings of each sample provided. One file for each species.
  • A tab-separated file containing the updated typing table (i.e. known and newly identified Sequence Types). One file for each species.
  • A FASTA or CSV file containing the sequences of the MLST-reconstructed loci for each sample. One file for each species. (see the --outseqformat option)
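
Since the typing and report files are tab-separated, they can be loaded with the Python standard library. The sketch below assumes the file starts with a header row; the column names used in the toy input are placeholders, not the columns MetaMLST actually writes.

```python
import csv
import io


def read_tsv(handle):
    """Read a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical content: these column names are placeholders, not MetaMLST's.
example = io.StringIO("sample\tST\nsampleA\t1\nsampleB\t2\n")
rows = read_tsv(example)
for row in rows:
    print(row["sample"], row["ST"])

# With a real output file (path taken from the examples on this page):
# with open("./out/merged/sepidermidis_report.txt") as handle:
#     rows = read_tsv(handle)
```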

How it works


MetaMLST consists of four phases:

  1. Retrieval of the available MLST data and creation of the MetaMLST-db ▸ metamlst-index.py;
    • This step can be skipped if you use the pre-made database (metamlstDB_2017.db)
  2. Mapping of the metagenomic reads against the retrieved reference sequences ▸ (Bowtie2);
  3. Detection of microbial targets and reconstruction of the sample-specific MLST loci ▸ metamlst.py;
  4. ST calling and downstream comparative analysis ▸ metamlst-merge.py
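
To make the ST calling idea in phase 4 concrete, here is a minimal sketch of a generic MLST lookup: a sample's allele profile is matched against a table of known profiles, and an unseen profile receives a new ST number. This is not MetaMLST's implementation; the function name, the table layout and the starting number 100001 for new STs (suggested by the ST-100001 label in Example 6) are assumptions.

```python
def call_st(profile, known_profiles, first_new_st=100001):
    """Return (st, is_new) for an allele profile.

    profile        -- tuple of allele numbers, one per locus
    known_profiles -- dict mapping profile tuples to ST numbers
                      (updated in place when a new ST is assigned)
    first_new_st   -- assumed starting number for newly assigned STs
    """
    if profile in known_profiles:
        return known_profiles[profile], False
    highest = max(known_profiles.values(), default=0)
    new_st = max(highest + 1, first_new_st)
    known_profiles[profile] = new_st
    return new_st, True


# Hypothetical typing table with three loci per profile.
table = {(1, 1, 2): 5, (3, 1, 2): 9}
print(call_st((3, 1, 2), table))   # known profile
print(call_st((3, 4, 2), table))   # unseen profile, gets a new ST
print(call_st((3, 4, 2), table))   # the same profile is now known
```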

Schema

Specific Documentation


How do I make it work?


▸ Quick Start

Step 0: Make sure you have all the requirements installed and available.

Step 1: Clone the repository (or download and extract the full package from https://bitbucket.org/CibioCM/metamlst/downloads):

hg clone https://bitbucket.org/CibioCM/metamlst
cd metamlst

Step 2: Create a Bowtie2 index from the default MetaMLST database.

metamlst-index.py -d metamlstDB_2017.db -i bowtie_index

Step 3: Use the index to map your FASTQ file(s):

bowtie2 --very-sensitive-local -a --no-unal -x bowtie_index -U YOUR_READS.FASTQ | samtools view -bS - > YOUR_ALIGNMENTS.bam

Step 4: Run MetaMLST on the BAM file. The results will be saved in ./out:

metamlst.py -d metamlstDB_2017.db YOUR_ALIGNMENTS.bam

Note: if you are using samtools 0.x, add the --legacy_samtools parameter
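
The note above can be automated in a wrapper script. The sketch below decides whether to append --legacy_samtools from a version string; how that string is obtained from your samtools installation is left open, and the helper names are hypothetical.

```python
import re


def needs_legacy_flag(version_text):
    """Return True if the first version number in version_text is 0.x."""
    match = re.search(r"(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no version number found in: %r" % version_text)
    return int(match.group(1)) == 0


def metamlst_command(db, bam, version_text):
    """Build the metamlst.py argument list for the given samtools version."""
    cmd = ["metamlst.py", "-d", db, bam]
    if needs_legacy_flag(version_text):
        cmd.append("--legacy_samtools")
    return cmd


print(metamlst_command("metamlstDB_2017.db", "YOUR_ALIGNMENTS.bam",
                       "samtools 1.3.1"))
print(metamlst_command("metamlstDB_2017.db", "YOUR_ALIGNMENTS.bam",
                       "Version: 0.1.19"))
```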

Step 4-B: Repeat Steps 3-4 for each sample of interest.

Step 5: Run MetaMLST-merge on the metamlst.py output files. The results will be saved in ./out/merged:

metamlst-merge.py -d metamlstDB_2017.db ./out
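
When several samples are involved, the Quick Start commands can be generated rather than typed by hand. The following sketch only builds the command strings of Steps 2-5, copied from this section, and prints them; the function name and the rule of naming each BAM file after its FASTQ file are assumptions.

```python
import os
import shlex


def quick_start_commands(fastq_files, db="metamlstDB_2017.db",
                         index="bowtie_index", out_dir="./out"):
    """Return the Quick Start shell commands (Steps 2-5) for several samples."""
    q = shlex.quote
    cmds = ["metamlst-index.py -d %s -i %s" % (q(db), q(index))]
    for fastq in fastq_files:
        # Name the BAM file after the FASTQ file, replacing its extension.
        bam = os.path.splitext(fastq)[0] + ".bam"
        cmds.append(
            "bowtie2 --very-sensitive-local -a --no-unal -x %s -U %s"
            " | samtools view -bS - > %s" % (q(index), q(fastq), q(bam)))
        cmds.append("metamlst.py -d %s %s" % (q(db), q(bam)))
    # One merge step at the end, after all samples have been processed.
    cmds.append("metamlst-merge.py -d %s %s" % (q(db), q(out_dir)))
    return cmds


for command in quick_start_commands(["sampleA.fastq", "sampleB.fastq"]):
    print(command)
```

Printing the commands first makes it easy to review them before running them in a shell.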

▸ Example 1: Type S. epidermidis in a single sample

Run the following test script: ./examples/1_single_sample/test.sh.

The sample FASTQ file SRS013261_epidermidis.fastq contains a subset of the HMP sample SRS013261. The script executes the following commands:

#Generate a Bowtie2 index from the pre-made database
../../metamlst-index.py -i bowtie_MmetaMLST ../../metamlstDB_2017.db

#Map the fastq with Bowtie
bowtie2 --threads 4 --very-sensitive-local -a --no-unal -x bowtie_MmetaMLST -U SRS013261_epidermidis.fastq | samtools view -bS - > SRS013261_epidermidis.bam;

#Run MetaMLST on a single sample
../../metamlst.py -d ../../metamlstDB_2017.db SRS013261_epidermidis.bam -o ./out/

#Type the STs
../../metamlst-merge.py -d ../../metamlstDB_2017.db ./out/

This is a list of the output files produced by MetaMLST at the end of the test script:

| # | File (in /examples/single_sample) | Type | Description |
|---|---|---|---|
| 1 | sepidermidis.db | MetaMLST Database | Contains STs and sequences |
| 2 | ./out/merged/sepidermidis_report.txt | MetaMLST Report File | Contains the aggregate analysis for all the samples, regarding S. epidermidis. This file contains |
| 3 | ./out/merged/sepidermidis_ST.txt | MetaMLST ST File | Contains the new S. epidermidis ST table after the analysis (all the known profiles plus the new profiles detected in the samples). |

▸ Example 2: Type S. epidermidis in a single sample with a custom database

Run the following test script: ./examples/2_single_sample_custom_db/test.sh.

As in Example 1, SRS013261_epidermidis.fastq contains a subset of the HMP sample SRS013261. The script executes the following commands:

#Create the database with the sequences from MLST_sepidermidis.fasta
../../metamlst-index.py -s MLST_sepidermidis.fasta sepidermidis.db

#Add the typings from MLST_sepidermidis_types.txt to the database
../../metamlst-index.py -t MLST_sepidermidis_types.txt sepidermidis.db

#Generate a Bowtie2 index
../../metamlst-index.py -i bowtie_sepidermidis sepidermidis.db

#Map the fastq with Bowtie
bowtie2 --threads 4 --very-sensitive-local -a --no-unal -x bowtie_sepidermidis -U SRS013261_epidermidis.fastq | samtools view -bS - > SRS013261_epidermidis.bam;

#Run MetaMLST on a single sample
../../metamlst.py -d sepidermidis.db SRS013261_epidermidis.bam -o ./out/

#Type the STs
../../metamlst-merge.py -d sepidermidis.db ./out

The files produced at the end of the execution are the same as in Example 1.

▸ Example 3: Type S. epidermidis and P. acnes in the same sample

Run the following test script: ./examples/3_single_sample_multiple_species/test.sh.

MetaMLST is executed on a single file, with the pre-made database, identifying S. epidermidis and P. acnes. The script executes the following commands:

#Generate a Bowtie2 index
../../metamlst-index.py -i bowtie_sepidermidis ../../metamlstDB_2017.db

#Map the fastq with Bowtie
bowtie2 --threads 4 --very-sensitive-local -a --no-unal -x bowtie_sepidermidis -U SRS013261_epidermidis.fastq | samtools view -bS - > SRS013261_epidermidis.bam;

#Run MetaMLST on a single sample
../../metamlst.py -d ../../metamlstDB_2017.db SRS013261_epidermidis.bam -o ./out/

#Type the STs
../../metamlst-merge.py -d ../../metamlstDB_2017.db ./out/

▸ Example 4: Type S. epidermidis and P. acnes in multiple samples (+ metadata)

metamlst.py can be run on multiple samples before the MetaMLST-merge step, and metamlst-merge.py can add external metadata to the report files.

Run the following test script: ./examples/4_two_samples_with_metadata/test.sh.

#Generate a Bowtie2 index
../../metamlst-index.py -i bowtie_MmetaMLST ../../metamlstDB_2017.db

#Map the fastq with Bowtie for each sample
bowtie2 --threads 4 --very-sensitive-local -a --no-unal -x bowtie_MmetaMLST -U SRS015937_epidermidis.fastq | samtools view -bS - > SRS015937_epidermidis.bam;
bowtie2 --threads 4 --very-sensitive-local -a --no-unal -x bowtie_MmetaMLST -U SRS013261_epidermidis.fastq | samtools view -bS - > SRS013261_epidermidis.bam;

#Run MetaMLST on each sample
../../metamlst.py -d ../../metamlstDB_2017.db SRS015937_epidermidis.bam -o ./out/
../../metamlst.py -d ../../metamlstDB_2017.db SRS013261_epidermidis.bam -o ./out/

#Type the STs using the metadata:
../../metamlst-merge.py -d ../../metamlstDB_2017.db --meta test_metadata.txt ./out/

The script pairs the metadata in the given file with the report file generated for each species in ./out/merged. The metadata file is a tab-separated table in which each row is a sample and each column is a metadata field. The first row is a header. The sample ID (i.e. the name of the file, without extension) must be specified in the first column; a different column can be used by providing the --idField option to metamlst-merge.py. See the related page: metaMLST-merge
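
A metadata file in the layout described above can be produced with a few lines of Python. This is a sketch under assumptions: the header label "sampleID", the field names and the values are all made up for illustration; only the layout (tab-separated, header first, sample ID in the first column, derived from the file name without extension) comes from the description.

```python
import csv
import io
import os


def sample_id(path):
    """Sample ID as described above: the file name without its extension."""
    return os.path.splitext(os.path.basename(path))[0]


def write_metadata(handle, fields, records):
    """Write a header row, then one tab-separated row per sample.

    fields  -- metadata column names (the sample ID column is added first)
    records -- dict mapping a BAM/FASTQ path to a dict of field values
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sampleID"] + list(fields))
    for path, values in records.items():
        row = [sample_id(path)] + [values.get(f, "") for f in fields]
        writer.writerow(row)


# Hypothetical metadata fields and values, for illustration only.
buffer = io.StringIO()
write_metadata(buffer, ["field_1", "field_2"], {
    "./SRS015937_epidermidis.bam": {"field_1": "groupA", "field_2": "x"},
    "./SRS013261_epidermidis.bam": {"field_1": "groupB", "field_2": "y"},
})
print(buffer.getvalue(), end="")
```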

▸ Example 5: Type P. acnes in multiple samples and build phylogenetic trees

MetaMLST output files can be used to generate phylogenetic trees based on the reconstructed MLST loci, as well as Minimum Spanning Trees (using tools like PHYLOViZ). In this example, 27 metagenomic files from the HMP (only the reads aligning to P. acnes are provided) are analyzed with MetaMLST.

Run the following test script: ./examples/5_phylogenetic_analysis/test.sh. The script executes the steps of Example 1 on 27 samples, and then executes:

#Type the STs
metamlst-merge.py -d ../../metamlstDB_2017.db --meta sample_metadata.txt --outseqformat A ./out

Minimum Spanning Trees are a common way to analyse MLST data. Using the typing table (./out/merged/pacnes_ST.txt) and the report file (./out/merged/pacnes_report.txt) you can generate a Minimum Spanning Tree with PHYLOViZ (the report file can be used as an isolate file to colour the graph according to the metadata):

MST generated with PHYLOViZ

A Minimum Spanning Tree generated from the 27 genomes, coloured by metadata field "Metadata_Field_1", plus the available reference STs for P. acnes (brown)

Using the --outseqformat A option in MetaMLST-merge, you can generate an additional file: ./out/merged/pacnes_sequences.fna, containing the aligned and concatenated sequences of each locus of the 27 samples analysed. By default (see metamlst-merge.py) there is one entry for each sample.
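
Because the sequences in this file are aligned and concatenated, simple comparisons between samples are possible before building a tree. The sketch below counts mismatching columns between every pair of entries; it uses a small built-in FASTA parser and a toy alignment with made-up sequences, and it assumes all entries have the same length.

```python
import io
from itertools import combinations


def read_fasta(handle):
    """Parse FASTA text into a dict of {header: sequence}."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def pairwise_differences(records):
    """Count mismatching columns for every pair of equal-length sequences."""
    result = {}
    for (a, seq_a), (b, seq_b) in combinations(records.items(), 2):
        if len(seq_a) != len(seq_b):
            raise ValueError("sequences %s and %s are not aligned" % (a, b))
        result[(a, b)] = sum(x != y for x, y in zip(seq_a, seq_b))
    return result


# Toy alignment with made-up sequences, standing in for pacnes_sequences.fna.
toy = io.StringIO(">s1\nACGTAC\n>s2\nACGTAT\n>s3\nTCGTAT\n")
print(pairwise_differences(read_fasta(toy)))
```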

This file can be supplied directly to any phylogenetic-tree software such as RAxML, and the tree can be viewed with Archaeopteryx:

mkdir ~/5_phylogenetic_analysis_trees/
raxmlHPC-PTHREADS-SSE3 -T 4 -m GTRCAT -s ./out/merged/pacnes_sequences.fna -w ~/5_phylogenetic_analysis_trees/ -n pacnes_trees -p 12345;

Tree generated with RAxML

A Phylogenetic Tree built with RAxML on the concatenated MLST loci of the 27 samples analysed in this example

▸ Example 6: Re-use the typing table and sequences in future analyses

MetaMLST allows you to update the database with newly detected sequences and typings. This can be useful when a new ST is detected again while analysing a different sample, or to cross-compare datasets analysed at different times.

Run the following test script: ./examples/6_reuse_the_db/test.sh. The script executes the steps of Example 4 on the SRS015937 sample, which harbors a new ST of S. epidermidis. The script then updates the database with:

  • The updated sequences (./out/merged/sepidermidis_sequences.fna), generated with the --outseqformat B option of MetaMLST-merge
  • The updated ST table (./out/merged/sepidermidis_ST.txt), modified to include "#sepidermidis|Staphylococcus epidermidis" as first line.
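
The modification of the ST table mentioned above (adding the "#sepidermidis|Staphylococcus epidermidis" line at the top) can be scripted. The helper below is a hypothetical sketch, not part of MetaMLST; it is demonstrated on a temporary file with placeholder content rather than a real ST table.

```python
import os
import tempfile


def prepend_header(st_table_path, organism_key, organism_name):
    """Insert '#key|Name' as the first line of an ST table, if not present."""
    header = "#%s|%s\n" % (organism_key, organism_name)
    with open(st_table_path) as handle:
        content = handle.read()
    if content.startswith(header):
        return False  # already prepared, leave the file untouched
    with open(st_table_path, "w") as handle:
        handle.write(header + content)
    return True


# Demonstration on a temporary file with placeholder table content.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sepidermidis_ST.txt")
    with open(path, "w") as handle:
        handle.write("placeholder table row\n")
    prepend_header(path, "sepidermidis", "Staphylococcus epidermidis")
    with open(path) as handle:
        print(handle.readline().rstrip())
```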

The pipeline is then re-run on the same sample, but with the new database. ST-100001 is then considered "Known" (i.e. previously detected).

See the test script for further details.

Useful Resources


Public MLST sources:

Visualization Tools

Reference



MetaMLST is a project of the Computational Metagenomics Lab at CIBIO, University of Trento, Italy.
