Benchmark of Automated Metagenome Binning Software in Complex Metagenomes II
For an alternative benchmark of metagenome binning software, we used the same MetaHIT human gut metagenome data (Accession #: ERP000108) as in the controlled benchmark. We took CONCOCT, GroopM, and MaxBin as the alternative software to compare with MetaBAT. We adopted single copy core genes as the gold standard to measure recall (completeness) and precision (the inverse of contamination); for this purpose, we used CheckM.
Some of the pitfalls of this benchmark are as follows:
- Single copy core genes are one way to measure the completeness of a bin and its degree of contamination, but they are not perfect. They often over- or underestimate the truth for many reasons, including genome diversity and poor assembly quality.
- Some methods in this benchmark already incorporate single copy gene information, which may bias an evaluation based on the same information.
Summary of the benchmark results
Using 2.5kb contig size cutoff (60,619 contigs)
| | MetaBAT* | CONCOCT | GroopM^^ | MaxBin |
|---|---|---|---|---|
| Number of Bins Identified (>200kb) | 172 | 195 | 257 | 122 |
| Number of Quality Bins (Precision > .9 & Recall > .5) | 58 | 36 | 20 | 10 |
| Wall Time (16 cores; 32 hyper-threads) | 01:04:21 | 30:15:49 | 4:33:20 | 03:23:46 |
| Peak Memory Usage (for binning step) | 2.8G | 5.8G | 3.4G | 4.8G |
Using 1.5kb contig size cutoff (118,025 contigs) **
| | MetaBAT* | CONCOCT | GroopM^^ | MaxBin |
|---|---|---|---|---|
| Number of Bins Identified (>200kb) | 190 | 260 | 335 | 168 |
| Number of Quality Bins (Precision > .9 & Recall > .5) | 72 | 39 | 16 | 18 |
| Wall Time (16 cores; 32 hyper-threads) | 03:31:38 | 82:19:53 | 12:19:12 | 06:49:39 |
| Peak Memory Usage (for binning step) | 3.0G | 7G | 6.3G | 5.8G |
*Sensitive mode
^^GroopM without recruiting (chimeric bins removed).
**Details of 1.5kb results can be found here.
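The "Number of Quality Bins" rows above simply count bins passing both thresholds. As a minimal sketch of that counting step, assuming a hypothetical tab-separated per-bin summary (`bins.tsv` with columns bin_id, recall, precision; this file name and layout are illustrative, not an actual output of any tool here):

```shell
#!/bin/sh
# Build a toy per-bin summary: bin_id <TAB> recall <TAB> precision.
# bins.tsv is a hypothetical pre-computed extract, not real benchmark output.
printf 'bin.1\t0.92\t0.98\nbin.2\t0.40\t0.95\nbin.3\t0.61\t0.85\nbin.4\t0.75\t0.93\n' > bins.tsv
# Count "quality" bins: precision > 0.9 AND recall > 0.5.
awk -F'\t' '$3 > 0.9 && $2 > 0.5 {n++} END {print n}' bins.tsv
```

With the toy data above, only bin.1 and bin.4 pass both thresholds, so the count printed is 2.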
Preprocessing
- We generated a de novo assembly using Ray Meta.
- BAM files for each library were produced using BBMap.
- All files are available to download here.
Generating depth files
- It took 10 minutes using 32 hyper-threads, with a peak memory consumption of 8GB. Here is the log.
- Files are available here.
#!bash
# depth file for MetaBAT
jgi_summarize_bam_contig_depths --outputDepth depth.txt --pairedContigs paired.txt *.bam
# depth file for CONCOCT
awk 'NR > 1 {for(x=1;x<=NF;x++) if(x == 1 || (x >= 4 && x % 2 == 0)) printf "%s", $x (x == NF || x == (NF-1) ? "\n":"\t")}' depth.txt > depth_concoct.txt
# depth file for MaxBin
cut -f1,3 depth.txt | tail -n+2 > depth_maxbin.txt
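To see what the CONCOCT awk one-liner above does, here is a minimal sketch on a toy depth file (the file name and library column names are illustrative): it skips the header, keeps the contig name (column 1), and keeps every even-numbered column from 4 onward, i.e. the per-library mean depths, while discarding the interleaved variance columns.

```shell
#!/bin/sh
# Toy file in the jgi_summarize_bam_contig_depths column layout:
# contigName contigLen totalAvgDepth <depth> <var> per library.
printf 'contigName\tcontigLen\ttotalAvgDepth\tlib1\tlib1-var\tlib2\tlib2-var\n' > toy_depth.txt
printf 'ctg1\t3000\t10.5\t9.0\t1.2\t12.0\t2.0\n' >> toy_depth.txt
printf 'ctg2\t2600\t4.1\t3.0\t0.5\t5.2\t0.9\n' >> toy_depth.txt
# Same awk program as above: keep column 1 plus even columns >= 4.
awk 'NR > 1 {for(x=1;x<=NF;x++) if(x == 1 || (x >= 4 && x % 2 == 0)) printf "%s", $x (x == NF || x == (NF-1) ? "\n":"\t")}' toy_depth.txt
```

Each output row is the contig name followed only by the two per-library depths (9.0 and 12.0 for ctg1), which is the coverage format CONCOCT expects.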
Running MetaBAT (using version >= 0.22.1)
#!bash
# Prepare proper folder structure
mkdir -p ./2.5kb/MetaBAT/Sensitive ./2.5kb/MetaBAT/Specific ./2.5kb/MetaBAT/SpecificPair
# First, try sensitive mode for better sensitivity
metabat -i assembly.fa -a depth.txt -o ./2.5kb/MetaBAT/Sensitive/bin --sensitive -v --saveTNF saved_2.5kb.tnf --saveDistance saved_2.5kb.gprob
# Try specific mode to improve specificity further; this time the binning will be much faster since it reuses saved calculations
metabat -i assembly.fa -a depth.txt -o ./2.5kb/MetaBAT/Specific/bin --specific -v --saveTNF saved_2.5kb.tnf --saveDistance saved_2.5kb.gprob
# Try specific mode with paired data to improve sensitivity while minimizing the loss of specificity
metabat -i assembly.fa -a depth.txt -p paired.txt -o ./2.5kb/MetaBAT/SpecificPair/bin --specific -v --saveTNF saved_2.5kb.tnf --saveDistance saved_2.5kb.gprob
Evaluation of MetaBAT using CheckM
#!bash
checkm lineage_wf -f ./2.5kb/MetaBAT/Sensitive/SCG.txt -t 32 -x fa ./2.5kb/MetaBAT/Sensitive ./2.5kb/MetaBAT/Sensitive/SCG
checkm lineage_wf -f ./2.5kb/MetaBAT/Specific/SCG.txt -t 32 -x fa ./2.5kb/MetaBAT/Specific ./2.5kb/MetaBAT/Specific/SCG
checkm lineage_wf -f ./2.5kb/MetaBAT/SpecificPair/SCG.txt -t 32 -x fa ./2.5kb/MetaBAT/SpecificPair ./2.5kb/MetaBAT/SpecificPair/SCG
- The results are available to download here.
Print out the results
- To reduce the bias that exaggerates precision when recall is very low (meaning the bin is very small), only bins having recall > 0.2 were considered in the calculation.
- Overall the results looked very similar; interestingly, sensitive mode performed better than expected (its precision was not worse than the other modes', unlike in the first benchmark). Sensitive mode was therefore selected for the comparison with the other methods.
- Because its binning is extremely fast, MetaBAT has the luxury of choosing the best parameters for a given data set; one can even optimize the parameters further to get the best results in terms of single copy genes. MetaBAT reuses pre-calculated data if the probability parameters (p1, p2, p3) are greater than or equal to the minimum used for the saved file (80 in this example).
- Recall and precision here correspond to completeness and 1 - contamination in the CheckM table, respectively.
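That mapping can be sketched on a toy extract of CheckM's numbers. The file below (`checkm_extract.tsv`, columns bin_id, completeness %, contamination %) is a hypothetical pre-parsed extract, since `checkm lineage_wf` actually emits a formatted text table rather than plain TSV:

```shell
#!/bin/sh
# Hypothetical extract of CheckM results: bin_id <TAB> completeness% <TAB> contamination%.
printf 'bin.1\t95.2\t1.8\nbin.2\t52.0\t12.5\n' > checkm_extract.tsv
# recall = completeness/100; precision = 1 - contamination/100 (the
# correspondence stated in the bullet above).
awk -F'\t' '{printf "%s\t%.3f\t%.3f\n", $1, $2/100, 1 - $3/100}' checkm_extract.tsv
```

For example, bin.1 with 95.2% completeness and 1.8% contamination maps to recall 0.952 and precision 0.982.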
#!R
# The following are R commands (tested on Linux).
# Download the R file from the data directory (see above). It will try to
# download and install the required libraries.
source('http://portal.nersc.gov/dna/RD/Metagenome_RD/MetaBAT/Files/benchmark.R')
res <- list(Sensitive=calcPerfBySCG("./2.5kb/MetaBAT/Sensitive/SCG.txt"),
            Specific=calcPerfBySCG("./2.5kb/MetaBAT/Specific/SCG.txt"),
            SpecificPair=calcPerfBySCG("./2.5kb/MetaBAT/SpecificPair/SCG.txt"))
printPerf(res)

$Sensitive
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7   106   89   75   60   43   28   11    2
     0.8   105   88   74   59   42   27   10    1
     0.9    89   72   58   44   28   17    7    0
     0.95   65   48   35   24   13    8    3    0
     0.99   17    9    4    2    2    2    0    0

$Specific
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7   101   85   69   56   39   23    5    1
     0.8   101   85   69   56   39   23    5    1
     0.9    94   78   62   49   33   19    4    0
     0.95   68   52   38   27   16    8    2    0
     0.99   26   17    8    6    5    2    0    0

$SpecificPair
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7   108   93   74   61   44   27   11    1
     0.8   107   92   73   60   43   26   10    1
     0.9    88   73   55   43   30   15    5    0
     0.95   57   42   29   20   13    8    3    0
     0.99   14    9    4    3    3    2    1    0
Plot the results
#!R
pdf("Performance_By_SCG.pdf", width=8, height=8)
plotPerf(res, xlim=max(sapply(res, nrow)))
dev.off()
Running CONCOCT (using version 0.4.0)
#!bash
concoct --composition_file assembly.fa --coverage_file depth_concoct.txt --length_threshold 2500
CONCOCT bins should be extracted first to calculate recall and precision.
#!R
library(foreach)
system("mkdir -p ./2.5kb/CONCOCT/bins/ ./2.5kb/CONCOCT/small_bins")
cls <- read.csv("./2.5kb/CONCOCT/clustering_gt2500.csv", header=F, as.is=T)
invisible(foreach(i=unique(cls$V2)) %do% {
    write.table(cls$V1[cls$V2==i], file=sprintf("./2.5kb/CONCOCT/bins/%d.lst", i), col.names=F, row.names=F, quote=F)
    system(sprintf("./screen_list.pl ./2.5kb/CONCOCT/bins/%d.lst assembly.fa keep > ./2.5kb/CONCOCT/bins/%d.fa", i, i))
    bin.size <- as.numeric(system(sprintf("./sizefasta.pl ./2.5kb/CONCOCT/bins/%d.fa", i), intern=T))
    if(bin.size < 200000)
        system(sprintf("mv ./2.5kb/CONCOCT/bins/%d.fa ./2.5kb/CONCOCT/small_bins/", i))
})
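The list-building part of that loop (grouping contig IDs by cluster) can also be done in plain shell. This is a minimal sketch on toy data, not the actual pipeline: the file `clustering_toy.csv` and the `bins_lst/` directory are illustrative, and the subsequent sequence extraction would still rely on the helper scripts above.

```shell
#!/bin/sh
# Toy CONCOCT-style clustering file: contig_id,cluster_id per line.
printf 'ctg1,0\nctg2,1\nctg3,0\n' > clustering_toy.csv
mkdir -p bins_lst
# Write each contig ID into a per-cluster list file, one file per cluster
# (awk keeps each output file open and appends subsequent lines).
awk -F',' '{print $1 > ("bins_lst/" $2 ".lst")}' clustering_toy.csv
```

After running this, `bins_lst/0.lst` holds ctg1 and ctg3, and `bins_lst/1.lst` holds ctg2, mirroring the `write.table` step of the R loop.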
#!bash
checkm lineage_wf -f ./2.5kb/CONCOCT/SCG.txt -t 32 -x fa ./2.5kb/CONCOCT/bins ./2.5kb/CONCOCT/bins/SCG
#!R
res <- list(CONCOCT=calcPerfBySCG("./2.5kb/CONCOCT/SCG.txt"))
printPerf(res)

$CONCOCT
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    91   78   67   57   41   30   20    6
     0.8    83   70   59   50   34   23   14    5
     0.9    57   44   36   27   13    7    3    1
     0.95   35   23   16   11    1    1    1    0
     0.99    7    1    0    0    0    0    0    0
Running GroopM (using version 0.3.0)
- The parse step with 32 threads used more than 120GB of memory, so the number of threads was reduced to 16.
- We tried two modes, with and without the recruiting step.
- Overall, GroopM performed poorly in terms of precision (large bins had a significant amount of contamination).
- The results are available to download here.
#!bash
# calculate depth file for GroopM
groopm parse -t 16 database.gm assembly.fa *.bam
# core binning
groopm core -b 200000 -c 2500 database.gm
# skipped refining stage since it is not automated
# groopm refine database.gm
# output core bins
groopm extract -t 32 --prefix ./2.5kb/GroopM/core_only/bin_groopm ./2.5kb/GroopM/database.gm assembly.fa
# recruiting unbinned contigs
groopm recruit database.gm
# output
groopm extract -t 32 --prefix ./2.5kb/GroopM/recruited/bin_groopm ./2.5kb/GroopM/database.gm assembly.fa
checkm lineage_wf -f ./2.5kb/GroopM/core_only/SCG.txt -t 32 -x fna ./2.5kb/GroopM/core_only ./2.5kb/GroopM/core_only/SCG
checkm lineage_wf -f ./2.5kb/GroopM/recruited/SCG.txt -t 32 -x fna ./2.5kb/GroopM/recruited ./2.5kb/GroopM/recruited/SCG
#!R
res <- list(Core=calcPerfBySCG("./2.5kb/GroopM/core_only/SCG.txt"),
            Recruited=calcPerfBySCG("./2.5kb/GroopM/recruited/SCG.txt"))
printPerf(res)

$Core
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    54   42   31   24   11    9    4    0
     0.8    48   36   25   18    6    5    2    0
     0.9    41   29   20   14    3    2    0    0
     0.95   30   20   13   10    3    2    0    0
     0.99    8    5    2    1    0    0    0    0

$Recruited
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    80   63   43   34   21   10    5    3
     0.8    66   49   30   24   13    7    3    1
     0.9    40   27   17   12    5    3    1    1
     0.95   15   12    8    6    1    1    0    0
     0.99    1    1    1    1    0    0    0    0
Running MaxBin (using version 1.4.1)
#!bash
run_MaxBin.pl -contig assembly.fa -out ./2.5kb/MaxBin/MaxBin.out -abund depth_maxbin.txt -thread 32 -min_contig_length 2500
checkm lineage_wf -f ./2.5kb/MaxBin/SCG.txt -t 32 -x fasta ./2.5kb/MaxBin ./2.5kb/MaxBin/SCG
#!R
res <- list(MaxBin=calcPerfBySCG("./2.5kb/MaxBin/SCG.txt"))
printPerf(res)

$MaxBin
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    72   52   40   23    9    3    1    0
     0.8    55   36   26   18    7    2    1    0
     0.9    32   18   10    9    2    1    0    0
     0.95   19   12    7    7    1    1    0    0
     0.99    5    2    0    0    0    0    0    0
Comparing all methods together
#!R
res <- list(MetaBAT=calcPerfBySCG("./2.5kb/MetaBAT/Sensitive/SCG.txt"),
            CONCOCT=calcPerfBySCG("./2.5kb/CONCOCT/SCG.txt"),
            GroopM=calcPerfBySCG("./2.5kb/GroopM/core_only/SCG.txt"),
            MaxBin=calcPerfBySCG("./2.5kb/MaxBin/SCG.txt"))
printPerf(res)

$MetaBAT
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7   106   89   75   60   43   28   11    2
     0.8   105   88   74   59   42   27   10    1
     0.9    89   72   58   44   28   17    7    0
     0.95   65   48   35   24   13    8    3    0
     0.99   17    9    4    2    2    2    0    0

$CONCOCT
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    91   78   67   57   41   30   20    6
     0.8    83   70   59   50   34   23   14    5
     0.9    57   44   36   27   13    7    3    1
     0.95   35   23   16   11    1    1    1    0
     0.99    7    1    0    0    0    0    0    0

$GroopM
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    54   42   31   24   11    9    4    0
     0.8    48   36   25   18    6    5    2    0
     0.9    41   29   20   14    3    2    0    0
     0.95   30   20   13   10    3    2    0    0
     0.99    8    5    2    1    0    0    0    0

$MaxBin
         Recall
Precision  0.3  0.4  0.5  0.6  0.7  0.8  0.9 0.95
     0.7    72   52   40   23    9    3    1    0
     0.8    55   36   26   18    7    2    1    0
     0.9    32   18   10    9    2    1    0    0
     0.95   19   12    7    7    1    1    0    0
     0.99    5    2    0    0    0    0    0    0
#!R
pdf("Performance_By_SCG_All_2.5kb.pdf", width=8, height=8)
plotPerf(res, xlim=max(sapply(res, nrow)))
dev.off()
Conclusions
- MetaBAT outperformed GroopM and MaxBin on all metrics (a similar outcome to the controlled benchmark).
- CONCOCT had better recall (completeness) at the cost of reduced precision. MetaBAT surpassed CONCOCT in both combined metrics, F1 and F0.5.
- GroopM seemed too liberal in selecting members for each bin, so it suffered significantly in precision; the greater completeness of its bins came from including contigs too aggressively.
- MaxBin performed reasonably well without using co-abundance information.
- The conclusion is very similar to the previous benchmark: MetaBAT is the fastest metagenome binning software, producing very little contamination in bins of reasonable completeness, characteristics well suited to the analysis of complex metagenomes.