Revisiting CAMI Challenge Data Set
MetaBAT 2 outperforms the original MetaBAT and other alternatives in both accuracy and computational efficiency. All results were obtained with default parameters.
Prerequisites
The data set is available here. Refer to the paper for details. All results are here.
Low Complexity Data Set
```bash
$ metabat2 -i CAMI_low_RL_S001__insert_270_GoldStandardAssembly.fasta.gz -a depth-low.txt -o MetaBATLow/bin -v
[00:00:00] MetaBAT 2 (v2.10.2) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
[00:00:01] Finished reading 19499 contigs and 1 coverages from depth.txt
[00:00:01] Number of target contigs: 4367 of large (>= 2500) and 3823 of small ones (>=1000 & <2500).
[00:00:05] Finished TNF calculation.
[00:00:06] Finished Preparing TNF Graph Building [pTNF = 92.0; 2392 / 2500 (P = 94.96%)]
[00:00:06] Finished Building TNF Graph (4146 vertices and 239946 edges) [12.4Gb / 251.8Gb]
[00:00:06] Building SCR Graph and Binning (4046 vertices and 30893 edges) [P = 95.00%; 12.4Gb / 251.8Gb]
[00:00:07] 97.43% (137794377 bases) of large (>=2500) and 0.00% (0 bases) of small (<2500) contigs were binned. 35 bins (137794377 bases in total) formed.
```
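A quick way to cross-check the final log line is to count the bin files and sum their sequence lengths. The function below is a minimal sketch that assumes MetaBAT 2 wrote each bin as `<prefix>.<number>.fa` (for this run, `MetaBATLow/bin.1.fa`, `MetaBATLow/bin.2.fa`, and so on); adjust the glob if your build names its files differently. `count_bins` is a hypothetical helper, not part of MetaBAT.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count MetaBAT 2 bins and total binned bases for one prefix.
# Assumption: bins are written as <prefix>.<number>.fa, e.g. MetaBATLow/bin.1.fa.
count_bins() {
  local prefix="$1"
  local -a bins
  shopt -s nullglob
  bins=( "$prefix".[0-9]*.fa )
  shopt -u nullglob

  if (( ${#bins[@]} == 0 )); then
    echo "no bins found for prefix $prefix" >&2
    return 1
  fi

  # Sum the lengths of all sequence lines; FASTA headers start with '>'.
  # printf "%.0f" prints large totals as plain integers in every awk.
  local bases
  bases=$(awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "${bins[@]}")
  echo "${#bins[@]} bins (${bases} bases in total)"
}
```

Running `count_bins MetaBATLow/bin` after the command above should report the same totals as the last log line (35 bins, 137794377 bases), since both sum the lengths of the same binned contigs.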
Check the results using R:
```r
> source('http://portal.nersc.gov/dna/RD/Metagenome_RD/MetaBAT/Files/benchmark.R')
> res <- list(MetaBAT2=calcPerfCAMI("MetaBAT","MetaBATLow/bin",complexity='low'),
+             MaxBin2=calcPerfCAMI("MaxBin","MaxBinLow/bin",complexity='low'),
+             CONCOCT=calcPerfCAMI("CONCOCT","CONCOCT/low/clustering_gt1000.csv",complexity='low'),
+             MyCC=calcPerfCAMI("MaxBin","MyCC/low/Cluster",complexity='low'),
+             BinSanity=calcPerfCAMI("BinSanity","BinSanity/low/",complexity='low'),
+             COCACOLA=calcPerfCAMI("CONCOCT","COCACOLA/low/result.csv",complexity='low'))
> printPerf(res)
$MetaBAT2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        25  23  23  21  18  18  14   11
0.8        24  22  22  21  18  18  14   11
0.9        23  22  22  21  18  18  14   11
0.95       22  21  21  20  17  17  14   11
0.99       22  21  21  20  17  17  14   11

$MaxBin2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        33  30  28  24  24  21  17   16
0.8        29  27  25  23  23  20  17   16
0.9        22  20  18  16  16  15  13   12
0.95       17  16  15  14  14  13  12   11
0.99       11  10   9   8   8   8   8    8

$CONCOCT
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        18  18  18  17  17  16  15   14
0.8        17  17  17  17  17  16  15   14
0.9        17  17  17  17  17  16  15   14
0.95       17  17  17  17  17  16  15   14
0.99       14  14  14  14  14  13  12   11

$MyCC
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        10  10  10  10  10  10  10   10
0.8        10  10  10  10  10  10  10   10
0.9        10  10  10  10  10  10  10   10
0.95       10  10  10  10  10  10  10   10
0.99        6   6   6   6   6   6   6    6

$BinSanity
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        20  18  16  13  12  12  12   11
0.8        16  14  12   9   9   9   9    8
0.9        16  14  12   9   9   9   9    8
0.95        9   9   7   5   5   5   5    4
0.99        6   6   5   4   4   4   4    3

$COCACOLA
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7         8   8   5   5   5   5   5    3
0.8         7   7   4   4   4   4   4    2
0.9         5   5   3   3   3   3   3    2
0.95        2   2   1   1   1   1   1    1
0.99        0   0   0   0   0   0   0    0

> plotPerf3(res, rec=seq(.5,.9,.1), legend.position=c(.95,.7))
```
- The two panels represent precision 0.95 and 0.90, respectively.
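The rows of each table are labeled with precision thresholds and the columns with recall thresholds. This page does not spell out what a cell counts, so the sketch below assumes that each cell counts the bins whose precision is at least the row threshold and whose recall is at least the column threshold; `calcPerfCAMI` and `printPerf` in `benchmark.R` remain the authoritative definitions. The input format (one `bin_id precision recall` line per bin, values as fractions) and the function name `perf_grid` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build a precision x recall count grid in the printPerf layout.
# Input: one line per bin, "bin_id precision recall", values in [0, 1].
# Assumption: a cell counts bins with precision >= row and recall >= column.
perf_grid() {
  awk '
    BEGIN {
      np = split("0.7 0.8 0.9 0.95 0.99", P, " ")
      nr = split("0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95", R, " ")
    }
    NF >= 3 {
      for (i = 1; i <= np; i++)
        for (j = 1; j <= nr; j++)
          # "+ 0" forces numeric comparison of the thresholds and fields.
          if ($2 + 0 >= P[i] + 0 && $3 + 0 >= R[j] + 0) count[i, j]++
    }
    END {
      printf "Precision"
      for (j = 1; j <= nr; j++) printf " %s", R[j]
      printf "\n"
      for (i = 1; i <= np; i++) {
        printf "%s", P[i]
        for (j = 1; j <= nr; j++) printf " %d", count[i, j]
        printf "\n"
      }
    }
  ' "$@"
}
```

The function reads files given as arguments or standard input, so a per-bin table can be piped straight in.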
Medium Complexity Data Set
```bash
$ metabat2 -i CAMI_medium_GoldStandardAssembly.fasta.gz -a depth-medium.txt -o MetaBATMed/bin -v
[00:00:00] MetaBAT 2 (v2.10.2) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
[00:00:05] Finished reading 63447 contigs and 4 coverages from depth.txt
[00:00:05] Number of target contigs: 13229 of large (>= 2500) and 10460 of small ones (>=1000 & <2500).
[00:00:15] Finished TNF calculation.
[00:00:18] Finished Preparing TNF Graph Building [pTNF = 88.0; 2386 / 2500 (P = 94.92%)]
[00:00:21] Finished Building TNF Graph (12565 vertices and 790316 edges) [12.8Gb / 251.8Gb]
[00:00:22] Building SCR Graph and Binning (11908 vertices and 93567 edges) [P = 95.00%; 12.8Gb / 251.8Gb]
[00:00:22] 0.09% (450157 bases) of large (>=2500) contigs were re-binned out of small bins (<200000).
[00:00:25] 96.82% (488490337 bases) of large (>=2500) and 6.49% (1079936 bases) of small (<2500) contigs were binned. 171 bins (489570273 bases in total) formed.
```
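To compare runs side by side, the headline numbers can be pulled out of the verbose log. The sketch below assumes the output was saved to a file (for example by appending `> metabat-medium.log 2>&1` to the command; the file name is hypothetical) and that the final line keeps the format shown above, with the bin summary on the same line as the binning percentages. `summarize_log` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract elapsed time, % of large contigs binned,
# bin count, and total binned bases from a saved `metabat2 -v` log.
# Assumes the final summary line has the format shown on this page;
# other log lines do not match and are not printed.
summarize_log() {
  sed -n 's/^\[\([0-9:]*\)] \([0-9.]*\)% .* \([0-9][0-9]*\) bins (\([0-9][0-9]*\) bases in total) formed\..*$/elapsed=\1 large_binned=\2% bins=\3 bases=\4/p' "$@"
}
```

For the medium run this prints `elapsed=00:00:25 large_binned=96.82% bins=171 bases=489570273`.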
```r
> res <- list(MetaBAT2=calcPerfCAMI("MetaBAT","MetaBATMed/bin",complexity='medium'),
+             MaxBin2=calcPerfCAMI("MaxBin","MaxBinMed/bin",complexity='medium'),
+             CONCOCT=calcPerfCAMI("CONCOCT","CONCOCT/medium/clustering_gt1000.csv",complexity='medium'),
+             MyCC=calcPerfCAMI("MaxBin","MyCC/medium/Cluster",complexity='medium'),
+             BinSanity=calcPerfCAMI("BinSanity","BinSanity/medium/",complexity='medium'),
+             COCACOLA=calcPerfCAMI("CONCOCT","COCACOLA/medium/result.csv",complexity='medium'))
> printPerf(res)
$MetaBAT2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       115 106  95  86  80  75  64   54
0.8       109 100  90  82  76  73  63   53
0.9       105  96  88  80  75  72  63   53
0.95      102  93  85  78  73  71  62   52
0.99       91  82  74  68  63  61  54   44

$MaxBin2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       107 103  93  87  76  71  68   61
0.8        93  89  80  75  68  66  63   56
0.9        76  73  66  63  59  59  56   49
0.95       61  59  54  53  51  51  49   42
0.99       33  32  31  31  31  31  31   27

$CONCOCT
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        23  21  20  20  19  19  14   12
0.8        23  21  20  20  19  19  14   12
0.9        18  17  17  17  16  16  14   12
0.95       15  15  15  15  14  14  12   10
0.99        7   7   7   7   6   6   6    5

$MyCC
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        24  23  23  23  23  23  22   21
0.8        23  22  22  22  22  22  21   20
0.9        18  18  18  18  18  18  17   16
0.95       17  17  17  17  17  17  17   16
0.99        8   8   8   8   8   8   8    7

$BinSanity
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        68  59  54  49  46  44  38   37
0.8        60  51  46  43  41  39  34   33
0.9        49  41  36  34  32  31  28   27
0.95       43  38  34  32  30  29  26   25
0.99       21  19  17  17  16  16  15   14

$COCACOLA
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        16  13  12  11  10   9   9    4
0.8        12  10  10   9   9   8   8    4
0.9         8   6   6   5   5   5   5    2
0.95        4   3   3   2   2   2   2    1
0.99        0   0   0   0   0   0   0    0

> plotPerf3(res, rec=seq(.5,.9,.1), legend.position=c(.95,.7))
```
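Because every table shares the same layout, a simple consistency check is possible. Assuming each cell counts bins that clear both a precision and a recall threshold, tightening either threshold can only keep or lower a count, so values should never increase from left to right along a row or from top to bottom down a column. The awk sketch below (a hypothetical `check_monotone` helper) tests that on one table at a time pasted from the output; it only reads lines whose first field is a numeric threshold, so the `$MetaBAT2`, `Recall`, and `Precision` lines are skipped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that a printPerf-style table never increases
# to the right along a row or downward along a column.
# Feed it one table at a time; only lines starting with a numeric
# threshold (0.7, 0.8, ...) are treated as data rows.
check_monotone() {
  awk '
    $1 ~ /^[0-9.]+$/ {
      row++
      for (j = 2; j <= NF; j++) {
        v[row, j] = $j + 0
        # Stricter recall (next column) must not raise the count.
        if (j > 2 && v[row, j] > v[row, j - 1]) bad++
        # Stricter precision (next row) must not raise the count.
        if (row > 1 && v[row, j] > v[row - 1, j]) bad++
      }
    }
    END {
      if (bad) {
        print "not monotone"
        exit 1
      }
      print "monotone"
    }
  ' "$@"
}
```

The MetaBAT2 table for the medium data set, for example, passes this check.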
High Complexity Data Set
```bash
$ metabat2 -i CAMI_high_GoldStandardAssembly.fasta.gz -a depth-high.txt -o MetaBATHigh/bin -v
[00:00:00] MetaBAT 2 (v2.10.2) using minContig 2500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
[00:00:24] Finished reading 42038 contigs and 5 coverages from depth.txt
[00:00:24] Number of target contigs: 28615 of large (>= 2500) and 5547 of small ones (>=1000 & <2500).
[00:01:08] Finished TNF calculation.
[00:01:13] Finished Preparing TNF Graph Building [pTNF = 93.0; 2378 / 2500 (P = 95.12%)]
[00:01:23] Finished Building TNF Graph (27182 vertices and 1567728 edges) [14.8Gb / 251.8Gb]
[00:01:30] Building SCR Graph and Binning (26714 vertices and 433208 edges) [P = 95.00%; 14.9Gb / 251.8Gb]
[00:01:30] 0.04% (960710 bases) of large (>=2500) contigs were re-binned out of small bins (<200000).
[00:01:54] 98.10% (2517748867 bases) of large (>=2500) and 3.18% (281503 bases) of small (<2500) contigs were binned. 728 bins (2518030370 bases in total) formed.
```
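The three runs on this page differ only in the assembly, the depth file, and the output prefix, so they can be scripted together. The function below is a sketch using the file names shown above; it assumes the depth files already exist in the working directory and, by default, only prints the commands. Set `DRY_RUN=0` to execute them. `run_cami_metabat` and `DRY_RUN` are hypothetical names, and the script needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (default) or run the MetaBAT 2 commands used on
# this page for the low, medium, and high complexity CAMI data sets.
# Requires bash >= 4 for associative arrays.
run_cami_metabat() {
  local dry_run="${DRY_RUN:-1}"
  local -A assembly=(
    [low]=CAMI_low_RL_S001__insert_270_GoldStandardAssembly.fasta.gz
    [medium]=CAMI_medium_GoldStandardAssembly.fasta.gz
    [high]=CAMI_high_GoldStandardAssembly.fasta.gz
  )
  local -A outdir=([low]=MetaBATLow [medium]=MetaBATMed [high]=MetaBATHigh)
  local name
  # Iterate in a fixed order; associative-array key order is unspecified.
  for name in low medium high; do
    local -a cmd=(metabat2 -i "${assembly[$name]}" -a "depth-${name}.txt" -o "${outdir[$name]}/bin" -v)
    if [[ "$dry_run" == 1 ]]; then
      echo "${cmd[*]}"
    else
      mkdir -p "${outdir[$name]}"
      "${cmd[@]}"
    fi
  done
}
```

In dry-run mode the printed lines match the three commands on this page, which makes it easy to review them before a long run.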
```r
> res <- list(MetaBAT2=calcPerfCAMI("MetaBAT","MetaBATHigh/bin",complexity='high'),
+             MaxBin2=calcPerfCAMI("MaxBin","MaxBinHigh/bin",complexity='high'),
+             CONCOCT=calcPerfCAMI("CONCOCT","CONCOCT/high/clustering_gt1000.csv",complexity='high'),
+             MyCC=calcPerfCAMI("MaxBin","MyCC/high/Cluster",complexity='high'),
+             BinSanity=calcPerfCAMI("BinSanity","BinSanity/high/",complexity='high'),
+             COCACOLA=calcPerfCAMI("CONCOCT","COCACOLA/high/result.csv",complexity='high'))
> printPerf(res)
$MetaBAT2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       495 473 452 433 410 392 357  290
0.8       482 461 440 424 403 387 352  287
0.9       469 449 428 414 393 378 346  282
0.95      446 428 407 395 376 362 333  270
0.99      397 379 362 353 337 325 300  241

$MaxBin2
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       279 276 270 268 260 249 224  179
0.8       267 264 259 258 251 244 220  176
0.9       250 248 244 243 238 231 211  169
0.95      224 223 220 220 217 212 195  155
0.99      156 156 155 155 155 152 144  114

$CONCOCT
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7        37  37  36  36  36  36  35   25
0.8        37  37  36  36  36  36  35   25
0.9        36  36  35  35  35  35  35   25
0.95       32  32  32  32  32  32  32   24
0.99       25  25  25  25  25  25  25   22

$MyCC
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7         2   2   2   2   2   2   2    0
0.8         2   2   2   2   2   2   2    0
0.9         2   2   2   2   2   2   2    0
0.95        2   2   2   2   2   2   2    0
0.99        2   2   2   2   2   2   2    0

$BinSanity
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       221 211 204 201 196 192 185  172
0.8       196 188 181 178 174 172 166  154
0.9       156 150 144 142 138 137 133  125
0.95      124 118 112 111 107 106 103   98
0.99       69  67  65  65  65  65  63   62

$COCACOLA
         Recall
Precision 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
0.7       105 105 105 104 103 101  98   80
0.8       101 101 101 100  99  98  95   77
0.9        90  90  90  89  88  87  85   70
0.95       72  72  72  72  71  71  69   55
0.99       32  32  32  32  32  32  31   21

> plotPerf3(res, rec=seq(.5,.9,.1), legend.position=c(.95,.7))
```