Frequently Asked Questions
Blastn or BowTie2? Which one should I use?
Short answer: if computational performance is the bottleneck of your analysis, use BowTie2; if you want to maximize accuracy, it doesn't really matter, because Blastn and BowTie2 give very similar predictions.
Here is a more detailed answer, in case computational performance is not the main factor in your analysis. On our ten synthetic datasets, BowTie2 and Blastn perform almost equally in terms of accuracy: sometimes Blastn is slightly better, other times BowTie2 is. BowTie2 with '--bt2_ps very-sensitive-local' or '--bt2_ps sensitive-local' seems to be, on average, a little more accurate than Blastn, but it produces somewhat more false positives. Conversely, BowTie2 with '--bt2_ps very-sensitive' (the default) or '--bt2_ps sensitive' is a bit less accurate than Blastn but gives fewer false positives.
The trade-off between false positives and false negatives can also be tuned with the '--stat_q' option (which works for both Blastn and BowTie2). If you want to avoid false positives as much as possible at the price of some false negatives, we suggest setting it higher than 0.1 but lower than 0.33.
So far, we have profiled 1,000 real metagenomes using Blastn (because the BowTie2 option had not been added yet) and only a few with BowTie2. Until a more comprehensive analysis is performed, it may be a bit safer to use Blastn instead of BowTie2 if accuracy is much more important than computational efficiency.
Should I modify the --stat_q parameter? If so, how?
The --stat_q parameter controls the truncated-average computation. Its default value is 0.1, meaning that the 10% least abundant and the 10% most abundant markers of each clade are discarded before computing the relative abundances. In the great majority of cases, 0.1 is close to the optimal value for avoiding both false positives and false negatives, so there is generally no need to change it.
If you really want to use a non-default --stat_q value, you can run MetaPhlAn with '-t clade_profiles' instead of the default '-t rel_ab'. This generates a "coverage profile" for each marker of each clade. If several clades of interest (i.e., those appearing in the standard results) have more than 10% of their markers with zero coverage, you should use a --stat_q value higher than 0.1. However, if a clade has many zeros and only a few non-zero values, the non-zero values are more likely to be false positives, and there is no need to increase --stat_q. Because of taxonomic levels with only one descendant and other properties of the markers, this procedure is not rigorous for all clades, but eyeballing a few clades should be enough to tell whether --stat_q needs to be increased.
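The eyeballing rule above can be sketched as a small check. This assumes you have already transcribed the clade_profiles output into a dictionary mapping each clade name to its list of per-marker coverage values (the parsing of the real output format is not shown). The function names and the 0.5 cutoff, used here to stand in for "many zeros and only a few non-zero values", are hypothetical choices, not MetaPhlAn parameters.

```python
def zero_fraction(coverages):
    """Fraction of markers with zero coverage."""
    return sum(1 for c in coverages if c == 0) / len(coverages)


def suggest_stat_q_increase(profiles, stat_q=0.1, mostly_zero=0.5):
    """Return clades whose zero fraction suggests raising --stat_q.

    profiles: dict of clade name -> list of per-marker coverages (hypothetical input).
    A clade is flagged when more than stat_q of its markers are zero, but not
    so many that the non-zero markers look like false positives
    (mostly_zero is a hypothetical cutoff).
    """
    flagged = []
    for clade, covs in profiles.items():
        if not covs:
            continue
        z = zero_fraction(covs)
        if stat_q < z < mostly_zero:
            flagged.append(clade)
    return flagged


# Hypothetical coverage profiles for three clades with ten markers each.
profiles = {
    "s__A": [0, 0, 3, 4, 4, 5, 5, 6, 6, 7],  # 20% zeros: likely present, flag it
    "s__B": [3, 4, 4, 5, 5, 6, 6, 7, 7, 8],  # no zeros: default is fine
    "s__C": [0, 0, 0, 0, 0, 0, 0, 0, 1, 2],  # mostly zeros: likely false positive
}
flagged = suggest_stat_q_increase(profiles)
```

Here only s__A is flagged. If several clades of interest behave like s__A, that would point towards a --stat_q above 0.1; a clade like s__C would not justify the change.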
Can I use a local installation of BlastN or BowTie2 in MetaPhlAn?
Yes. Starting from version 1.6.0, MetaPhlAn has two options that let you specify the full paths of the BlastN and BowTie2 executables: --blastn_exe and --bowtie2_exe.
I have a bunch of colorspace reads... Is it possible to use bowtie1 for alignment?
The easiest approach is to run bowtie1 outside MetaPhlAn to map the colorspace reads against the MetaPhlAn database, and then use the mapping output as input for MetaPhlAn. The mapping output should be a two-column, tab-separated file with the read names in the first column and the corresponding MetaPhlAn gene IDs in the second column.
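One way to produce that two-column file is to ask bowtie1 for SAM output and keep only the read name (QNAME) and reference name (RNAME) of each mapped read. The sketch below assumes SAM input and assumes the reference sequences in your bowtie1 index are named with MetaPhlAn gene IDs; the function names are hypothetical. Secondary alignments, if bowtie1 reports any, are kept as extra rows.

```python
from typing import Iterable, Iterator, TextIO, Tuple


def sam_to_metaphlan_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, gene ID) for every mapped alignment in SAM text."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment line
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # FLAG bit 0x4 marks an unmapped read
        yield qname, rname


def write_pairs(pairs: Iterable[Tuple[str, str]], out: TextIO) -> None:
    """Write one tab-separated 'read<TAB>gene' line per pair."""
    for read, gene in pairs:
        out.write(f"{read}\t{gene}\n")
```

For example, `write_pairs(sam_to_metaphlan_pairs(open("reads.sam")), open("reads.tsv", "w"))` would turn a hypothetical reads.sam into the two-column reads.tsv described above.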