Optimal use with MAGs/other bacterial genomes?

Dear all

I have been using your excellent software to create abundance matrices from redundant gene databases such as VFDB and NCBI AMR. For these databases I have been using your recommended settings, “-cge -1t1”.

However, because of its high speed and efficiency I would also like to use KMA to create abundance matrices using a dereplicated set of MAGs (and at some point maybe also genomes of isolates).

For a previous study I assembled the Illumina shotgun reads into MAGs for all samples and then used the dereplicated set of MAGs as a reference set of “genomes”. I then used the same parameters, “-cge -1t1”, to re-map the reads to the reference MAGs, and the “fragmentCountAln” column to count the properly aligned reads. Does that sound right?
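
A minimal sketch of the counting step, to make the question concrete. It assumes the “fragmentCountAln” values sit in a tab-separated per-reference table whose column header line starts with “#” and whose metadata lines start with “##”. That layout, the function name `fragment_counts` and the toy table are assumptions for illustration, not taken from the KMA documentation.

```python
def fragment_counts(table_text, column="fragmentCountAln"):
    """Return {reference name: count} from a tab-separated per-reference table.

    Assumed layout (hypothetical): '##' lines are metadata, a single '#' line
    holds the column names, and every other non-empty line is a data row.
    """
    header = None
    counts = {}
    for line in table_text.splitlines():
        if not line.strip() or line.startswith("##"):
            continue  # skip blank lines and metadata
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        row = dict(zip(header, line.split("\t")))
        # The first column is taken to be the reference (MAG contig) name.
        counts[row[header[0]]] = int(row[column])
    return counts


# Toy input with made-up numbers, only to show the expected shape.
example = "\n".join([
    "## made-up metadata line",
    "\t".join(["# refSequence", "readCount", "fragmentCountAln"]),
    "\t".join(["MAG_1", "250", "120"]),
    "\t".join(["MAG_2", "15", "7"]),
])
per_mag = fragment_counts(example)
```

Running this once per sample and collecting the dictionaries would give one column of the abundance matrix per sample.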

Considering that the dereplicated MAGs consist of relatively short contigs (Illumina) and the database is still redundant, I continued using “-cge -1t1”, but should I change that? Should I perhaps use “-Mt1” and drop ConClave?

Finally, for MAG abundance matrices, I usually normalize across samples by counting how many reads map to the SILVA 16S sequences (again using KMA) and taking the sum for each sample. However, I recently heard that there may be better ways to normalize than using SILVA 16S. Any suggestions?
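
A minimal sketch of the 16S-based normalization described above, assuming the per-MAG counts and the per-sample totals of 16S-mapped reads are already available as dictionaries. The function name `normalize_by_16s`, the `scale` parameter and all numbers are hypothetical and only illustrate the arithmetic.

```python
def normalize_by_16s(counts, reads_16s, scale=1_000_000):
    """Divide each sample's per-MAG counts by that sample's 16S-mapped read total.

    counts:    {sample: {mag: aligned fragment count}}
    reads_16s: {sample: total reads mapped to 16S sequences}
    scale:     arbitrary multiplier so the values are not tiny fractions
    """
    normalized = {}
    for sample, per_mag in counts.items():
        total_16s = reads_16s[sample]
        if total_16s <= 0:
            raise ValueError(f"sample {sample!r} has no 16S-mapped reads")
        normalized[sample] = {
            mag: scale * n / total_16s for mag, n in per_mag.items()
        }
    return normalized


# Made-up counts for two samples and two MAGs.
counts = {
    "sampleA": {"MAG_1": 200, "MAG_2": 50},
    "sampleB": {"MAG_1": 100, "MAG_2": 300},
}
reads_16s = {"sampleA": 1000, "sampleB": 4000}

norm = normalize_by_16s(counts, reads_16s, scale=1000)
```

With these toy numbers, MAG_1 has more raw counts in sampleA (200 vs 100), and the gap widens after dividing by the 16S totals, because sampleB has four times as many 16S-mapped reads.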

Thanks in advance

P
