Filtering Results

Issue #65 new
Former user created an issue

Hello, I used KMA to identify antibiotic resistance genes (ARGs) in several thousand metagenomes and have been very happy with the performance (very fast!). Now that I have the results for each metagenome, I'm trying to decide how best to filter the tables to confidently call presence/absence of the genes.

It seems like p-value would be the most straightforward, but I'm not sure how it's calculated and can't find an explanation in the paper. Do you have another suggestion? For my work, I'm not terribly concerned about identifying exactly which gene it is (i.e., I'm not trying to distinguish between QnrS1 and QnrS2); I'm looking more broadly at gene families or clusters.

Thank you for your valuable insight, Peter

Comments (1)

  1. Christian Brinch

    This is a problem that we are struggling with ourselves. I don’t think there is a consensus at the moment and it is not a trivial problem. Not only does it depend on coverage and identity of the reference; it also depends on the relative abundance of the gene in the sample as well as the completeness of the database and how close the reference is to its neighbours.

    In our group, we homology-reduce the results into 90% identity clusters and arbitrarily disregard low-count clusters. I am not terribly fond of this approach, but it is the best we’ve got so far.

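The cluster-level filtering described in the comment above could be sketched roughly as follows. This is only an illustration, not the group's actual pipeline: the function name `filter_clusters`, the `min_count` cutoff, and the input dictionaries are all hypothetical. It assumes you have already extracted a per-template read count from the KMA output and built a template-to-cluster mapping with a separate clustering tool at 90% identity.

```python
from collections import defaultdict


def filter_clusters(template_counts, template_to_cluster, min_count):
    """Sum read counts per cluster and keep clusters with at least min_count reads.

    template_counts:     dict of template name -> read count (hypothetical input)
    template_to_cluster: dict of template name -> cluster label (hypothetical input)
    min_count:           arbitrary cutoff; no consensus value is implied
    """
    cluster_counts = defaultdict(int)
    for template, count in template_counts.items():
        # Templates missing from the mapping are treated as singleton clusters.
        cluster = template_to_cluster.get(template, template)
        cluster_counts[cluster] += count
    # Presence call: keep only clusters at or above the cutoff.
    return {c: n for c, n in cluster_counts.items() if n >= min_count}


# Made-up numbers, for illustration only.
counts = {"qnrS1": 40, "qnrS2": 15, "blaTEM-1": 3}
clusters = {"qnrS1": "qnrS", "qnrS2": "qnrS", "blaTEM-1": "blaTEM"}
present = filter_clusters(counts, clusters, min_count=10)
print(present)  # {'qnrS': 55}
```

Because the counts are summed per cluster before the cutoff is applied, reads split across close variants such as QnrS1 and QnrS2 still contribute to a single family-level call, which matches the stated goal of working with gene families rather than individual alleles. The cutoff itself remains arbitrary, as the comment notes.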