Deleting bad MAGs
Issue #39
closed
Hi Simon and developers,
We see that there is a great way to add MAGs, but is it possible yet to remove MAG datasets that are well known to be pretty contaminated?
Best,
David Needham
repo owner - changed status to closed
Hi David!
That’s a good point. We tried our best to deal with contaminated MAGs, but I’m sure some problematic MAGs are unfortunately still in the database. There is no “official” way to remove a MAG from the default iPHoP database, but there is a “hack” you could try. In the database folder “db_infos”, there is a file called “Host_Genomes.tsv”, which iPHoP uses to link each host genome to its taxonomy. Any genome not listed in this file should be ignored at the host prediction step, so you could remove the MAGs you want iPHoP to ignore from this file and rerun. It’s not perfect: iPHoP will still spend time on BLAST, CRISPR spacer searches, etc. against these genomes, but they should not be used for host prediction. I tested this on a small example and it seems to work “as expected” (as in: I’m pretty sure it should not break anything, and I did not see anything broken).
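A minimal Python sketch of this edit is below. It assumes that Host_Genomes.tsv is tab-separated, has a single header line, and stores the genome ID in its first column; the thread does not describe the file layout, so check your copy before running. The function name `remove_genomes` and the `.orig` backup suffix are illustrative choices, not part of iPHoP.

```python
import shutil
from pathlib import Path


def remove_genomes(tsv_path, genomes_to_drop, id_column=0, has_header=True):
    """Drop rows whose genome ID is in `genomes_to_drop`; return (kept, removed).

    Assumptions (not confirmed by iPHoP docs): tab-separated file, one header
    line, genome ID in column `id_column` (default: first column).
    """
    tsv_path = Path(tsv_path)
    backup = tsv_path.with_name(tsv_path.name + ".orig")
    if not backup.exists():
        # Keep a pristine copy on the first run only, so the edit stays reversible.
        shutil.copy2(tsv_path, backup)

    # newline="" keeps the original line endings untouched.
    with tsv_path.open(newline="") as fh:
        lines = fh.readlines()

    drop = set(genomes_to_drop)
    out, kept, removed = [], 0, 0
    for i, line in enumerate(lines):
        if has_header and i == 0:
            out.append(line)  # header is copied through unchanged
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > id_column and fields[id_column] in drop:
            removed += 1  # genome iPHoP should no longer link to a taxonomy
            continue
        out.append(line)
        kept += 1

    with tsv_path.open("w", newline="") as fh:
        fh.writelines(out)
    return kept, removed
```

For example, with the IDs to drop listed one per line in a hypothetical `bad_mags.txt`, calling `remove_genomes("<path_to_iphop_db>/db_infos/Host_Genomes.tsv", Path("bad_mags.txt").read_text().split())` rewrites the file in place and returns the number of rows kept and removed. A `removed` count lower than the number of IDs suggests some IDs did not match the file exactly. Copying the `.orig` file back restores the original database.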
If you try this out, let me know how it works for you. I’m also happy to talk more about cleaning up the host database for a future version of iPHoP!
Best,
Simon