Deleting bad MAGs

Issue #39 closed
David Needham created an issue

Hi Simon and developers,

We see that there is a great way to add MAGs, but is it possible yet to remove MAGs that are well known to be contaminated?

Best,

David Needham

Comments (2)

  1. Simon Roux repo owner

    Hi David!

    That’s a good point. We tried to do our best to deal with contaminated MAGs, but I’m sure there are unfortunately still some problematic MAGs in the database. There is no “official” way of removing a MAG from the default iPHoP database, but there is a “hack” you could try.

    In the database folder “db_infos”, there is a file called “Host_Genomes.tsv”, which iPHoP uses to link each genome to its taxonomy. Any genome not listed in this file should be ignored at the time of host prediction, so one thing you could try is removing the MAGs you would like iPHoP to ignore from this file, and then rerunning. It’s not perfect, because you will still waste time computing BLAST, CRISPR searches, etc. against these genomes, but they should not be used for host prediction. I tested it on a small example and it seems to work “as expected” (as in: I’m pretty sure this should not break anything, and I did not see anything broken).

    If you try this out, let me know how it worked for you. I’m also happy to talk more about cleaning up the host database for a future version of iPHoP!

    Best,

    Simon
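
The workaround described above could be scripted roughly as follows. This is a minimal sketch and not part of iPHoP. It assumes that “Host_Genomes.tsv” is tab-separated with the genome identifier in the first column, which should be checked against your own copy of the file (the `id_column` parameter is there in case it differs). The helper name `drop_genomes` is hypothetical. A backup copy of the file is written before anything is changed, so the original database can be restored.

```python
import shutil
from pathlib import Path


def drop_genomes(tsv_path, unwanted_ids, id_column=0):
    """Remove rows whose ID column matches one of unwanted_ids.

    ASSUMPTION: the file is tab-separated and the genome ID sits in
    column `id_column` (0-based). Verify this against your own file.

    A backup named "<file>.bak" is written next to the original first.
    Returns the number of rows that were removed.
    """
    tsv_path = Path(tsv_path)
    unwanted = set(unwanted_ids)

    # Keep an untouched copy so the change can be reverted.
    backup = tsv_path.with_name(tsv_path.name + ".bak")
    shutil.copy2(tsv_path, backup)

    removed = 0
    # newline="" keeps the original line endings unchanged.
    with open(backup, encoding="utf-8", newline="") as src, \
            open(tsv_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) > id_column and fields[id_column] in unwanted:
                removed += 1
                continue  # skip this genome
            dst.write(line)  # copy every other line verbatim
    return removed
```

For example, `drop_genomes("path/to/db/db_infos/Host_Genomes.tsv", ["MAG_A", "MAG_B"])` would drop two genomes, where the path and the MAG identifiers are placeholders. If the returned count is lower than the number of identifiers passed in, some of them did not match any row, which usually means the identifiers are written differently in the file.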
