MAGs for custom host prediction decontamination issue

Issue #81 closed
Zhichao Zhou created an issue

Hi,

As indicated in the README section on custom MAG use, we need to filter out mistakenly binned viral contigs from our input MAGs. Do you have any suggestions for this step? Or do you plan to automate it in the future?

Comments (5)

  1. Simon Roux repo owner

    Hi! This is (kind of) already implemented in iPHoP: if one of your virus sequences (provided as input) matches a contig in a host genome (user-provided or from a reference database) over more than 50% of the host genome contig’s length, this hit is ignored in the iPHoP calculation. To be more specific: all blast hits are first screened to detect any host genome contig covered by a viral contig on > 50% of its length, and these contigs are added to a list of contigs to ignore. If you want to go further, my suggestion would be to run a viral contig prediction tool (e.g. geNomad or VIBRANT) on your MAGs first, and remove from the fasta file of your MAGs any contig detected as viral by the tool(s) before building the iPHoP custom database. Hope this helps!
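
For readers who want to see the screening rule spelled out, here is a minimal sketch of the logic described in the reply above. It is not iPHoP's actual code: the function names, the tuple layout of the hits (virus ID, host contig ID, subject start, subject end), and the `contig_lengths` dictionary are all assumptions made for illustration. Only the 50% threshold comes from the reply.

```python
from collections import defaultdict


def covered_length(intervals):
    """Number of positions covered by 1-based inclusive intervals, overlaps merged."""
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end + 1:
            # Close the previous merged interval before starting a new one.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def contigs_to_ignore(hits, contig_lengths, max_fraction=0.5):
    """Host contigs covered by a single virus on more than max_fraction of their length.

    hits: iterable of (virus_id, contig_id, sstart, send) tuples (hypothetical layout).
    contig_lengths: dict mapping contig_id to contig length in bp (hypothetical input).
    """
    per_pair = defaultdict(list)
    for virus_id, contig_id, sstart, send in hits:
        # Minus-strand blast hits report sstart > send, so normalise the order.
        per_pair[(virus_id, contig_id)].append((min(sstart, send), max(sstart, send)))
    ignored = set()
    for (_virus_id, contig_id), intervals in per_pair.items():
        if covered_length(intervals) / contig_lengths[contig_id] > max_fraction:
            ignored.add(contig_id)
    return ignored


def filter_hits(hits, contig_lengths, max_fraction=0.5):
    """Drop every hit that lands on an ignored host contig."""
    ignored = contigs_to_ignore(hits, contig_lengths, max_fraction)
    return [hit for hit in hits if hit[1] not in ignored]
```

Note that in this sketch a contig, once flagged, is ignored for every virus, which matches the "list of contigs to ignore" wording above; whether iPHoP merges overlapping hits in the same way is an assumption.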

  2. Zhichao Zhou reporter

    Hi,

    This is a good idea!

    I will use geNomad to screen for mistakenly binned contigs in my input genomes. What would you suggest for any provirus-containing contigs it identifies? I should keep them, right? My understanding is that those provirus-containing contigs are exactly the targets that viruses should match through the blastn method.

  3. Simon Roux repo owner

    I agree, I would tend to keep the proviruses, except if they are predicted to span almost the entire contig (e.g. if you have a 25 kb contig with a 22 kb provirus predicted, I would still remove it because I would not trust that it is correctly binned).
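
The keep-or-remove rule discussed in this thread could be sketched as below. This is only an illustration: the function name, its arguments, and in particular the 0.8 cutoff for "almost the entire contig" are assumptions, since the thread gives just the 22 kb out of 25 kb example (a fraction of 0.88) and no exact threshold. The provirus coordinates and the viral flag are assumed to have been parsed beforehand from the prediction tool's output.

```python
def keep_contig(contig_length, provirus_regions, is_viral=False, max_provirus_fraction=0.8):
    """Decide whether a MAG contig stays in the custom host database.

    contig_length: contig length in bp.
    provirus_regions: list of non-overlapping (start, end) provirus coordinates,
        1-based inclusive (hypothetical input, parsed by the user).
    is_viral: True if the whole contig was predicted as viral.
    max_provirus_fraction: assumed cutoff for "provirus spans almost the whole contig".
    """
    if is_viral:
        # Entirely viral contigs are likely mis-binned and are removed.
        return False
    provirus_bp = sum(end - start + 1 for start, end in provirus_regions)
    # Keep provirus-containing contigs unless the provirus dominates the contig.
    return provirus_bp / contig_length < max_provirus_fraction
```

With these assumptions, the 25 kb contig carrying a 22 kb provirus from the example is removed, while a long host contig with a short integrated provirus is kept as a legitimate blastn target.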
