lost tips in the tree

Issue #55 closed
Zhichao Zhou created an issue

In the “gtdbtk de_novo_wf” step for building the bacterial and archaeal trees (please see the attached gtdbtk.log file), the final trees seem to have lost several GTDB genomes and user-added genomes. I have already lowered --min_perc_aa from its default of 10 to 0 in the command, but several genomes (both GTDB reference genomes and my added genomes) are still missing from the trees (also attached: gtdbtk.ar122.decorated.tree and gtdbtk.bac120.decorated.tree).

I think this is what caused the following error in the run_iPHoP.log file:

Processing data for virus 3300033816__vRhyme_10
Can't find GB_GCA_000402655.1 and/or ME2011-06-13_3300042898_group3_bin2 in the trees, so can't calculate distances

Is there a way to solve this issue?

Comments (5)

  1. Simon Roux repo owner

    Right, the error you see in iPHoP is because some of your MAGs are missing from the tree. These should be filtered out automatically, but sometimes this filtering fails (we are looking into why). In the meantime, you can fix this fairly simply by editing the file “Host_Genomes.tsv” and removing all the lines with an empty field in the “Repr_taxonomy” column (these should correspond to the MAGs you wanted to add that did not make it into the tree, including ME2011-06-13_3300042898_group3_bin2).
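
The manual edit described above could be scripted roughly as follows. This is only a sketch, not part of iPHoP: it assumes “Host_Genomes.tsv” is tab-separated and that its first line is a header containing a column named “Repr_taxonomy”. The function name drop_rows_without_taxonomy and the output file name are made up for illustration.

```python
def drop_rows_without_taxonomy(in_path, out_path, column="Repr_taxonomy"):
    """Copy a tab-separated file, skipping rows whose `column` field is empty.

    Assumes the first line is a header that names the columns (an assumption,
    not something confirmed for Host_Genomes.tsv). Returns the number of rows
    that were dropped.
    """
    dropped = 0
    # newline="" keeps the original line endings untouched.
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        header = src.readline()
        names = header.rstrip("\r\n").split("\t")
        if column not in names:
            raise ValueError(f"column {column!r} not found in header of {in_path}")
        idx = names.index(column)
        dst.write(header)
        for line in src:
            fields = line.rstrip("\r\n").split("\t")
            # A short row is treated as having an empty field in this column.
            value = fields[idx] if idx < len(fields) else ""
            if value.strip():
                dst.write(line)  # keep the row exactly as it was
            else:
                dropped += 1
    return dropped
```

For example, `drop_rows_without_taxonomy("Host_Genomes.tsv", "Host_Genomes.filtered.tsv")` writes the filtered copy to a new file, so the original is kept as a backup until the filtered version has been checked and swapped in.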
