Step [7/4] can't calculate distances

Issue #20 resolved
Ling-Dong Shi created an issue

Hi Simon,

Thanks for providing this powerful tool!

I ran into an error at step 7 when using it with my curated database; please see the log below:

[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
[7/2] Loading corresponding host taxonomy...
[7/3] Link matching genomes to representatives and filter out redundant / useless matches...
Filtering blast data
Filtering crispr data
Filtering wish data
Filtering vhm data
Filtering PHP data
[7/4] Write the matrices for TensorFlow...
Starting to built the matrices for TensorFlow
Loading trees
Processing data for virus Final_Ori_Adjusted_09mi20_z3_2019_ig18393_virus
Processing data for virus ref_AJ421943.1
Processing data for virus ref_CP017905.1
Can't find 400_ZONE1_2014_IG3401.contigs and/or SR-VP_0-2cm_seed_37_20.contigs in the trees, so can't calculate distances

Welcome to iPHoP

write

After that, no final results (such as the tsv file) were produced, and no other errors were reported. Could you please let me know how to address this? Thanks in advance!

Comments (8)

  1. Simon Roux repo owner

    Hi Ling-Dong,

    This looks like an issue at the database building step, as apparently some of your user-provided MAGs were not found in the newick trees. Could you share the two files ending with “decorated.tree” that should be in the “db_infos” folder of your database?
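
A minimal sketch of how one might check this locally, assuming the genome names in the error message appear as leaf labels in the “decorated.tree” files. The function names and the regular-expression leaf extraction are illustrative assumptions, not part of iPHoP; a full Newick parser would be more robust for trees with unusual labels.

```python
import re
from pathlib import Path


def leaf_names(newick_text):
    """Return the set of leaf labels in a Newick string.

    Assumption: a leaf label directly follows '(' or ',' and contains no
    quotes, parentheses, commas, colons or semicolons. Internal node labels
    (which follow ')') are skipped.
    """
    pattern = r"[(,]\s*'?([^'(),:;]+)'?\s*(?=[:,)])"
    return {name.strip() for name in re.findall(pattern, newick_text)}


def missing_genomes(genome_ids, newick_texts):
    """Return the genome IDs that are not a leaf in any of the given trees."""
    found = set()
    for text in newick_texts:
        found |= leaf_names(text)
    return [g for g in genome_ids if g not in found]


def missing_from_db(db_infos_dir, genome_ids):
    """Check every file ending in 'decorated.tree' in the db_infos folder.

    db_infos_dir is a hypothetical path supplied by the user.
    """
    texts = [p.read_text() for p in Path(db_infos_dir).glob("*decorated.tree")]
    return missing_genomes(genome_ids, texts)
```

Calling `missing_from_db` with the path to the “db_infos” folder and the two names from the error message would list whichever of them is absent from the trees.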

  2. Ling-Dong Shi reporter

    Sorry for my late reply. Previously I used v1.1 and ran into the error listed above. I have now tried the new version and everything looks good! Thanks again for providing such a powerful tool, Simon!

    One comment: I noticed that the step adding custom genomes to the database requires both gtdbtk.ar53.decorated.tree and gtdbtk.bac120.decorated.tree. In some cases, for example when the community contains only bacterial members or when one is only interested in archaeal viruses, users have to add some references to their custom dataset before constructing a new iPHoP database, much like adding reference phages during prediction with v1.1. I am not sure whether this is worth changing in future versions…

  3. Simon Roux repo owner

    Good to hear, thanks for the update! And yes, the old version of iPHoP required both gtdbtk.ar53.decorated.tree and gtdbtk.bac120.decorated.tree, but this should be fixed now in v1.2.0. We’ll definitely keep monitoring this and make sure it still works with the different GTDB-tk versions.

  4. die hu

    Hi, Simon,

    Sorry to bother you again.

    I also got an error at this step ([7/4]):

    Can't find GB_GCA_001515945.1 and/or LINA00548K_-bin.140 in the trees, so can't calculate distances.

    I checked, and GB_GCA_001515945.1 is in the gtdbtk.bac120.decorated.tree file, but LINA00548K_-bin.140 is not.

    How could I fix this problem?

    Kind regards,

    Die

  5. Simon Roux repo owner

    Hi Die,

    That is usually the sign of a problem with some of the custom MAGs that were meant to be included in the database. The easiest way to solve it is usually to remove the corresponding line in the Host_genomes.tsv file (i.e. the line mentioning “LINA00548K_-bin.140”), and iPHoP will then ignore that genome.

    Let me know if that works !

    Best,

    Simon
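
The line removal suggested above could be scripted roughly as follows. This is only a sketch: it assumes Host_genomes.tsv is tab-separated and that the genome name appears as a whole field on its line. The function names are hypothetical, and the path to the file has to be supplied by the user.

```python
import shutil
from pathlib import Path


def drop_genomes(lines, bad_ids):
    """Return the lines in which no tab-separated field equals a bad ID.

    Comparing whole fields (an assumption about the file layout) avoids
    dropping a genome whose name merely contains a bad ID as a prefix.
    """
    bad = set(bad_ids)
    kept = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if bad.intersection(fields):
            continue  # this line mentions a genome to remove
        kept.append(line)
    return kept


def rewrite_host_genomes(tsv_path, bad_ids):
    """Filter the file in place, keeping a '.bak' copy of the original."""
    tsv_path = Path(tsv_path)
    shutil.copy2(tsv_path, Path(str(tsv_path) + ".bak"))
    with open(tsv_path) as handle:
        lines = handle.readlines()
    with open(tsv_path, "w") as handle:
        handle.writelines(drop_genomes(lines, bad_ids))
```

For the case in this thread, `bad_ids` would be `["LINA00548K_-bin.140"]`; the backup copy makes it easy to restore the original file if needed.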
