Can't calculate distances

Issue #73 closed
Louise Weed created an issue

Hello,

I am trying to run iphop with a custom database and am getting the following error:

[7/4] Write the matrices for TensorFlow...

Starting to built the matrices for TensorFlow

Loading trees

Processing data for virus FakeDummy_AJ421943.1

Processing data for virus k141_1025169-cat_2

Can't find 3300005964_14 and/or semibin_bin.160_sub in the trees, so can't calculate distances

I’ve successfully run iphop with the default database before this, so I think it has something to do with my having added bins to my database, but I’m not sure how. This resulted in the run finishing abnormally and I don’t have the normal summary files. Any help is appreciated!

-Roo

Comments (6)

  1. Simon Roux repo owner

    Hi,

    Sorry about this issue. In previous versions of iPHoP, there were some cases where a bin could not be included in a custom database, yet was still included in some calculations so that eventually it caused this kind of error (see https://bitbucket.org/srouxjgi/iphop/issues/20/step-7-4-cant-calculate-distances). The things to check are:
    - Do you find the bins “semibin_bin.160_sub” and “3300005964_14” in the files ending in decorated.tree in “db_infos/” for your custom MAG database?
    - If one of these is missing from the tree files, that means GTDB-tk could not include it in the trees. At this point, the easiest fix is to remove the corresponding line in “Host_genomes.tsv” (i.e. the line starting with the bin ID “semibin_bin.160_sub” or “3300005964_14”).

    Newer versions of iPHoP should not have this issue anymore, but since the problem starts at the “add_to_db” step, it’s often easier to fix it as described above than to re-run everything.
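
    The two checks above could be scripted roughly as follows. This is only a sketch, not part of iPHoP: the function names are made up for illustration, the tree lookup is a plain text search for the bin ID in each *decorated.tree file (not a real Newick parse), and the path to Host_genomes.tsv has to be supplied by you. The original file is kept as a .bak copy.

```python
import re
from pathlib import Path


def ids_in_trees(tree_dir, genome_ids):
    """Return the genome IDs that appear in any *decorated.tree file in tree_dir.

    Heuristic text search (an assumption, not how iPHoP reads the trees).
    """
    found = set()
    for tree_file in Path(tree_dir).glob("*decorated.tree"):
        text = tree_file.read_text()
        for gid in genome_ids:
            # Reject matches that continue with ID-like characters on either
            # side, so "bin_1" is not reported as found inside "bin_10".
            pattern = r"(?<![\w.\-])" + re.escape(gid) + r"(?![\w.\-])"
            if re.search(pattern, text):
                found.add(gid)
    return found


def drop_host_rows(host_tsv, ids_to_drop):
    """Remove rows whose first tab-separated field is in ids_to_drop.

    Writes a backup next to the file (same name plus ".bak") and returns
    the number of rows removed.
    """
    path = Path(host_tsv)
    lines = path.read_text().splitlines(keepends=True)
    kept = [ln for ln in lines if ln.split("\t", 1)[0].strip() not in ids_to_drop]
    path.with_name(path.name + ".bak").write_text("".join(lines))
    path.write_text("".join(kept))
    return len(lines) - len(kept)
```

    For example, calling ids_in_trees() on the “db_infos/” folder of the custom database with the two bin IDs from the error message shows which of them is missing from the trees; any missing ID can then be passed to drop_host_rows() together with the path to Host_genomes.tsv.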

    Best,

    Simon

  2. Louise Weed reporter

    Hi Simon,

    Thanks so much for your quick response! Sorry I didn’t see the other posts about this issue. I removed those lines and it fixed my issue, but now I’m getting another error which I’m not seeing mentioned in other issues:

    [7/1] Loading all parsed data...
    [7/2] Loading corresponding host taxonomy...
    Traceback (most recent call last):
      File "/bioware/iphop-mamba-20230802/bin/iphop", line 10, in <module>
        sys.exit(cli())
      File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/iphop.py", line 129, in cli
        args["func"](args)
      File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 102, in main
        dataprep.aggregate(args)
      File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 36, in aggregate
        load_taxo_repr(args,check_host,host_info)
      File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 325, in load_taxo_repr
        (genome,source,status,strain,gtdb_repr,tax) = row
    ValueError: not enough values to unpack (expected 6, got 1)

    Any thoughts about what might be causing this? Note: I’m running in a fresh output directory and this is my command:

    iphop predict --fa_file 04_DRAMV/top6/top6.fasta --out_dir 06_HOSTS/iphop_results/top6 --db_dir /users/rweed/Oct_2023_w_TR_hosts
    

    I appreciate your help!

    -Roo

  3. Simon Roux repo owner

    Hi Roo,

    This looks like an issue in the Host_genomes.tsv file: the traceback shows each row being unpacked into 6 values, so every row in this TSV file needs to have 6 tab-separated columns. Can you confirm this is the case in the new database after you modified the file?
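
    One quick way to check the column count is a few lines of Python. This is a sketch under the assumption that the file is plain tab-separated text; find_bad_rows is a made-up helper name, not an iPHoP function.

```python
def find_bad_rows(host_tsv, expected_cols=6):
    """List rows that do not split into expected_cols tab-separated fields.

    Returns tuples of (line_number, number_of_fields, line_text).
    """
    bad = []
    with open(host_tsv, encoding="utf-8") as handle:
        for lineno, line in enumerate(handle, start=1):
            text = line.rstrip("\r\n")
            fields = text.split("\t")
            if len(fields) != expected_cols:
                bad.append((lineno, len(fields), text))
    return bad
```

    An empty list means every row has 6 columns. A row reported with a single field usually means the line contains no tabs at all, for example because another delimiter was used or the line is blank.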

    Thanks,

    Best,

    Simon

  4. Louise Weed reporter

    Hi Simon,

    Fixed it! The issue was that I had incorrectly formatted the file as a .csv. Thanks so much for your help, I have it running now!
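
    In case it helps anyone else who ends up with a comma-separated file, a conversion along these lines should work. This is a sketch using Python's standard csv module, not necessarily how I fixed my own file; csv_to_tsv and the file names passed to it are placeholders.

```python
import csv


def csv_to_tsv(src, dst):
    """Rewrite a comma-separated file as tab-separated; return rows written."""
    written = 0
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout, delimiter="\t", lineterminator="\n")
        for row in reader:
            if row:  # skip completely empty lines
                writer.writerow(row)
                written += 1
    return written
```

    Writing to a new file rather than overwriting the original makes it easy to compare the two before swapping them.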

    -Roo
