Can't calculate distances
Hello,
I am trying to run iphop with a custom database and am getting the following error:
[7/4] Write the matrices for TensorFlow...
Starting to built the matrices for TensorFlow
Loading trees
Processing data for virus FakeDummy_AJ421943.1
Processing data for virus k141_1025169-cat_2
Can't find 3300005964_14 and/or semibin_bin.160_sub in the trees, so can't calculate distances
I’ve successfully run iphop with the default database before this, so I think it has something to do with my having added bins to my database, but I’m not sure how. This resulted in the run finishing abnormally and I don’t have the normal summary files. Any help is appreciated!
-Roo
Comments (6)
-
repo owner -
reporter Hi Simon,
Thanks so much for your quick response! Sorry I didn’t see the other posts about this issue. I removed those lines and it fixed my issue, but now I’m getting another error which I’m not seeing mentioned in other issues:
[7/1] Loading all parsed data...
[7/2] Loading corresponding host taxonomy...
Traceback (most recent call last):
  File "/bioware/iphop-mamba-20230802/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/iphop.py", line 129, in cli
    args["func"](args)
  File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 102, in main
    dataprep.aggregate(args)
  File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 36, in aggregate
    load_taxo_repr(args,check_host,host_info)
  File "/bioware/iphop-mamba-20230802/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 325, in load_taxo_repr
    (genome,source,status,strain,gtdb_repr,tax) = row
ValueError: not enough values to unpack (expected 6, got 1)
Any thoughts about what might be causing this? Note: I’m running in a fresh output directory and this is my command:
iphop predict --fa_file 04_DRAMV/top6/top6.fasta --out_dir 06_HOSTS/iphop_results/top6 --db_dir /users/rweed/Oct_2023_w_TR_hosts
I appreciate your help!
-Roo
-
repo owner Hi Roo,
So this looks like an issue in the Host_genomes.tsv file: every row in this tsv file needs to have 6 columns. Can you confirm this is the case in the new database after you modified the file?
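One quick way to check this is a few lines of Python. This is only a minimal sketch, not part of iPHoP: the function name `find_bad_rows` is made up for illustration, and the only thing taken from the traceback is that each row must unpack into 6 fields (genome, source, status, strain, gtdb_repr, tax). It assumes the file is plain tab-separated text, so a comma-separated line would show up as a single column.

```python
import sys

# The traceback unpacks each row into 6 fields:
# genome, source, status, strain, gtdb_repr, tax
EXPECTED_COLUMNS = 6


def find_bad_rows(tsv_path, expected=EXPECTED_COLUMNS):
    """Return (line_number, column_count) for every line that does not
    split into `expected` tab-separated fields (hypothetical helper)."""
    bad = []
    with open(tsv_path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) != expected:
                bad.append((line_number, len(fields)))
    return bad


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_host_genomes.py path/to/Host_genomes.tsv
    for line_number, count in find_bad_rows(sys.argv[1]):
        print(f"line {line_number}: {count} column(s) instead of {EXPECTED_COLUMNS}")
```

An empty result means every line has 6 columns. A line reported with 1 column usually means the separator is not a tab (or the line is blank).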
Thanks,
Best,
Simon
-
reporter Hi Simon,
Fixed it! The issue was that I had incorrectly formatted the file as a .csv. Thanks so much for your help, I have it running now!
-Roo
-
repo owner Great! Glad that you could find the issue, and thanks for the follow-up.
-
repo owner - changed status to closed
Solved
Hi,
Sorry about this issue. In previous versions of iPHoP, there were some cases where a bin could not be included in a custom database, yet was still included in some calculations, which eventually caused this kind of error (see https://bitbucket.org/srouxjgi/iphop/issues/20/step-7-4-cant-calculate-distances). The things to check are:
- Do you find the bins “semibin_bin.160_sub” and “3300005964_14” in the files ending in decorated.tree in “db_infos/” for your custom MAG database?
- If one of these is missing from the tree file, that means GTDB-tk could not include it in the trees. At this point the easiest fix is to remove the corresponding line in “Host_genomes.tsv” (i.e. the line starting with the bin ID “semibin_bin.160_sub” or “3300005964_14”).
Newer versions of iPHoP should not have this issue anymore, but since the problem starts at the “add_to_db” step, it’s often easier to fix it as described above rather than re-running everything.
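If it helps, the two checks above can be scripted. The following is only a rough sketch under a few assumptions, not something shipped with iPHoP: the helper names `ids_missing_from_trees` and `drop_genomes` are made up, the tree lookup is a plain text search (like grep, so an ID that is a prefix of a longer ID would also count as found), and the bin ID is assumed to be the first tab-separated field of each line in Host_genomes.tsv. The filtered table is written to a new file so the original stays untouched.

```python
from pathlib import Path


def ids_missing_from_trees(bin_ids, db_infos_dir):
    """Return the bin IDs that do not appear in any file ending in
    'decorated.tree' inside db_infos_dir (plain substring search)."""
    tree_text = "".join(
        path.read_text() for path in sorted(Path(db_infos_dir).glob("*decorated.tree"))
    )
    return [bin_id for bin_id in bin_ids if bin_id not in tree_text]


def drop_genomes(host_genomes_tsv, output_tsv, ids_to_drop):
    """Copy host_genomes_tsv to output_tsv, skipping every line whose first
    tab-separated field is in ids_to_drop. Returns the number of lines dropped."""
    ids_to_drop = set(ids_to_drop)
    dropped = 0
    with open(host_genomes_tsv) as src, open(output_tsv, "w") as dst:
        for line in src:
            first_field = line.rstrip("\r\n").split("\t", 1)[0]
            if first_field in ids_to_drop:
                dropped += 1
                continue
            dst.write(line)
    return dropped
```

For the error in this thread, one would call `ids_missing_from_trees(["3300005964_14", "semibin_bin.160_sub"], ...)` on the “db_infos/” folder of the custom database, then pass the returned IDs to `drop_genomes` and review the new file before swapping it in for “Host_genomes.tsv”.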
Best,
Simon