Issue with add_to_db
Not sure if this is a simple error on my part or not, so forgive me ahead of time if it is. I just installed iPHoP and downloaded the database two days ago. I am trying to add my own MAGs to the database, but I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/gtdbtk.ar122.decorated.tree'
Welcome to iPHoP
There seems to be a database incompatibility: this is iPHoP v1.3.3, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw').
When I look inside “/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/”, the file there is “gtdbtk.ar53.decorated.tree”, which I believe is the most up-to-date version. Not sure where to go from here…
Comments (11)
-
repo owner Hi ! I think there may be a few things here, sorry. Are all these error messages coming when you try to run “add_to_db” ? Or are the latter when you try to run “predict” ?
Also, are you running through Docker ? -
reporter Thanks for the quick response! Predict works well, as I was able to use it to predict virus-host matches between my viral sequences and iPHoP’s host database. So it is only happening with “add_to_db”. Also, I am not running it through Docker.
-
repo owner Ok, I think I see what’s happening, this is the fix that is next on our to-do list. In the meantime, you can copy “gtdbtk.ar53.decorated.tree” to “gtdbtk.ar122.decorated.tree” in the original database (/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/), and rerun add_to_db with a clean output folder; the first error (“No such file or directory: '/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/gtdbtk.ar122.decorated.tree'”) should then disappear. After that, using predict on the new database should also work and not throw the “database incompatibility” error.
Let me know how this goes ! -
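[Editor's note: a minimal sketch of the copy workaround described above. The db_infos path is the one from this thread; here a throwaway temp folder stands in for it so the snippet runs anywhere — point `db_infos` at your real `.../Aug_2023_pub_rw/db_infos` instead.]

```python
# Sketch of the workaround: copy the ar53 tree under the legacy ar122
# name that iPHoP v1.3.3 add_to_db still looks for.
import os
import shutil
import tempfile

# Stand-in for "/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos"
db_infos = tempfile.mkdtemp()

src = os.path.join(db_infos, "gtdbtk.ar53.decorated.tree")
dst = os.path.join(db_infos, "gtdbtk.ar122.decorated.tree")

open(src, "w").close()    # stand-in for the real tree file
shutil.copyfile(src, dst)  # the actual workaround step
print(os.path.isfile(dst))  # -> True
```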
reporter Ok, thank you. Trying that now and will let you know if that fixed the problem.
-
reporter Ok, your suggestion worked to fix this first issue. However, I am running into another problem. Not sure if I should open a new issue or not, so I’ll put it here and move it to a new one if you’d like.
First, I added my MAGs to the database:
#Make personal database:
iphop add_to_db \
  --fna_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/final_MAGs/metawrap_10_10_bins/ \
  --gtdb_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/Taxonomy/MAG_Taxonomy_10_10_for_iphop/ \
  --out_dir /projects/luo_lab/Databases/Siders_iphop_inhouse_db \
  --db_dir /projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/
I then ran my viral sequences against the new MAG database:
#Run on personal database:
iphop predict \
  --fa_file /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/Viral_Assemblies/vContig_VirSorter2_pass2/virsorter/final-viral-combined.fasta \
  --db_dir /projects/luo_lab/Databases/Siders_iphop_inhouse_db/ \
  --out_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/iPHop_VH_output/inhouse_db_match \
  -t 32
And now I am getting the following error:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
### Welcome to iPHoP ###
Traceback (most recent call last):
  File "/users/troger50/.conda/envs/iphop_env/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
  File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 77, in main
    blast_genomes.run_and_parse_blast_to_host(args)
  File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/blast_genomes.py", line 25, in run_and_parse_blast_to_host
    get_blast_results(args["blastparsed"],args["blastgenomeout"],args["removelist_contigs"],args["host_taxo_file"],args["thresholds"]["max_blast_hit"],logger,args['messages'],args['dummy_id'$
  File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/blast_genomes.py", line 63, in get_blast_results
    hostlist = utility.load_column_into_dict(taxo_file,0) # get the list of host we authorize, so we can catch any issue with contig id, etc
  File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/utility.py", line 39, in load_column_into_dict
    with open(in_file, "r", newline='') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos/Host_Genomes.tsv'
-
repo owner This seems to be potentially related, as the custom database is apparently missing some file (“Host_Genomes.tsv” is not found, but should be created at the end of “add_to_db”). Did you get any error or warning when running add_to_db again ? Also, did you run it when there were already some files in “/projects/luo_lab/Databases/Siders_iphop_inhouse_db/” ? (If so, it may be worth trying to run add_to_db with a completely empty folder as the output).
Can you also check the list of files currently in /projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos/ ? I am wondering if Host_Genomes.tsv is the only one missing, or if it’s more than this (which may help us understand where the error occurred).
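[Editor's note: a quick sketch of that check, using a hypothetical `missing_db_files` helper. The expected filenames are only the ones mentioned in this thread — a complete db_infos folder contains more — and a throwaway temp folder stands in for the real `/projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos/`.]

```python
# Report which expected files are absent from a db_infos folder.
import os
import tempfile

def missing_db_files(db_infos, expected):
    """Return the expected files that are absent from db_infos."""
    return [f for f in expected if not os.path.isfile(os.path.join(db_infos, f))]

# Demo on a throwaway folder standing in for the real db_infos:
db_infos = tempfile.mkdtemp()
open(os.path.join(db_infos, "gtdbtk.ar53.decorated.tree"), "w").close()

print(missing_db_files(db_infos, ["Host_Genomes.tsv", "gtdbtk.ar53.decorated.tree"]))
# -> ['Host_Genomes.tsv']
```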
-
reporter Yeah, if I had scrolled all the way to the end of the log file for the add_to_db step, I would have saved myself some time and noticed the timeout error, lol. Thanks for your help. I will try to run the add_to_db step again with more time allotted on our HPC and see if it works. Sorry for the trouble. Will update soon.
-
repo owner Sounds good. Just in case, make sure that you don’t have anything other than MAG fasta files in the input folder, especially not other (sub-)folders (iPHoP does not catch this correctly, but will then get stuck - see https://bitbucket.org/srouxjgi/iphop/issues/97/error-when-6-add-new-genomes-to-vhm).
-
reporter Yeah, I was wondering why it was taking 2 days and nothing was happening lol. I just fixed it and now all my MAGs have been loaded into the database. Thanks for your help!
-
repo owner Nice, glad that you could solve the issue ! And one more good reason to fix this problem in the next release :-)
-
repo owner - changed status to closed