Issue with add_to_db

Issue #102 closed
Timothy Rogers created an issue

Not sure if this is a simple error on my part or not, so forgive me ahead of time if it is. I just installed iPHoP and downloaded the database two days ago. I am trying to add my own MAGs to the database, but I get the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/gtdbtk.ar122.decorated.tree'

Welcome to iPHoP

There seems to be a database incompatibility: this is iPHoP v1.3.3, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw').

When I look inside “/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/”, the file is “gtdbtk.ar53.decorated.tree”, which I believe is the most up-to-date version. Not sure where to go from here…

Comments (11)

  1. Simon Roux repo owner

    Hi ! I think there may be a few things here, sorry. Are all these error messages coming when you try to run “add_to_db” ? Or does the latter one come up when you try to run “predict” ?

    Also, are you running through Docker ?

  2. Timothy Rogers reporter

    Thanks for the quick response! Predict works well: I was able to use it to predict virus-host matches between my viral sequences and iPHoP’s host database. So the problem only happens with “add_to_db”. Also, I am not running it through Docker.

  3. Simon Roux repo owner

    Ok, I think I see what’s happening; this is the fix that is next on our to-do list. In the meantime, you can copy “gtdbtk.ar53.decorated.tree” to “gtdbtk.ar122.decorated.tree” in the original database (/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/) and rerun add_to_db with a clean output folder. The first error (“No such file or directory: '/projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos/gtdbtk.ar122.decorated.tree'”) should then disappear. After that, using predict on the new database should also work better and not throw the “database incompatibility” error.
    Let me know how this goes !
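
    For reference, a minimal sketch of the copy step described above. The helper name alias_ar122_tree is made up for illustration and is not part of iPHoP; it only assumes the two file names mentioned in this thread, and it leaves an existing ar122 file untouched:

    ```shell
    # Hypothetical helper, not part of iPHoP: copy the ar53 tree to the
    # ar122 file name that add_to_db is looking for.
    alias_ar122_tree() {
        db_infos="$1"
        src="$db_infos/gtdbtk.ar53.decorated.tree"
        dst="$db_infos/gtdbtk.ar122.decorated.tree"
        if [ ! -f "$src" ]; then
            echo "missing: $src" >&2
            return 1
        fi
        # Do not overwrite an ar122 tree that is already there.
        if [ -e "$dst" ]; then
            echo "already present: $dst"
            return 0
        fi
        cp "$src" "$dst" && echo "copied to: $dst"
    }

    # Example call, using the path from this thread:
    # alias_ar122_tree /projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/db_infos
    ```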

  4. Timothy Rogers reporter

    Ok, your suggestion worked to fix this first issue. However, I am running into another problem. Not sure if I should open a new issue or not, so I’ll put it here and move it to a new one if you like.

    First, I added my MAGs to the database:

    # Make personal database:
    iphop add_to_db --fna_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/final_MAGs/metawrap_10_10_bins/ --gtdb_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/Taxonomy/MAG_Taxonomy_10_10_for_iphop/ --out_dir /projects/luo_lab/Databases/Siders_iphop_inhouse_db --db_dir /projects/luo_lab/Databases/iphop_db/Aug_2023_pub_rw/
    

    I then ran my viral sequences against the new MAG database:

    # Run on personal database:
    iphop predict --fa_file /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/Viral_Assemblies/vContig_VirSorter2_pass2/virsorter/final-viral-combined.fasta --db_dir /projects/luo_lab/Databases/Siders_iphop_inhouse_db/ --out_dir /projects/luo_lab/Rogers_SidersViralAnalysis_XXXX_20XX/data/processed/iPHop_VH_output/inhouse_db_match -t 32
    

    And now I am getting the following error:

    Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
    [1/1/Run] Running blastn against genomes...
    [1/3/Run] Get relevant blast matches...
    ### Welcome to iPHoP ###
    Traceback (most recent call last):
      File "/users/troger50/.conda/envs/iphop_env/bin/iphop", line 10, in <module>
        sys.exit(cli())
      File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
        args["func"](args)
      File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 77, in main
        blast_genomes.run_and_parse_blast_to_host(args)
      File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/blast_genomes.py", line 25, in run_and_parse_blast_to_host
        get_blast_results(args["blastparsed"],args["blastgenomeout"],args["removelist_contigs"],args["host_taxo_file"],args["thresholds"]["max_blast_hit"],logger,args['messages'],args['dummy_id'$
      File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/blast_genomes.py", line 63, in get_blast_results
        hostlist = utility.load_column_into_dict(taxo_file,0) # get the list of host we authorize, so we can catch any issue with contig id, etc
      File "/users/troger50/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/utility.py", line 39, in load_column_into_dict
        with open(in_file, "r", newline='') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos/Host_Genomes.tsv'
    

  5. Simon Roux repo owner

    This seems to be potentially related, as the custom database is apparently missing a file (“Host_Genomes.tsv” is not found, but should be created at the end of “add_to_db”). Did you get any error or warning when running add_to_db again ? Also, did you run it when there were already some files in “/projects/luo_lab/Databases/Siders_iphop_inhouse_db/” ? (If so, it may be worth trying to run add_to_db with a completely empty folder as the output).

    Can you also check the list of files currently in /projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos/ ? I am wondering if Host_Genomes.tsv is the only one missing, or if it’s more than this (which may help us understand where the error occurred).
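
    Something along these lines would give that listing (just a sketch: check_db_infos is a made-up helper name, not an iPHoP command, and Host_Genomes.tsv is the only file it checks for explicitly):

    ```shell
    # Hypothetical helper, not part of iPHoP: list a db_infos folder and
    # report whether Host_Genomes.tsv is there.
    check_db_infos() {
        db_infos="$1"
        # Show everything currently in the folder.
        ls -l "$db_infos" || return 1
        if [ -f "$db_infos/Host_Genomes.tsv" ]; then
            echo "Host_Genomes.tsv: present"
        else
            echo "Host_Genomes.tsv: MISSING"
            return 1
        fi
    }

    # Example call, using the path from this thread:
    # check_db_infos /projects/luo_lab/Databases/Siders_iphop_inhouse_db/db_infos
    ```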

  6. Timothy Rogers reporter

    Yeah, if I had scrolled all the way to the end of the log file for the add_to_db step, I would have saved myself some time and noticed the timeout error, lol. Thanks for your help. I will try to run the add_to_db step again with more time allotted on our HPC and see if it works. Sorry for the trouble. Will update soon.

  7. Timothy Rogers reporter

    Yeah, I was wondering why it was taking 2 days and nothing was happening, lol. I just fixed it and now all my MAGs have been loaded into the database. Thanks for your help!

  8. Simon Roux repo owner

    Nice, glad that you could solve the issue ! And one more good reason to fix this problem in the next release :-)
