Docker version doesn't work with MAG added db

Issue #63 closed
Anto created an issue

Cool tool, and thank you for bringing this out to the community. I had success using the tool via Docker when the default database “Sept_2021_pub_rw” was used, but not when I tested with a MAG-added db. Below are the command lines I used and the error I received. Any help here would be appreciated. V4_Wetland_MAGs_GTDB-tk_results contained two bacterial MAGs from my study and one archaeal MAG from your Wetland MAG set. My iphop_db directory contains “Sept_2021_w_V4_Wetland_MAGs_pub_rw” and “Sept_2021_pub_rw”, and nothing was moved or deleted.

docker run --rm -v /data/V4_Wetland_MAGs_GTDB-tk_results/:/data/V4_Wetland_MAGs_GTDB-tk_results/:rw -v /data/iphop_db/:/data/iphop_db/:rw -v /data/V4_Wetland_bins4iphop/:/data/V4_Wetland_bins4iphop/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest add_to_db --fna_dir /data/V4_Wetland_bins4iphop/ --gtdb_dir /data/V4_Wetland_MAGs_GTDB-tk_results/ --out_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --db_dir /data/iphop_db/Sept_2021_pub_rw/

#We added 2 additional bacteria genomes and 1 additional archaea genomes
#[9] All done

docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/

Welcome to iPHoP

There seems to be a database incompatibility: this is iPHoP v1.3.1, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw').

ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/ failed. (See above for error)

Comments (13)

  1. Simon Roux repo owner

    Hi !

    Can you check the content of “/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/” and “/data/iphop_db/Sept_2021_pub_rw/db/”? Essentially, this error is thrown by iPHoP when it can’t find “rewish_models” in “/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/”. To save space, “add_to_db” attempts to create a symbolic link to this directory in the new db (rather than copying the whole folder); however, I wonder if this step failed (and Docker did not tell us?).

    If the new database looks good except that the symbolic link is broken or missing, you can copy the whole rewish_models folder from Sept_2021_pub_rw/db/ into Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/, and the error should disappear (then we have to cross our fingers that this was the only issue with “add_to_db”…)

    Let me know how this goes !

    Best,

    Simon
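One quick way to check for the broken links suspected above (a sketch; the commented path is just the layout from this thread) is `find -L`, which follows links, so only links whose targets are missing still test as type `l`:

```shell
# find_broken_links DIR: print every symlink under DIR whose target
# no longer exists (find -L follows links, so a dangling link is the
# only thing that still matches -type l).
find_broken_links() {
    find -L "$1" -type l
}

# Example (path from this thread):
# find_broken_links /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db
```

An empty result means every link under the directory resolves; any path printed is a link pointing at something that was moved or deleted.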

  2. Anto reporter

    Many thanks for your prompt response. I cannot perform the copy, since the error message tells me they are the same file. Below is how my database directories look. I tried copying wish_data from the original db to the new MAG-added db, since it was missing from the latter, but when I attempt to run the prediction on Docker, the error “There seems to be a database incompatibility: this is iPHoP v1.3.1, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw')” still comes up.

    antonycp@kw13:/data/iphop_db/Sept_2021_pub_rw/db$ ls -ltr
    total 2365056
    -rwxr-xr-x 1 antonycp g-antonycp      20480 Mar 31 22:15 All_CRISPR_spacers_nr_clean.ndb
    -rwxr-xr-x 1 antonycp g-antonycp  112267777 Mar 31 22:15 All_CRISPR_spacers_nr_clean.fna
    -rwxr-xr-x 1 antonycp g-antonycp    5592524 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nto
    -rwxr-xr-x 1 antonycp g-antonycp      16384 Mar 31 22:15 All_CRISPR_spacers_nr_clean.ntf
    -rwxr-xr-x 1 antonycp g-antonycp   13186381 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nsq
    -rwxr-xr-x 1 antonycp g-antonycp   16777568 Mar 31 22:15 All_CRISPR_spacers_nr_clean.not
    -rwxr-xr-x 1 antonycp g-antonycp   16777700 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nin
    -rwxr-xr-x 1 antonycp g-antonycp  150024026 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nhr
    -rwxr-xr-x 1 antonycp g-antonycp 1967934427 Mar 31 22:15 GTDBtkr202_and_newrepr_s2_mat.pkl
    drwxr-xr-x 2 antonycp g-antonycp      12288 Mar 31 22:16 Host_Genomes
    drwxr-xr-x 2 antonycp g-antonycp       4096 Mar 31 22:18 rafah_data
    drwxr-xr-x 3 antonycp g-antonycp       4096 Mar 31 22:20 wish_data
    drwxr-xr-x 2 antonycp g-antonycp       4096 Mar 31 22:24 rewish_models
    -rwxr-xr-x 1 antonycp g-antonycp  139171706 Apr  1 00:19 php_db 
    
    antonycp@kw13:/data/iphop_db/Sept_2021_pub_rw/db$ ls -ltr ../../Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
    total 2057472
    lrwxrwxrwx 1 antonycp g-antonycp         45 Aug 31 09:05 rafah_data -> /data/iphop_db/Sept_2021_pub_rw/db/rafah_data
    drwxr-xr-x 2 antonycp g-antonycp       4096 Aug 31 09:05 Host_Genomes
    drwxr-xr-x 5 antonycp g-antonycp       4096 Aug 31 09:05 Tmp_CRISPRs
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nto -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nto
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.not -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.not
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.ntf -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.ntf
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nsq -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nsq
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nhr -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nhr
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.fna -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.fna
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.ndb -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.ndb
    lrwxrwxrwx 1 antonycp g-antonycp         66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nin -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nin
    drwxr-xr-x 2 antonycp g-antonycp       4096 Aug 31 09:08 rewish_models_extra
    drwxr-xr-x 3 antonycp g-antonycp       4096 Aug 31 09:14 rewish_tmp
    lrwxrwxrwx 1 antonycp g-antonycp         48 Aug 31 09:14 rewish_models -> /data/iphop_db/Sept_2021_pub_rw/db/rewish_models
    -rwxr-xr-x 1 antonycp g-antonycp 1967613666 Aug 31 09:14 GTDBtkr202_and_newrepr_s2_mat.pkl
    drwxr-xr-x 2 antonycp g-antonycp       4096 Aug 31 09:14 php_models_extra
    -rwxr-xr-x 1 antonycp g-antonycp  139171706 Aug 31 09:14 php_db
    drwxr-xr-x 3 antonycp g-antonycp       4096 Sep  3 08:33 wish_data
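The “same file” refusal from `cp` is expected given the listing above: the entries in the new db are symlinks pointing back at Sept_2021_pub_rw, so source and destination resolve to the same file, and `cp` stops rather than clobbering the source. A minimal reproduction in a temporary directory (nothing to do with the actual db), which is why the link has to be removed before a real copy can happen:

```shell
# Reproduce cp's "are the same file" refusal: copying a file onto a
# symlink that points back at that same file resolves both paths to
# one inode, so cp refuses instead of truncating the source.
tmp=$(mktemp -d)
echo data > "$tmp/original"
ln -s "$tmp/original" "$tmp/link"
cp "$tmp/original" "$tmp/link" || echo "cp refused (same file)"
```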
    

  3. Simon Roux repo owner

    Right, wish_data is not needed in the new db (it’s only used at the “add_to_db” stage), so it is not copied over. I would suggest maybe removing the current link, and copying “rewish_models” over, i.e.:

    $ rm /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rewish_models
    $ cp -r /data/iphop_db/Sept_2021_pub_rw/db/rewish_models /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
    

  4. Anto reporter

    Many thanks again for your response. Unfortunately, I’m going around in circles here. After executing your command above, I got a new error, “FileNotFoundError: [Errno 2] No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db_infos/All_CRISPR_spacers_nr_clean.metrics.csv'”, which I got around by removing those CRISPR files from the MAG db and copying them over from Sept_2021_pub_rw. Then I got the next error, “No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'”, which I tried to get around by performing the same rm operation on the RaFAH_ref* files in the MAG db dir, but this somehow also removed those files from the original Sept_2021_pub_rw (I can confirm that I did not directly execute that rm operation in Sept_2021_pub_rw). At that point it seemed like I would need to rebuild the original database, so I deleted the current Sept_2021_pub_rw and tried a fresh download, and got a new error that I didn’t get when I downloaded the db the first time around. I’m left scratching my head now 😧

    antonycp@kw13:/data$ docker run --rm -v /data/iphop_new_db:/data/iphop_new_db/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest download --db_dir /data/iphop_new_db/
    Please confirm you are ready to download the iPHoP database now. [y/N]:

    WARNING

    iPHoP database is pretty big, and will require around ~ 350Gb of disk space and some time to download and set up (except if you are downloading the test database, in which case it's only ~ 5Gb). Are you sure you want to continue ?
    ################
    Traceback (most recent call last):
    File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
    sys.exit(cli())
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_downloader.py", line 40, in main
    response = single_yes_or_no_question("Please confirm you are ready to download the iPHoP database now. ",args["no_prompt"])
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_downloader.py", line 204, in single_yes_or_no_question
    reply = str(input(question + choices)).lower().strip() or default_answer
    EOFError: EOF when reading a line

    ERROR conda.cli.main_run:execute(49): conda run iphop download --db_dir /data/iphop_new_db/ failed. (See above for error)
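The `EOFError` here is a Docker quirk rather than a database problem: `docker run -t` allocates a terminal but does not attach stdin, so the Python `input()` call behind the y/N prompt immediately hits end-of-file. A local sketch of the same failure, with the two usual fixes noted in comments:

```shell
# Simulate what iPHoP's confirmation prompt sees inside `docker run -t`
# (terminal allocated, stdin not attached): reading from a closed/empty
# stdin fails with EOF instead of waiting for "y".
prompt_with_closed_stdin() {
    read -r reply < /dev/null || echo "EOF when reading a line"
}

# Fixes for the real command:
#   1. attach stdin:              docker run --rm -i -t ... download --db_dir ...
#   2. skip the prompt entirely:  docker run --rm -t ... download --db_dir ... -n
```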

  5. Anto reporter

    Update: the db download seems to be running now that I added the “-n” flag at the end. I’ll post here how everything goes with the rest of the testing with the MAG-added db.

  6. Anto reporter

    I removed and re-copied the RaFAH_ref* files to the MAG db, and yet I get the error “No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'” even though the file exists. See the log below.

    docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/

    Welcome to iPHoP

    Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
    [1/1/Skip] Skipping computation of blastn against microbial genomes...
    [1/3/Skip] Skipping blast parsing...
    [2/1/Skip] Skipping computation of blastn against CRISPR...
    [2/2/Skip] Skipping crispr parsing...
    [3/1/Skip] Skipping computation of WIsH scores...
    [3/2/Skip] Skipping WIsH parsing...
    [4/1/Skip] Skipping computation of VHM s2 similarities...
    [4/2/Skip] Skipping VHM parsing...
    [5/1/Skip] Skipping computation of PHP scores...
    [5/2/Skip] Skipping PHP parsing...
    [6/1/Skip] Skipping RaFAH...
    [6/2/Skip] Skipping RaFAH parsing...
    [6.5/1/Run] Running Diamond comparison to RaFAH references...
    [6.5/2/Run] Get AAI distance to RaFAH refs...
    Traceback (most recent call last):
    File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
    sys.exit(cli())
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 95, in main
    aai_to_ref.run_and_parse_aai(args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 30, in run_and_parse_aai
    get_aai_results(faa_file,db_info,args["aai_out"],args["aai_parsed"])
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 67, in get_aai_results
    with open(ref) as csvfile:
    FileNotFoundError: [Errno 2] No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'

    ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/ failed. (See above for error)
    antonycp@kw13:/data$ head -n2 /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv
    NC_000866 265
    NC_000867 19
    antonycp@kw13:/data$ head -n2 /data/iphop_db/Sept_2021_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv
    NC_000866 265
    NC_000867 19

  7. Simon Roux repo owner

    I suspect there is still a symbolic link error, can you try:

    $ ls -lh /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
    

    If there are still arrows, i.e. “rafah_data -> /…”, then the way to solve this will be:

    • remove the “rafah_data” link (“rm /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data”)
    • copy the entire directory from the previous db into the new db’s db/ subdirectory (“cp -r /data/iphop_db/Sept_2021_pub_rw/db/rafah_data /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/”)

    If this is what you already did, then it may be a permission issue ?
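Since `predict` ultimately needs real files wherever `add_to_db` left links, an alternative to removing and re-copying links one directory at a time (a sketch, assuming GNU coreutils; the destination path below is hypothetical) is `cp -rL`, which dereferences every symlink while copying:

```shell
# materialize_links SRC DST: copy a directory tree, replacing each
# symlink with a real copy of the file it points to (-L dereferences
# symlinks during the copy).
materialize_links() {
    cp -rL "$1" "$2"
}

# Example (second path is hypothetical -- copy to a scratch location,
# verify it, then swap it in for the link-filled db/):
# materialize_links /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db /data/iphop_db/db_deref
```

Note this trades the disk-space saving of the links for a database that is fully self-contained.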

  8. Anto reporter

    Thanks; this made that error go away, but a new one relating to the output directory has popped up ☹ (Quick update: I’m rerunning this now after deleting the out_dir MAGincl_iphop_results from the previous runs. Perhaps the intermediate files created by the previous runs are messing things up here?)

    antonycp@kw13:/data$ docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/data/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /data/MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /data/MAGincl_iphop_results/

    Welcome to iPHoP

    Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
    [1/1/Skip] Skipping computation of blastn against microbial genomes...
    [1/3/Skip] Skipping blast parsing...
    [2/1/Skip] Skipping computation of blastn against CRISPR...
    [2/2/Skip] Skipping crispr parsing...
    [3/1/Skip] Skipping computation of WIsH scores...
    [3/2/Skip] Skipping WIsH parsing...
    [4/1/Skip] Skipping computation of VHM s2 similarities...
    [4/2/Skip] Skipping VHM parsing...
    [5/1/Skip] Skipping computation of PHP scores...
    [5/2/Skip] Skipping PHP parsing...
    [6/1/Skip] Skipping RaFAH...
    [6/2/Skip] Skipping RaFAH parsing...
    [6.5/1/Skip] Skipping diamond search against RaFAH refs...
    [6.5/2/Skip] Skipping calculation of AAI to RaFAH refs...
    [7/Skip] We already found all the expected files, we skip...
    [7.5/Skip] We already found all the expected files, we skip...
    [8] Running the convolution networks...
    [8/1] Loading data as tensors..
    [8/1.1] Getting blast-based scores..
    [8/1.2/Skip] All blast-based results already here, we can skip..
    [8/2.1] Getting CRISPR-based scores..
    [8/2.2/Skip] All crispr-based results already here, we can skip..
    [8/3.1] Getting WIsH-based scores..
    [8/3.2/Skip] All WIsH-based results already here, we can skip..
    [8/4.1] Getting VHM-based scores..
    [8/4.2/Skip] All VHM-based results already here, we can skip..
    [8/5.1] Getting PHP-based scores..
    [8/5.2/Skip] All PHP-based results already here, we can skip..
    TF Parameter Server distributed training not available (this is expected for the pre-build release).
    [9] Running the aggregation models...
    [9/1/Skip] Skipping the aggregation step because we already have the output file /data/MAGincl_iphop_results/Wdir/All_scores_iPHoP_by_instance.csv
    [9/2] Combining all results (Blast, CRISPR, iPHoP, and RaFAH) in a single file: /data/MAGincl_iphop_results/Wdir/All_combined_scores.csv
    Traceback (most recent call last):
    File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
    sys.exit(cli())
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 110, in main
    runaggregatormodel.run_model(args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/runaggregatormodel.py", line 65, in run_model
    merged = merge_all_results(args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/runaggregatormodel.py", line 238, in merge_all_results
    rafah_results = rafah.filter_rafah(rafah_results,args)
    File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/rafah.py", line 181, in filter_rafah
    with open(rafah_full_clusters) as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/data/MAGincl_iphop_results/Wdir/rafah_out/Full_Genome_to_OG_Score_Min_Score_50-Max_evalue_1e-05_Prediction.tsv'

    ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /data/MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /data/MAGincl_iphop_results/ failed. (See above for error)
    antonycp@kw13:/data$ ls -ltr /data/MAGincl_iphop_results/Wdir/rafah_out/
    total 792
    -rw-r--r-- 1 antonycp g-antonycp 254689 Sep 4 09:04 Full_Genomes_Prediction.fasta
    -rw-r--r-- 1 antonycp g-antonycp 105867 Sep 4 09:04 Full_CDS_Prediction.gff
    -rw-r--r-- 1 antonycp g-antonycp 300022 Sep 4 09:04 Full_CDS_Prediction.fna
    -rw-r--r-- 1 antonycp g-antonycp 143356 Sep 4 09:04 Full_CDS_Prediction.faa
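As the following comment confirms, the remaining error came from stale intermediates: the `[…/Skip]` steps trust whatever is already in the working directory, so files left by earlier, partially failed runs can convince the pipeline a step completed when its expected outputs were never written. When in doubt, delete the cached `Wdir` (or the whole output directory) so everything is recomputed; a sketch, with the path taken from this thread:

```shell
# reset_wdir OUT_DIR: drop iPHoP's cached intermediate results so the
# next `predict` run recomputes every step instead of skipping.
reset_wdir() {
    rm -rf "$1/Wdir"
}

# Example: reset_wdir /data/MAGincl_iphop_results
```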

  9. Anto reporter

    Phew! Successful finally after deleting the output dir from the previous run. Thanks a zillion again for your patience and your valuable time.

  10. Simon Roux repo owner

    Thanks for persisting :-) And sorry the Docker version does not behave nicely. I will add this information to the README for other Docker users who may want to build and use a custom database.
