Docker version doesn't work with MAG added db
Cool tool, and thank you for sharing it with the community. I had success using the tool via Docker with the default database “Sept_2021_pub_rw”, but not when I tested with a MAG-added db. Below are the command lines I used and the error I received; any help would be appreciated. V4_Wetland_MAGs_GTDB-tk_results contained two bacterial MAGs from my study and one archaeal MAG from your Wetland MAG set. My iphop_db directory contains “Sept_2021_w_V4_Wetland_MAGs_pub_rw” and “Sept_2021_pub_rw”, and nothing was moved or deleted.
docker run --rm -v /data/V4_Wetland_MAGs_GTDB-tk_results/:/data/V4_Wetland_MAGs_GTDB-tk_results/:rw -v /data/iphop_db/:/data/iphop_db/:rw -v /data/V4_Wetland_bins4iphop/:/data/V4_Wetland_bins4iphop/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest add_to_db --fna_dir /data/V4_Wetland_bins4iphop/ --gtdb_dir /data/V4_Wetland_MAGs_GTDB-tk_results/ --out_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --db_dir /data/iphop_db/Sept_2021_pub_rw/
#We added 2 additional bacteria genomes and 1 additional archaea genomes
#[9] All done
docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/
Welcome to iPHoP
There seems to be a database incompatibility: this is iPHoP v1.3.1, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw').
ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/ failed. (See above for error)
Comments (13)
-
repo owner Hi !
Can you check the content of “/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/” and “/data/iphop_db/Sept_2021_pub_rw/db/”? Essentially, iPHoP throws this error when it can’t find “rewish_models” in “/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/”. To save space, “add_to_db” attempts to create a symbolic link to this directory in the new db (rather than copying the whole folder), but I wonder if this step failed (and Docker did not tell us?).
If the new database looks good except that the symbolic link is broken or missing, you can copy the whole rewish_models folder from Sept_2021_pub_rw/db/ into Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/, and the error should disappear (then we have to cross our fingers that this was the only issue with “add_to_db”…)
Let me know how this goes !
Best,
Simon
-
reporter Many thanks for your prompt response. I cannot perform a copy operation, since the error message tells me they are the same files. Below is how my database directories look. wish_data was missing from the MAG-added db, so I tried copying it over from the original db, but when attempting to run prediction in Docker, the error “There seems to be a database incompatibility: this is iPHoP v1.3.1, but the database provided seems to be from an older version. Please update your iPHoP database (the database name should end with '_rw')” still comes up.
antonycp@kw13:/data/iphop_db/Sept_2021_pub_rw/db$ ls -ltr
total 2365056
-rwxr-xr-x 1 antonycp g-antonycp 20480 Mar 31 22:15 All_CRISPR_spacers_nr_clean.ndb
-rwxr-xr-x 1 antonycp g-antonycp 112267777 Mar 31 22:15 All_CRISPR_spacers_nr_clean.fna
-rwxr-xr-x 1 antonycp g-antonycp 5592524 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nto
-rwxr-xr-x 1 antonycp g-antonycp 16384 Mar 31 22:15 All_CRISPR_spacers_nr_clean.ntf
-rwxr-xr-x 1 antonycp g-antonycp 13186381 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nsq
-rwxr-xr-x 1 antonycp g-antonycp 16777568 Mar 31 22:15 All_CRISPR_spacers_nr_clean.not
-rwxr-xr-x 1 antonycp g-antonycp 16777700 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nin
-rwxr-xr-x 1 antonycp g-antonycp 150024026 Mar 31 22:15 All_CRISPR_spacers_nr_clean.nhr
-rwxr-xr-x 1 antonycp g-antonycp 1967934427 Mar 31 22:15 GTDBtkr202_and_newrepr_s2_mat.pkl
drwxr-xr-x 2 antonycp g-antonycp 12288 Mar 31 22:16 Host_Genomes
drwxr-xr-x 2 antonycp g-antonycp 4096 Mar 31 22:18 rafah_data
drwxr-xr-x 3 antonycp g-antonycp 4096 Mar 31 22:20 wish_data
drwxr-xr-x 2 antonycp g-antonycp 4096 Mar 31 22:24 rewish_models
-rwxr-xr-x 1 antonycp g-antonycp 139171706 Apr 1 00:19 php_db
antonycp@kw13:/data/iphop_db/Sept_2021_pub_rw/db$ ls -ltr ../../Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
total 2057472
lrwxrwxrwx 1 antonycp g-antonycp 45 Aug 31 09:05 rafah_data -> /data/iphop_db/Sept_2021_pub_rw/db/rafah_data
drwxr-xr-x 2 antonycp g-antonycp 4096 Aug 31 09:05 Host_Genomes
drwxr-xr-x 5 antonycp g-antonycp 4096 Aug 31 09:05 Tmp_CRISPRs
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nto -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nto
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.not -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.not
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.ntf -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.ntf
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nsq -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nsq
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nhr -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nhr
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.fna -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.fna
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.ndb -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.ndb
lrwxrwxrwx 1 antonycp g-antonycp 66 Aug 31 09:05 All_CRISPR_spacers_nr_clean.nin -> /data/iphop_db/Sept_2021_pub_rw/db/All_CRISPR_spacers_nr_clean.nin
drwxr-xr-x 2 antonycp g-antonycp 4096 Aug 31 09:08 rewish_models_extra
drwxr-xr-x 3 antonycp g-antonycp 4096 Aug 31 09:14 rewish_tmp
lrwxrwxrwx 1 antonycp g-antonycp 48 Aug 31 09:14 rewish_models -> /data/iphop_db/Sept_2021_pub_rw/db/rewish_models
-rwxr-xr-x 1 antonycp g-antonycp 1967613666 Aug 31 09:14 GTDBtkr202_and_newrepr_s2_mat.pkl
drwxr-xr-x 2 antonycp g-antonycp 4096 Aug 31 09:14 php_models_extra
-rwxr-xr-x 1 antonycp g-antonycp 139171706 Aug 31 09:14 php_db
drwxr-xr-x 3 antonycp g-antonycp 4096 Sep 3 08:33 wish_data
-
repo owner Right, wish_data is not needed in the new db (it’s only used at the “add_to_db” stage), so it is not copied over. I would suggest removing the current link and copying “rewish_models” over, i.e.:
$ rm /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rewish_models
$ cp -r /data/iphop_db/Sept_2021_pub_rw/db/rewish_models /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
-
reporter Many thanks again for your response. Unfortunately, I’m going around in circles here. After executing your above command, I got a new error, “FileNotFoundError: [Errno 2] No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db_infos/All_CRISPR_spacers_nr_clean.metrics.csv'”, which I got around by removing these CRISPR files from the MAG db and copying them from Sept_2021_pub_rw.
Then I got the next error, “No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'”, which I tried to get around by performing the same rm operation on the RaFAH_ref* files in the MAG db dir, but this somehow also removed these files from the original Sept_2021_pub_rw (I can confirm that I did not directly execute this rm operation in Sept_2021_pub_rw).
Now it seemed like I would need to rebuild the original database, so I deleted the current Sept_2021_pub_rw and tried a fresh download, and I got a new error that I didn’t get when I downloaded the db the first time around. I’m left scratching my head now.
antonycp@kw13:/data$ docker run --rm -v /data/iphop_new_db:/data/iphop_new_db/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest download --db_dir /data/iphop_new_db/
Please confirm you are ready to download the iPHoP database now. [y/N]:
WARNING
iPHoP database is pretty big, and will require around ~ 350Gb of disk space and some time to download and set up (except if you are downloading the test database, in which case it's only ~ 5Gb). Are you sure you want to continue ?
################
Traceback (most recent call last):
File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_downloader.py", line 40, in main
response = single_yes_or_no_question("Please confirm you are ready to download the iPHoP database now. ",args["no_prompt"])
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_downloader.py", line 204, in single_yes_or_no_question
reply = str(input(question + choices)).lower().strip() or default_answer
EOFError: EOF when reading a line
ERROR conda.cli.main_run:execute(49): conda run iphop download --db_dir /data/iphop_new_db/ failed. (See above for error)
-
reporter Update: the db download seems to be running now that I added the “-n” flag at the end. I’ll post here how everything goes with the rest of the testing with the MAG-added db.
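[Editor's note] The EOFError above is the confirmation prompt failing to read an answer: `docker run -t` without `-i` attaches no stdin to the container, so Python's `input()` immediately hits end-of-file. The `-n` flag works because it skips the prompt (the traceback shows it is passed as `args["no_prompt"]`). A minimal reproduction of the failure mode outside Docker (assuming `python3` is on the PATH):

```shell
# input() with no stdin available raises EOFError, just like in the
# traceback above; a no-prompt flag sidesteps the prompt entirely.
python3 -c 'input("Please confirm [y/N]: ")' < /dev/null \
    || echo "prompt failed with non-zero exit status"
```

Adding `-i` to the `docker run` command (so stdin stays attached) is the other way to make the interactive prompt work.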
-
reporter I removed and re-copied the RaFAH_ref* files to the MAG db, and yet I get the error “No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'” even though the file exists. See the log below:
docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/
Welcome to iPHoP
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Skip] Skipping computation of WIsH scores...
[3/2/Skip] Skipping WIsH parsing...
[4/1/Skip] Skipping computation of VHM s2 similarities...
[4/2/Skip] Skipping VHM parsing...
[5/1/Skip] Skipping computation of PHP scores...
[5/2/Skip] Skipping PHP parsing...
[6/1/Skip] Skipping RaFAH...
[6/2/Skip] Skipping RaFAH parsing...
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
Traceback (most recent call last):
File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 95, in main
aai_to_ref.run_and_parse_aai(args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 30, in run_and_parse_aai
get_aai_results(faa_file,db_info,args["aai_out"],args["aai_parsed"])
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/aai_to_ref.py", line 67, in get_aai_results
with open(ref) as csvfile:
FileNotFoundError: [Errno 2] No such file or directory: '/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv'
ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /MAGincl_iphop_results/ failed. (See above for error)
antonycp@kw13:/data$ head -n2 /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv
NC_000866 265
NC_000867 19
antonycp@kw13:/data$ head -n2 /data/iphop_db/Sept_2021_pub_rw/db/rafah_data/RaFAH_ref_cds.count.tsv
NC_000866 265
NC_000867 19
-
repo owner I suspect there is still a symbolic link error, can you try:
$ ls -lh /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/
If there are still arrows, i.e. “rafah_data -> /…”, then the way to solve this will be:
- remove the “rafah_data” link (“rm /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/rafah_data”)
- copy the entire directory from the previous db (“cp -r /data/iphop_db/Sept_2021_pub_rw/db/rafah_data /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db/”)
If this is what you already did, then it may be a permission issue ?
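[Editor's note] If several db/ entries are still symlinks (the earlier listing showed links for rafah_data, rewish_models, and all the CRISPR blast files), they can all be replaced with real copies in one pass. A sketch under stated assumptions: `materialize_links` is a hypothetical helper name, the db/ layout is flat as in the listings above, and it is worth testing on a scratch copy first:

```shell
# Replace every symlink in DST with a real copy of the same-named entry
# from SRC (hypothetical helper; works for both file and directory links).
materialize_links() {
    dst=$1
    src=$2
    for entry in "$dst"/*; do
        if [ -L "$entry" ]; then
            name=$(basename "$entry")
            rm "$entry"                      # remove the link itself
            cp -r "$src/$name" "$dst/$name"  # copy the real file/directory
        fi
    done
}

# e.g.:
# materialize_links /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/db \
#                   /data/iphop_db/Sept_2021_pub_rw/db
```

This avoids repairing the links one error message at a time, at the cost of the extra disk space the links were meant to save.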
-
reporter Thanks; that made the error go away, but a new one relating to the output directory has popped up. (Quick update: I’m rerunning this now after deleting the outdir MAGincl_iphop_results from the previous runs. Perhaps the intermediate files created by the previous runs are messing things up here?)
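[Editor's note] The reporter's hunch is right, as the log below confirms: iPHoP skips any step whose output files already exist in the working directory, so half-finished intermediates from an earlier failed run (here, a rafah_out directory missing its final prediction .tsv) make later steps fail without regenerating the missing file. After repairing the database, the safest move is to clear the previous run's working directory (a sketch using the paths from this thread; adjust to your setup):

```shell
# Stale intermediates trigger [Skip] steps that never re-create missing
# files; removing Wdir (or the whole output dir) forces a clean re-run.
out=/data/MAGincl_iphop_results
rm -rf "$out/Wdir"
```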
antonycp@kw13:/data$ docker run --rm -v /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw:/data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/:rw -v /data/MAGincl_iphop_results:/data/MAGincl_iphop_results/:rw --user $(id -u):$(id -g) -t simroux/iphop:latest predict --fa_file /data/MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /data/MAGincl_iphop_results/
Welcome to iPHoP
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Skip] Skipping computation of blastn against microbial genomes...
[1/3/Skip] Skipping blast parsing...
[2/1/Skip] Skipping computation of blastn against CRISPR...
[2/2/Skip] Skipping crispr parsing...
[3/1/Skip] Skipping computation of WIsH scores...
[3/2/Skip] Skipping WIsH parsing...
[4/1/Skip] Skipping computation of VHM s2 similarities...
[4/2/Skip] Skipping VHM parsing...
[5/1/Skip] Skipping computation of PHP scores...
[5/2/Skip] Skipping PHP parsing...
[6/1/Skip] Skipping RaFAH...
[6/2/Skip] Skipping RaFAH parsing...
[6.5/1/Skip] Skipping diamond search against RaFAH refs...
[6.5/2/Skip] Skipping calculation of AAI to RaFAH refs...
[7/Skip] We already found all the expected files, we skip...
[7.5/Skip] We already found all the expected files, we skip...
[8] Running the convolution networks...
[8/1] Loading data as tensors..
[8/1.1] Getting blast-based scores..
[8/1.2/Skip] All blast-based results already here, we can skip..
[8/2.1] Getting CRISPR-based scores..
[8/2.2/Skip] All crispr-based results already here, we can skip..
[8/3.1] Getting WIsH-based scores..
[8/3.2/Skip] All WIsH-based results already here, we can skip..
[8/4.1] Getting VHM-based scores..
[8/4.2/Skip] All VHM-based results already here, we can skip..
[8/5.1] Getting PHP-based scores..
[8/5.2/Skip] All PHP-based results already here, we can skip..
TF Parameter Server distributed training not available (this is expected for the pre-build release).
[9] Running the aggregation models...
[9/1/Skip] Skipping the aggregation step because we already have the output file /data/MAGincl_iphop_results/Wdir/All_scores_iPHoP_by_instance.csv
[9/2] Combining all results (Blast, CRISPR, iPHoP, and RaFAH) in a single file: /data/MAGincl_iphop_results/Wdir/All_combined_scores.csv
Traceback (most recent call last):
File "/opt/conda/envs/iphop/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 110, in main
runaggregatormodel.run_model(args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/runaggregatormodel.py", line 65, in run_model
merged = merge_all_results(args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/runaggregatormodel.py", line 238, in merge_all_results
rafah_results = rafah.filter_rafah(rafah_results,args)
File "/opt/conda/envs/iphop/lib/python3.8/site-packages/iphop/modules/rafah.py", line 181, in filter_rafah
with open(rafah_full_clusters) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/MAGincl_iphop_results/Wdir/rafah_out/Full_Genome_to_OG_Score_Min_Score_50-Max_evalue_1e-05_Prediction.tsv'
ERROR conda.cli.main_run:execute(49): conda run iphop predict --fa_file /data/MAGincl_iphop_results/shortlist_VIBRANT_phage_genomes.fasta --db_dir /data/iphop_db/Sept_2021_w_V4_Wetland_MAGs_pub_rw/ --out_dir /data/MAGincl_iphop_results/ failed. (See above for error)
antonycp@kw13:/data$ ls -ltr /data/MAGincl_iphop_results/Wdir/rafah_out/
total 792
-rw-r--r-- 1 antonycp g-antonycp 254689 Sep 4 09:04 Full_Genomes_Prediction.fasta
-rw-r--r-- 1 antonycp g-antonycp 105867 Sep 4 09:04 Full_CDS_Prediction.gff
-rw-r--r-- 1 antonycp g-antonycp 300022 Sep 4 09:04 Full_CDS_Prediction.fna
-rw-r--r-- 1 antonycp g-antonycp 143356 Sep 4 09:04 Full_CDS_Prediction.faa
-
reporter Phew! Finally successful after deleting the output dir from the previous run. Thanks a zillion again for your patience and your valuable time.
-
repo owner Thanks for persisting :-) And sorry the Docker version does not behave nicely. I will add this information to the README for other Docker users who may want to build and use a custom database.
-
repo owner - changed status to closed
Solved