Error in [7/1] Loading all parsed data...(using custom database)
Hi Simon,
Thanks for this tool. I tried to use a custom database for host prediction and ran into the following error:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
[3/1/Run] Running WIsH extra database...
[3/2/Run] Get relevant WIsH hits...
[4/1/Run] Running VHM s2 similarities...
[4/2/Run] Get relevant VHM hits...
[5/1/Run] Running PHP...
[5/2/Run] Get relevant PHP hits...
[6/1/Run] Running RaFAH...
[6/2/Run] Get relevant RaFAH scores...
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
write
Traceback (most recent call last):
File "/public/home/zycheng/anaconda3/envs/iphop_env/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 102, in main
dataprep.aggregate(args)
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 32, in aggregate
load_all_signal(args,check_host,store)
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 425, in load_all_signal
df_wish['P-value'] = df_wish['P-value'].apply(lambda x: round(-1 * math.log10(x))) ## Now this should be between 0 and a lot
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
return self.apply_standard()
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1098, in apply_standard
mapped = lib.map_infer(
File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 425, in <lambda>
df_wish['P-value'] = df_wish['P-value'].apply(lambda x: round(-1 * math.log10(x))) ## Now this should be between 0 and a lot
ValueError: cannot convert float NaN to integer
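The last frame of the traceback can be reproduced in isolation. This is only a minimal sketch: it assumes that the value reaching the lambda is NaN (for example because a field was blank upstream), and the helper names `neg_log10_rounded` and `try_transform` are hypothetical, not part of iPHoP. The expression itself is copied from line 425 of dataprep.py as shown above.

```python
import math

def neg_log10_rounded(x):
    # Same expression as the lambda shown in the traceback.
    return round(-1 * math.log10(x))

def try_transform(x):
    # Return the transformed value, or the error message if round() fails.
    try:
        return neg_log10_rounded(x)
    except ValueError as err:
        return str(err)

print(try_transform(0.001))         # a valid p-value
print(try_transform(float("nan")))  # NaN passes through log10 and fails in round()
```

math.log10 returns NaN for a NaN input without raising, so the failure only surfaces when round() tries to produce an integer.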
However, I tested the tool using the standard database (Sept_2021_pub_rw_w_Wetland_hosts) and the test viral contigs (Input_viral_contigs.fasta) you provided, and there was no error.
By the way, the version of gtdbtk I used is 1.7.0 and the iphop version is 1.3.0. I'm sure the problem is with add_to_db,
because when I used the standard database (Sept_2021_pub_rw_w_Wetland_hosts) with my own viral contigs, it worked! I added the MAGs to the standard host database (Sept_2021_pub_rw) using the following commands:
gtdbtk de_novo_wf --genome_dir ../../binning_analysis/bac_MAGs_seq/ \
--bacteria \
--outgroup_taxon p__Patescibacteria \
--out_dir MAGs_GTDB-tk_results/ \
--cpus 64 \
--force \
--extension fa
gtdbtk de_novo_wf --genome_dir ../../binning_analysis/bac_MAGs_seq/ \
--archaea \
--outgroup_taxon p__Altarchaeota \
--out_dir MAGs_GTDB-tk_results/ \
--cpus 64 \
--force \
--extension fa
iphop add_to_db --fna_dir ../../binning_analysis/bac_MAGs_seq/ \
--gtdb_dir MAGs_GTDB-tk_results/ \
--out_dir Sept_2021_pub_rw_w_soybean_hosts \
--db_dir /public/zycheng/database/virus.db/iphop_db/Sept_2021_pub_rw/
iphop predict --fa_file ../checkv_contigs.fa \
--db_dir Sept_2021_pub_rw_w_soybean_hosts/ \
--out_dir iphop_output \
-t 64
Thanks for your help!
Best,
Zhongyi
Comments (5)
-
reporter -
repo owner Hi Zhongyi,
Yes, you guessed right: something weird happened and some of your bins did not get a value in the “std” column. One thing you can do, just to confirm that this is what is causing the issue, is to manually modify the file (e.g. put “0.020000”, which is a typical value in this column). If that works, then I need to figure out why these bins got a likelihood but no std.
Thanks,
Best,
Simon
-
reporter Hi Simon,
I have fixed this issue by updating iphop to 1.3.2. Thanks!
Best,
Zhongyi
-
repo owner Oh that makes sense, sorry I did not notice you were running iPHoP 1.3.0. There was a bug that was fixed between 1.3.1 and 1.3.2, so everything makes sense :-)
Best,
Simon
-
repo owner - changed status to closed
Solved
Hi Simon,
I checked the output files of my custom database (Sept_2021_pub_rw_w_soybean_hosts) and found blank fields in db_infos/Wish_extra_negFits.csv.
Could this be the cause? I wonder why this happened and how to fix it. Thanks in advance!
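A quick way to list the affected rows is sketched below. It is a rough check under stated assumptions: the function name `rows_with_blank_fields` is hypothetical, the comma delimiter is a guess based on the .csv extension, and the example rows are made up for illustration rather than taken from a real Wish_extra_negFits.csv.

```python
import csv
import io

def rows_with_blank_fields(handle, delimiter=","):
    """Return (line_number, row) pairs in which at least one field is empty."""
    flagged = []
    for line_number, row in enumerate(csv.reader(handle, delimiter=delimiter), start=1):
        if any(field.strip() == "" for field in row):
            flagged.append((line_number, row))
    return flagged

# Made-up example content; the real file may use other columns or another delimiter.
example = io.StringIO(
    "bin_1,-1.38,0.020000\n"
    "bin_2,-1.41,\n"
    "bin_3,-1.36,0.019000\n"
)
for line_number, row in rows_with_blank_fields(example):
    print(line_number, row)
```

To run it on the real file, open db_infos/Wish_extra_negFits.csv inside the custom database directory with open(path, newline="") and pass the handle to the function, adjusting the delimiter if the file turns out to be tab-separated.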