Error in [7/1] Loading all parsed data... (using a custom database)

Issue #62 closed
Zhongyi Cheng created an issue

Hi Simon,

Thanks for this tool. I tried to use a custom database to predict hosts, and ran into the following error:

Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
[3/1/Run] Running WIsH extra database...
[3/2/Run] Get relevant WIsH hits...
[4/1/Run] Running VHM s2 similarities...
[4/2/Run] Get relevant VHM hits...
[5/1/Run] Running PHP...
[5/2/Run] Get relevant PHP hits...
[6/1/Run] Running RaFAH...
[6/2/Run] Get relevant RaFAH scores...
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
write
Traceback (most recent call last):
  File "/public/home/zycheng/anaconda3/envs/iphop_env/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 102, in main
    dataprep.aggregate(args)
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 32, in aggregate
    load_all_signal(args,check_host,store)
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 425, in load_all_signal
    df_wish['P-value'] = df_wish['P-value'].apply(lambda x: round(-1 * math.log10(x))) ## Now this should be between 0 and a lot
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1098, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "/public/home/zycheng/anaconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep.py", line 425, in <lambda>
    df_wish['P-value'] = df_wish['P-value'].apply(lambda x: round(-1 * math.log10(x))) ## Now this should be between 0 and a lot
ValueError: cannot convert float NaN to integer
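The last frame of the traceback can be reproduced on its own. `math.log10(float("nan"))` does not raise; it quietly returns NaN, and it is the built-in `round()` that fails with "cannot convert float NaN to integer". So the error means at least one WIsH `P-value` was already NaN before line 425 ran. The first function below mirrors that lambda; the NaN-tolerant variant is a hypothetical illustration, not iPHoP's code.

```python
import math


def neg_log10_rounded(p):
    """Mirror of the lambda in dataprep.py line 425: round(-log10(p))."""
    return round(-1 * math.log10(p))


def neg_log10_rounded_safe(p, fill=None):
    """Hypothetical NaN-tolerant variant (not part of iPHoP).

    Returns `fill` instead of raising when p is NaN.
    """
    if math.isnan(p):
        return fill
    return round(-1 * math.log10(p))
```

Calling `neg_log10_rounded(float("nan"))` raises the same `ValueError` as in the log, while `neg_log10_rounded(0.001)` returns `3`. Silently skipping NaN values would hide the real problem, though, so the safer move is to find out where the NaN comes from.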

However, when I tested the tool with the standard database (Sept_2021_pub_rw_w_Wetland_hosts) and the test viral contigs (Input_viral_contigs.fasta) you provided, there was no error.

By the way, I used gtdbtk version 1.7.0 and iPHoP version 1.3.0. I'm sure the problem lies with add_to_db, because running my own viral contigs against the standard database (Sept_2021_pub_rw_w_Wetland_hosts) worked! I added the MAGs to the standard host database (Sept_2021_pub_rw) using the following commands:

gtdbtk de_novo_wf --genome_dir ../../binning_analysis/bac_MAGs_seq/ \
                  --bacteria \
                  --outgroup_taxon p__Patescibacteria \
                  --out_dir MAGs_GTDB-tk_results/ \
                  --cpus 64 \
                  --force \
                  --extension fa

gtdbtk de_novo_wf --genome_dir ../../binning_analysis/bac_MAGs_seq/ \
                  --archaea \
                  --outgroup_taxon p__Altarchaeota \
                  --out_dir MAGs_GTDB-tk_results/ \
                  --cpus 64 \
                  --force \
                  --extension fa

iphop add_to_db --fna_dir ../../binning_analysis/bac_MAGs_seq/ \
               --gtdb_dir MAGs_GTDB-tk_results/ \
               --out_dir Sept_2021_pub_rw_w_soybean_hosts \
               --db_dir /public/zycheng/database/virus.db/iphop_db/Sept_2021_pub_rw/

iphop predict --fa_file ../checkv_contigs.fa \
              --db_dir Sept_2021_pub_rw_w_soybean_hosts/ \
              --out_dir iphop_output \
              -t 64
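Before running `iphop predict` on a database built with `add_to_db`, it may be worth checking that every host row in `db_infos/Wish_extra_negFits.csv` (the file discussed in the comments) has all three values filled in. This is a minimal sketch, assuming the file is tab-separated with one host per line (host ID, likelihood, std); the function name is hypothetical and not part of iPHoP.

```python
def find_rows_missing_std(path, sep="\t"):
    """Return the host IDs whose third column (std) is empty or absent.

    Assumed layout (hypothetical): <host_id><sep><likelihood><sep><std>
    """
    missing = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # ignore blank lines
            fields = line.split(sep)
            # Flag rows where the std field is missing entirely or left blank.
            if len(fields) < 3 or not fields[2].strip():
                missing.append(fields[0])
    return missing
```

For example, `find_rows_missing_std("Sept_2021_pub_rw_w_soybean_hosts/db_infos/Wish_extra_negFits.csv")` should return an empty list for a complete database; any ID it does return is presumably a bin that will later yield a NaN WIsH P-value.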

Thanks for your help!

Best,

Zhongyi

Comments (5)

  1. Zhongyi Cheng reporter

    Hi Simon,

    I checked the output files of my custom database (Sept_2021_pub_rw_w_soybean_hosts) and found that some rows in db_infos/Wish_extra_negFits.csv are missing their last value:

    .....
    CK6_bin.1   -1.334523116571437  0.021246843420042415
    CK6_bin.2   -1.3248652118280526 0.03314422639684136
    CK6_bin.3   -1.3493631022877723 0.0276606656445851
    CK6_bin.4   -1.300081673918167  
    CK6_bin.5   -1.3012140987905645 
    CK6_bin.6   -1.3966279567346538 0.018745497949156517
    CK6_bin.9   -1.3563004741599325 0.02625312807760934
    High1_bin.1 -1.3247041130195305 0.02129414309886058
    High1_bin.2 -1.4293457272475523 0.021370103825002994
    .....
    

    Could this be the cause? Why did this happen, and how can I fix it? Thanks in advance!

  2. Simon Roux repo owner

    Hi Zhongyi,

    Yes, you guessed right: something went wrong, and some of your bins did not get a value in the “std” column. To confirm that this is what causes the issue, you can manually edit the file (e.g. enter “0.020000”, which is a typical value for this column). If that works, then I need to figure out why these bins got a likelihood but no std.

    Thanks,

    Best,

    Simon

  3. Simon Roux repo owner

    Oh, that makes sense, sorry I did not notice you were running iPHoP 1.3.0. There was a bug that was fixed between versions 1.3.1 and 1.3.2, so everything makes sense :-)

    Best,

    Simon

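The manual check Simon suggests (writing “0.020000” into the empty “std” fields) can be scripted so the original file is left untouched. This is a minimal sketch, assuming the same tab-separated three-column layout as above; the function name is hypothetical. It is only a way to confirm the diagnosis: per the last reply, the blanks come from a bug in iPHoP 1.3.0, so the lasting fix is to upgrade to 1.3.2 or later and presumably rebuild the custom database with add_to_db.

```python
def fill_missing_std(in_path, out_path, fill="0.020000", sep="\t"):
    """Copy in_path to out_path, writing `fill` into empty or absent std fields.

    Hypothetical helper; returns the number of rows that were patched.
    """
    patched = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            body = line.rstrip("\r\n")
            fields = body.split(sep)
            if body.strip() and (len(fields) < 3 or not fields[2].strip()):
                # Pad short rows to three columns, then set the std column.
                while len(fields) < 3:
                    fields.append("")
                fields[2] = fill
                body = sep.join(fields)
                patched += 1
            dst.write(body + "\n")
    return patched
```

After checking the patched copy, keep a backup of the original Wish_extra_negFits.csv before swapping the copy in and re-running `iphop predict`.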