Empty result in Host_prediction_to_genus_m90.csv. WARNING about empty blastn result. Wdir/blastgenomes.tsv.log is also empty.

Issue #70 closed
asai note created an issue

I ran iPHoP 1.3.2, installed using conda.

However, Host_prediction_to_genus_m90.csv contains nothing other than the header line, and the software prints a WARNING that the blast step returned no hits. I checked the log file Wdir/blastgenomes.tsv.log, and it is also empty.

I further tested my 20 phage genomes, and only 3 of them have results in Host_prediction_to_genus_m90.csv.

The attachment is the genome fasta file I used for testing.

My command:

iphop predict --fa_file reads_and_genomes/RCIP0001.fasta --db_dir ~/software/iphop_db/Sept_2021_pub_rw/ --out_dir iphop_RCIP0001

stdout:

Welcome to iPHoP

Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
No hits were included in the result file after parsing the blast, this may be ok, but is still unusual, so you may want to check the blastn log (Wdir/blastgenomes.tsv.log)
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
[3/2/Run] Get relevant WIsH hits...
[4/1/Run] Running VHM s2 similarities...
[4/2/Run] Get relevant VHM hits...
[5/1/Run] Running PHP...
[5/2/Run] Get relevant PHP hits...
[6/1/Run] Running RaFAH...

[6/2/Run] Get relevant RaFAH scores...
[6.5/1/Run] Running Diamond comparison to RaFAH references...
[6.5/2/Run] Get AAI distance to RaFAH refs...
[7] Aggregating all results and formatting for TensorFlow...
[7/1] Loading all parsed data...
[7/2] Loading corresponding host taxonomy...
[7/3] Link matching genomes to representatives and filter out redundant / useless matches...
Filtering blast data
Filtering crispr data
Filtering wish data
Filtering vhm data
Filtering PHP data
[7/4] Write the matrices for TensorFlow...
Starting to built the matrices for TensorFlow
Loading trees
Processing data for virus FakeDummy_AJ421943.1
Processing data for virus RCIP0001_scaf1
[7.5] Aggregating all results and formatting for RF...
[8] Running the convolution networks...
[8/1] Loading data as tensors..
[8/1.1] Getting blast-based scores..
[8/1.2] Run blast classifier Model_blast_Conv-87 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/1.2] Run blast classifier Model_blast_RF-39 (by batch)..
TF Parameter Server distributed training not available (this is expected for the pre-build release).
[INFO kernel.cc:1153] Loading model from path
[INFO decision_forest.cc:617] Model loaded with 1000 root(s), 637446 node(s), and 15 input feature(s).
[INFO abstract_model.cc:1063] Engine "RandomForestOptPred" built
[INFO kernel.cc:1001] Use fast generic engine
[8/2.1] Getting CRISPR-based scores..
[8/2.2] Run crispr classifier Model_crispr_Conv-85 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/2.2] Run crispr classifier Model_crispr_Dense-15 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/3.1] Getting WIsH-based scores..
[8/3.2] Run wish classifier Model_wish_Conv-2 (by batch)..
Predicting confidence score for all batches of input data [ ] 0%
WARNING:tensorflow:5 out of the last 11 calls to <function Model.make_predict_function.<locals>.predict_function at 0x2b7a8fa574c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Predicting confidence score for all batches of input data [====================================] 100%
[8/3.2] Run wish classifier Model_wish_Dense-54 (by batch)..
Predicting confidence score for all batches of input data [ ] 0%
WARNING:tensorflow:5 out of the last 11 calls to <function Model.make_predict_function.<locals>.predict_function at 0x2b7a8fa574c0> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Predicting confidence score for all batches of input data [====================================] 100%
[8/4.1] Getting VHM-based scores..
[8/4.2] Run vhm classifier Model_vhm_Conv-92 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/4.2] Run vhm classifier Model_vhm_Dense-97 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/5.1] Getting PHP-based scores..
[8/5.2] Run php classifier Model_php_Conv-90 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[8/5.2] Run php classifier Model_php_Dense-74 (by batch)..
Predicting confidence score for all batches of input data [====================================] 100%
[9] Running the aggregation models...
[9/1.1] Preparing data for aggregated score ...
[9/1.2] Run classifier for aggregated score ...
[INFO kernel.cc:1153] Loading model from path
[INFO decision_forest.cc:617] Model loaded with 500 root(s), 689556 node(s), and 30 input feature(s).
[INFO abstract_model.cc:1063] Engine "RandomForestOptPred" built
[INFO kernel.cc:1001] Use fast generic engine
[9/2] Combining all results (Blast, CRISPR, iPHoP, and RaFAH) in a single file: iphop_RCIP0001/Wdir/All_combined_scores.csv
[10/1] Preparing the detailed output...
[10/2] Preparing the iPHoP-only result file, linking viruses to individual genomes (iphop_RCIP0001/Host_prediction_to_genome_m90.csv) ...
[10/3] Preparing the combined iPHoP / RaFAH output summarized at the genus rank (iphop_RCIP0001/Host_prediction_to_genus_m90.csv) ...

!#!#!#!#!#! WARNING --- SOME UNEXPECTED EVENTS HAPPENED -- WE LIST THEM BELOW, IT COULD BE NOTHING, BUT YOU SHOULD STILL DOUBLE-CHECK #!#!#!#!#!#!#

No hits were included in the result file after parsing the blast, this may be ok, but is still unusual, so you may want to check the blastn log (Wdir/blastgenomes.tsv.log)

!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!!#!#!#!#!#!#!

Content of Detail_output_by_tool.csv:

### Top 5 hits for each virus with each method used
### Score 1 / Score 2 columns correpond respectively to:
### For blast hits => Total number of matches in hsps / Average percentage of identity across the full alignment
### For crispr hits => Number of mismatches / Length of CRISPR spacer
### For WIsH => Log-likelihood / P-value
### For VirHostMatcher => s2 similarity/ NA
### For PHP => PHP score/ NA
### For RaFAH => Confidence score / estimated FDR
### For iPHoP-RF => Confidence score / estimated FDR

Virus,Method,Host,Host taxonomy,Metric 1,Score 1,Metric 2,Score 2,Rank,Host representative
RCIP0001_scaf1,crispr,GCA_900758375.1,d__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella;s__Collinsella sp900758375,N_mismatches,3.000,Spacer_length,28.0,1,GB_GCA_900758375.1
RCIP0001_scaf1,crispr,GCA_003660235.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Chromatiales;f__Sedimenticolaceae;g__QGON01;s__QGON01 sp003660235,N_mismatches,4.000,Spacer_length,32.0,2,GB_GCA_003660235.1
RCIP0001_scaf1,iPHoP-RF,RS_GCF_002900365.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia marmotae,Confidence_score,0.878,FDR,0.215,1,RS_GCF_002900365.1
RCIP0001_scaf1,iPHoP-RF,RS_GCF_000208585.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp000208585,Confidence_score,0.876,FDR,0.217,2,RS_GCF_000208585.1
RCIP0001_scaf1,iPHoP-RF,RS_GCF_000026225.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia fergusonii,Confidence_score,0.870,FDR,0.225,3,RS_GCF_000026225.1
RCIP0001_scaf1,iPHoP-RF,RS_GCF_002950215.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia flexneri,Confidence_score,0.870,FDR,0.225,3,RS_GCF_002950215.1
RCIP0001_scaf1,iPHoP-RF,RS_GCF_003697165.2,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli,Confidence_score,0.870,FDR,0.225,3,RS_GCF_003697165.2
RCIP0001_scaf1,php,GCF_001660175.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp001660175,Score,1452.969,NA,NA,1,RS_GCF_001660175.1
RCIP0001_scaf1,php,GCF_004211955.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp004211955,Score,1452.860,NA,NA,2,RS_GCF_004211955.1
RCIP0001_scaf1,php,GCF_011881725.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli_E,Score,1452.833,NA,NA,3,RS_GCF_011881725.1
RCIP0001_scaf1,php,GCF_000208585.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp000208585,Score,1452.601,NA,NA,4,RS_GCF_000208585.1
RCIP0001_scaf1,php,GCF_000026325.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli_D,Score,1452.585,NA,NA,5,RS_GCF_000026325.1
RCIP0001_scaf1,rafah,Klebsiella,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella,Confidence_score,0.497,FDR,0.183,1,NA
RCIP0001_scaf1,rafah,Escherichia,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia,Confidence_score,0.260,FDR,0.656,2,NA
RCIP0001_scaf1,rafah,Enterobacter,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Enterobacter,Confidence_score,0.114,FDR,0.895,3,NA
RCIP0001_scaf1,rafah,Cronobacter,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Cronobacter,Confidence_score,0.019,FDR,0.991,4,NA
RCIP0001_scaf1,rafah,Salmonella,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella,Confidence_score,0.017,FDR,0.993,5,NA
RCIP0001_scaf1,vhm,GCF_000006945.2,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella enterica,S2star_similarity,0.509,NA,NA,1,RS_GCF_000006945.2
RCIP0001_scaf1,vhm,GCF_900478215.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella houtenae,S2star_similarity,0.508,NA,NA,2,RS_GCF_900478215.1
RCIP0001_scaf1,vhm,GCF_008692785.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella diarizonae,S2star_similarity,0.507,NA,NA,3,RS_GCF_008692785.1
RCIP0001_scaf1,vhm,GCF_008692845.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella arizonae,S2star_similarity,0.505,NA,NA,4,RS_GCF_008692845.1
RCIP0001_scaf1,vhm,GCF_000252995.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Salmonella;s__Salmonella bongori,S2star_similarity,0.502,NA,NA,5,RS_GCF_000252995.1
RCIP0001_scaf1,wish,GCF_001654835.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Buttiauxella;s__Buttiauxella gaviniae,Log-likelihood,-1.374,p-value,0.035006513076317,1,RS_GCF_001654835.1
RCIP0001_scaf1,wish,GCF_004353845.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Citrobacter;s__Citrobacter freundii_E,Log-likelihood,-1.374,p-value,0.0393352262492108,2,RS_GCF_004353845.1
RCIP0001_scaf1,wish,GCF_004211955.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia sp004211955,Log-likelihood,-1.374,p-value,0.025069749094742,3,RS_GCF_004211955.1
RCIP0001_scaf1,wish,GCF_010231475.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Citrobacter;s__Citrobacter sp010231475,Log-likelihood,-1.374,p-value,0.0535983105246063,4,RS_GCF_010231475.1
RCIP0001_scaf1,wish,GCF_000026325.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli_D,Log-likelihood,-1.374,p-value,0.0299903444786042,5,RS_GCF_000026325.1
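
For anyone who wants to inspect this file programmatically, the following is a minimal Python sketch, not part of iPHoP. It assumes the layout shown above: comment lines starting with `#`, then a comma-separated header with the columns `Virus`, `Method`, `Rank`, `Score 1` and `Score 2`. The function names `read_detail_rows` and `top_hit_per_method` are hypothetical helpers. The sketch only reports the values found in the file; how they map onto the 0 to 100 score used for the m90 cutoff is not stated in this thread.

```python
import csv
import io

# Two rows copied from the Detail_output_by_tool.csv content above, plus its header.
SAMPLE = """### Top 5 hits for each virus with each method used
Virus,Method,Host,Host taxonomy,Metric 1,Score 1,Metric 2,Score 2,Rank,Host representative
RCIP0001_scaf1,iPHoP-RF,RS_GCF_002900365.1,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia marmotae,Confidence_score,0.878,FDR,0.215,1,RS_GCF_002900365.1
RCIP0001_scaf1,rafah,Klebsiella,d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella,Confidence_score,0.497,FDR,0.183,1,NA
"""


def read_detail_rows(text):
    """Parse the file content, skipping blank lines and '#' comment lines."""
    data_lines = [
        line for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return list(csv.DictReader(io.StringIO("\n".join(data_lines))))


def top_hit_per_method(rows):
    """Return {(virus, method): row} keeping the first row with Rank == 1."""
    best = {}
    for row in rows:
        key = (row["Virus"], row["Method"])
        if row["Rank"] == "1" and key not in best:
            best[key] = row
    return best


if __name__ == "__main__":
    for (virus, method), row in top_hit_per_method(read_detail_rows(SAMPLE)).items():
        print(virus, method, row["Host"],
              row["Metric 1"], row["Score 1"], row["Metric 2"], row["Score 2"])
```

To use it on a real run, read the file first, for example `read_detail_rows(open("iphop_RCIP0001/Detail_output_by_tool.csv").read())`, adjusting the path to wherever the file was written.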

Comments (5)

  1. Simon Roux repo owner

    Hi,

    It seems to me that everything is working as expected, and there are simply no host predictions with a score over 90 (the default cutoff) for most of your input sequences. If you want, you can lower this minimum score cutoff as follows:

    iphop predict --fa_file reads_and_genomes/RCIP0001.fasta --db_dir ~/software/iphop_db/Sept_2021_pub_rw/ --out_dir iphop_RCIP0001 --min_score 75
    

    You can run it on the same output directory, and it will generate a new set of output files with a minimum score of 75, which may have predictions for more viruses. Note that at these lower scores (~ 75 to 90), I would recommend interpreting the host prediction at the host family rather than the genus rank.

    Best,

    Simon
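
After rerunning with a lower cutoff, one quick way to see which runs produced any predictions is to count the data lines in each genus-level output file. The sketch below is a hedged illustration, not an iPHoP feature. The helper names `count_predictions` and `summarize_runs` are hypothetical, and the file name for cutoffs other than 90 (for example `Host_prediction_to_genus_m75.csv`) is an assumption extrapolated from the m90 name in the log above. It counts lines, not distinct viruses, so one virus with several predicted genera contributes several lines.

```python
from pathlib import Path


def count_predictions(csv_path):
    """Count non-empty lines after the header; 0 means header only or an empty file."""
    with open(csv_path, newline="") as handle:
        lines = [line for line in handle if line.strip()]
    return max(len(lines) - 1, 0)


def summarize_runs(out_dirs, min_score=90):
    """Map each output directory to its number of prediction lines (None if the file is missing)."""
    # Assumption: the cutoff appears in the file name as m<min_score>, as it does for m90.
    name = f"Host_prediction_to_genus_m{min_score}.csv"
    summary = {}
    for out_dir in out_dirs:
        path = Path(out_dir) / name
        summary[str(out_dir)] = count_predictions(path) if path.exists() else None
    return summary
```

For example, `summarize_runs(["iphop_RCIP0001"], min_score=75)` would return a one-entry dictionary for the run in this issue, with `None` if no file with that assumed name exists.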
