KeyError: 'N match'
[7.5] Aggregating all results and formatting for RF...
Traceback (most recent call last):
File "/home/zhichao/miniconda3/envs/ViWrap/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/iphop.py", line 121, in cli
args["func"](args)
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 96, in main
dataprep_rf.aggregate_rf(args)
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 35, in aggregate_rf
compute_matrices(df_blast,df_crispr,df_labels,args)
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 110, in compute_matrices
tmp = tmp.groupby(['Virus','Repr','N match']).first().reset_index()
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/frame.py", line 7626, in groupby
return DataFrameGroupBy(
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 888, in __init__
grouper, exclusions, obj = get_grouper(
File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 860, in get_grouper
raise KeyError(gpr)
KeyError: 'N match'
(ViWrap) zhichao@sulfur:/storage1/data11/ViWrap$
Hi iphop authors,
What is the error in line 19? My phage bin fasta file is attached.
Comments (6)
-
repo owner -
reporter Hi,
Many thanks! I tried this script for a single virus genome. I can concatenate all my viruses (several thousand), which probably just avoids the case of 0 blast hits. Best!
-
reporter - changed status to closed
-
Hi iphop developers,
Thanks for the great tool and easy installation. I’m running iphop for the first time on results I received using the Sullivan Lab VirSorter2 workflow on protocols.io. Quick clarification question about this KeyError: ‘N match’ that I’m also getting:
- For input fasta files that have no blast hits, does this mean the input is not viral? Has no host in the database? Has too little input data? Could the host be Eukaryotic?
Best,
René
-
repo owner Hi !
Short answer is: “All of the above” :-)
The “N match” error means that no blast result at all passed the iPHoP filters. This can happen because the input is not viral, although that is not necessarily the case: a bacterial / archaeal input would actually be “more” likely to yield a significant blast hit against other bacteria / archaea. But technically it is possible, and it’s important to know that iPHoP assumes the input is viral and does not check this.
If the virus is infecting a host that is very different from anything in the database and/or for which we don’t have good representation of prophages, etc, then you could indeed end up with this type of error.
Finally, the previous point definitely applies to a virus infecting eukaryotes: I would expect these to not yield any blast hit against the iPHoP database.
That being said, it is probably worth applying the “fix” described earlier in this issue to verify if other methods (outside of blast) provide some host prediction.
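For reference, the mechanism behind the KeyError can be reproduced with plain pandas. This is only a sketch under an assumption: that with 0 filtered blast hits the table reaching the `groupby` call lacks the `N match` column. The frame layout, the `REQUIRED` list and the `first_per_group` helper are hypothetical and are not iPHoP's actual code.

```python
import pandas as pd

# Grouping keys taken from the traceback line in dataprep_rf.py.
REQUIRED = ["Virus", "Repr", "N match"]


def first_per_group(hits: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guard: return an empty table instead of raising
    when there are no hits or a grouping column is missing."""
    missing = [col for col in REQUIRED if col not in hits.columns]
    if missing or hits.empty:
        return pd.DataFrame(columns=REQUIRED)
    return hits.groupby(REQUIRED).first().reset_index()


# Assumed shape of the table when no blast hit passed the filters:
# no rows, and no "N match" column.
no_hits = pd.DataFrame(columns=["Virus", "Repr"])

error_key = None
try:
    no_hits.groupby(REQUIRED).first()
except KeyError as err:
    # pandas reports the first grouping key it cannot find.
    error_key = err.args[0]

guarded = first_per_group(no_hits)  # empty frame, no exception
```

Here `error_key` ends up holding the name of the missing column, which matches the message in the traceback, while the guarded call simply returns an empty table.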
Best,
Simon
-
:) Awesome! Thanks for the quick and detailed reply. Other than those bins everything ran smoothly.
Take Care,
René
-
Hi !
Thanks for reporting, and for linking your input file (this was very useful for figuring out what happened). It looks like an exception we do not correctly catch, i.e. if there are absolutely 0 blast hits across the whole input file, the pipeline fails. I will have to look into how to best handle this, but in the meantime, here is how you can still get the iPHoP results for your sequences:
The 5 extra genomes will not change the runtime too much, but they will avoid the error you encountered before, and you should get host predictions for your bins.
Let me know if that also works from your side !
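The workaround mentioned above, adding a few extra genomes to the input, comes down to concatenating FASTA files before running iPHoP. A minimal sketch, assuming hypothetical file names (`my_bins.fna`, `extra_genomes.fna`, `combined.fna`). Which extra genomes to use is not specified in this thread, so the sequences below are placeholders written to a temporary directory only to keep the example runnable.

```python
import tempfile
from pathlib import Path


def concat_fasta(inputs, output):
    """Concatenate FASTA files into `output`; return the number of records.

    A newline is added after any file that lacks a trailing one, so the
    next header never ends up glued to the previous sequence line.
    """
    n_records = 0
    with open(output, "w") as out:
        for path in inputs:
            text = Path(path).read_text()
            if text and not text.endswith("\n"):
                text += "\n"
            n_records += sum(1 for line in text.splitlines() if line.startswith(">"))
            out.write(text)
    return n_records


# Demo with throwaway files; real paths would replace these hypothetical names.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    (tmp / "my_bins.fna").write_text(">bin_1\nACGT")  # no trailing newline
    (tmp / "extra_genomes.fna").write_text(">extra_1\nGGCC\n>extra_2\nTTAA\n")
    combined = tmp / "combined.fna"
    n_records = concat_fasta([tmp / "my_bins.fna", tmp / "extra_genomes.fna"], combined)
    combined_text = combined.read_text()
```

The combined file is then used as the iPHoP input in place of the original one. Predictions for the extra genomes can be ignored in the output.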