KeyError: 'N match'

Issue #5 closed
Zhichao Zhou created an issue
[7.5] Aggregating all results and formatting for RF...
Traceback (most recent call last):
  File "/home/zhichao/miniconda3/envs/ViWrap/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/iphop.py", line 121, in cli
    args["func"](args)
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 96, in main
    dataprep_rf.aggregate_rf(args)
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 35, in aggregate_rf
    compute_matrices(df_blast,df_crispr,df_labels,args)
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 110, in compute_matrices
    tmp = tmp.groupby(['Virus','Repr','N match']).first().reset_index()
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/frame.py", line 7626, in groupby
    return DataFrameGroupBy(
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/groupby/groupby.py", line 888, in __init__
    grouper, exclusions, obj = get_grouper(
  File "/home/zhichao/miniconda3/envs/ViWrap/lib/python3.8/site-packages/pandas/core/groupby/grouper.py", line 860, in get_grouper
    raise KeyError(gpr)
KeyError: 'N match'
(ViWrap) zhichao@sulfur:/storage1/data11/ViWrap$ 

Hi iphop authors,

What is causing the KeyError: 'N match' at the end of this traceback? My phage bin FASTA file is attached.

Comments (6)

  1. Simon Roux repo owner

    Hi!

    Thanks for reporting, and for linking your input file (this was very useful for figuring out what happened). It looks like an exception we do not correctly catch: if there are no blast hits at all across the whole input file, the pipeline fails. I will have to look into how best to handle this, but in the meantime, here is how you can still get the iPHoP results for your sequences:

    The 5 extra genomes will not change the runtime too much, but they will avoid the error you encountered before, and you should get host predictions for your bins.

    Let me know if that also works on your side!
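    The failure mode described above can be reproduced in isolation with pandas. The sketch below is hypothetical and is not iPHoP's code: the three column names come from the traceback, but it assumes that when no blast hit survives, the intermediate table still has `Virus` and `Repr` columns and simply lacks `N match` (consistent with the traceback, not confirmed by it). The `group_first_guarded` helper is one possible guard, invented for illustration.

```python
import pandas as pd

# Grouping keys taken from the failing line in dataprep_rf.py
GROUP_KEYS = ["Virus", "Repr", "N match"]


def group_first(tmp: pd.DataFrame) -> pd.DataFrame:
    """Same call as in the traceback: keep the first row per key combination."""
    return tmp.groupby(GROUP_KEYS).first().reset_index()


def group_first_guarded(tmp: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical guard: return an empty table instead of raising
    when a grouping column (e.g. 'N match') was never created."""
    missing = [key for key in GROUP_KEYS if key not in tmp.columns]
    if missing:
        return pd.DataFrame(columns=list(tmp.columns) + missing)
    return group_first(tmp)


# Hypothetical intermediate table for an input with zero blast hits:
# the identifier columns exist, but the match-count column does not.
no_hits = pd.DataFrame({"Virus": ["bin_1"], "Repr": ["host_A"]})

try:
    group_first(no_hits)
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'N match'

print(group_first_guarded(no_hits).empty)  # prints: True
```

    pandas builds the grouper eagerly, so the KeyError is raised by the `groupby` call itself, which matches the traceback ending in `get_grouper`.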

  2. Zhichao Zhou reporter

    Hi,

    Many thanks! I tried this script for a single virus genome. I can also concatenate all my viruses (several thousand) into one input, which will probably avoid the zero-blast-hit case. Best!
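    The concatenation workaround amounts to running iPHoP on a single FASTA that combines many viral sequences, so that at least one of them is likely to produce a blast hit (this makes zero hits less likely but does not guarantee any). A minimal sketch of the concatenation step, assuming plain multi-FASTA inputs where the first whitespace-delimited token of each header is the sequence ID; the function name and the duplicate-ID check are illustrative additions, not part of iPHoP.

```python
import io
from typing import Iterable, TextIO


def concatenate_fasta(sources: Iterable[TextIO], out: TextIO) -> int:
    """Copy every FASTA record from `sources` into `out`.

    Raises ValueError on a missing or duplicated sequence ID, since two
    records with the same ID would be ambiguous in per-sequence results.
    Returns the number of records written.
    """
    seen_ids = set()
    n_records = 0
    for source in sources:
        for line in source:
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # drop blank lines between records
            if line.startswith(">"):
                tokens = line[1:].split()
                seq_id = tokens[0] if tokens else ""
                if not seq_id or seq_id in seen_ids:
                    raise ValueError(f"missing or duplicated sequence ID: {seq_id!r}")
                seen_ids.add(seq_id)
                n_records += 1
            # Always terminate lines, even if a source file lacks a final newline
            out.write(line + "\n")
    return n_records


# Example with in-memory "files"; with real data, pass open(path) handles instead.
merged = io.StringIO()
n = concatenate_fasta([io.StringIO(">bin_1\nACGT\n"), io.StringIO(">virus_2\nGGCC")], merged)
print(n)  # prints: 2
print(merged.getvalue())
```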

  3. Rene Miller-Xavier

    Hi iphop developers,

    Thanks for the great tool and easy installation. I’m running iphop for the first time on results I received using the Sullivan Lab VirSorter2 workflow on protocols.io. Quick clarification question about this KeyError: ‘N match’ that I’m also getting:

    • For input FASTA files that have no blast hits, does this mean that the input is not viral? That its host is not in the database? That there is too little input data? Or could the host be eukaryotic?

    Best,

    René

  4. Simon Roux repo owner

    Hi!

    Short answer is: “All of the above” :-)

    The “N match” error means that no blast result at all passed the iPHoP filters. This could be because the input is not viral, although that is not necessarily the case; in fact, a bacterial or archaeal input would arguably be “more” likely to yield a significant blast hit against other bacteria or archaea. But it is technically possible, and it is important to know that iPHoP assumes the input is viral and does not check this.
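    To see which of your own sequences fall into this zero-hit situation, you can count hits per query in a blast tabular file. This is a hypothetical diagnostic, not part of iPHoP: it assumes you have a BLAST+ `-outfmt 6` table (12 default columns, query ID first, e-value in column 11) for your sequences against some reference set, and the `max_evalue` cutoff is a placeholder, not iPHoP's actual filter. A zero count does not tell you which of the explanations in this thread applies.

```python
from typing import Dict, Iterable, List


def hits_per_query(blast_rows: Iterable[str], query_ids: Iterable[str],
                   max_evalue: float = 1e-3) -> Dict[str, int]:
    """Count hits per query ID in BLAST tabular (-outfmt 6) rows.

    `max_evalue` is a placeholder threshold, not iPHoP's filter.
    Queries with no passing hit keep a count of 0.
    """
    counts = {qid: 0 for qid in query_ids}
    for row in blast_rows:
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comments
        fields = row.rstrip("\n").split("\t")
        qid, evalue = fields[0], float(fields[10])  # column 11 = evalue
        if qid in counts and evalue <= max_evalue:
            counts[qid] += 1
    return counts


def zero_hit_queries(counts: Dict[str, int]) -> List[str]:
    """Queries without any passing hit."""
    return [qid for qid, n in counts.items() if n == 0]


def no_hits_anywhere(counts: Dict[str, int]) -> bool:
    """True when the whole input has zero passing hits (the failing case)."""
    return all(n == 0 for n in counts.values())


# Made-up example rows: one strong hit for bin_1, one weak hit for bin_2
rows = [
    "bin_1\tref_A\t95.0\t500\t25\t0\t1\t500\t1\t500\t1e-50\t900\n",
    "bin_2\tref_B\t80.0\t60\t12\t0\t1\t60\t1\t60\t0.5\t40\n",
]
counts = hits_per_query(rows, ["bin_1", "bin_2", "bin_3"])
print(zero_hit_queries(counts))  # prints: ['bin_2', 'bin_3']
print(no_hits_anywhere(counts))  # prints: False
```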

    If the virus infects a host that is very different from anything in the database, and/or for which we don’t have good representation of prophages, etc., then you could indeed end up with this type of error.

    Finally, the previous point definitely applies to viruses infecting eukaryotes: I would not expect these to yield any blast hit against the iPHoP database.

    That being said, it is probably worth applying the “fix” described earlier in this issue to verify if other methods (outside of blast) provide some host prediction.

    Best,

    Simon

  5. Rene Miller-Xavier

    :) Awesome! Thanks for the quick and detailed reply. Other than those bins everything ran smoothly.

    Take Care,

    René
