KeyError: '3300006428_5_vs_RS_GCF_008727735.1'

Issue #30 resolved
zhangshizhe created an issue

Hi Simon, I ran into the following error:

[7.5] Aggregating all results and formatting for RF...
### Welcome to iPHoP ###
write
Traceback (most recent call last):
  File "/home/yc/miniconda3/envs/iphop_env/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 122, in cli
    args["func"](args)
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 96, in main
    dataprep_rf.aggregate_rf(args)
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 35, in aggregate_rf
    compute_matrices(df_blast,df_crispr,df_labels,args)
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 147, in compute_matrices
    selected_blast['Dist'] = selected_blast['Repr'].apply(lambda x: update_dist(host_pivot,x,store_dist))
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1043, in apply
    return self.apply_standard()
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/pandas/core/apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(
  File "pandas/_libs/lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 147, in <lambda>
    selected_blast['Dist'] = selected_blast['Repr'].apply(lambda x: update_dist(host_pivot,x,store_dist))
  File "/home/yc/miniconda3/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/dataprep_rf.py", line 218, in update_dist
    dist = store_dist[code]
KeyError: '3300006428_5_vs_RS_GCF_008727735.1'

I still haven't figured out how to solve this problem. I look forward to your reply.
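The last frame of the traceback is a plain dictionary lookup, `dist = store_dist[code]`, which fails because the key `'3300006428_5_vs_RS_GCF_008727735.1'` is not in the dictionary. The sketch below only reproduces that failure mode. The `<genomeA>_vs_<genomeB>` key format is inferred from the error message, and the `pair_key` and `lookup_dist` helpers and the cache contents are hypothetical. This is not iPHoP's actual `update_dist()`.

```python
from typing import Dict, Optional

# Hypothetical helper: builds a key in the "<a>_vs_<b>" shape seen in the KeyError.
def pair_key(a: str, b: str) -> str:
    return f"{a}_vs_{b}"

# Hypothetical defensive lookup: returns None instead of raising on a missing pair.
def lookup_dist(store: Dict[str, float], a: str, b: str) -> Optional[float]:
    return store.get(pair_key(a, b))

# Hypothetical cache contents (not real iPHoP data).
store_dist: Dict[str, float] = {"GENOME_X_vs_GENOME_Y": 0.42}

try:
    # Direct indexing, as in the traceback, raises KeyError for an absent pair.
    store_dist[pair_key("3300006428_5", "RS_GCF_008727735.1")]
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: '3300006428_5_vs_RS_GCF_008727735.1'

print(lookup_dist(store_dist, "GENOME_X", "GENOME_Y"))  # prints: 0.42
print(lookup_dist(store_dist, "3300006428_5", "RS_GCF_008727735.1"))  # prints: None
```

Returning `None` only makes the missing pair visible. It does not explain why the pair was never stored in the first place.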


Comments (8)

  1. Simon Roux repo owner

    Good question, this seems like a potential issue with the database. Are you using a custom host database, or the default one ?

  2. zhangshizhe reporter

    Default database… that’s weird.

    I checked the previous steps, and they seem to have run without problems.

  3. Simon Roux repo owner

    Can you check that the database downloaded correctly? It seems like iPHoP has trouble loading the information from the trees (there should be a file named “gtdbtk.bac120.decorated.tree” in the folder “db_infos”).
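This check can be scripted. Below is a minimal sketch. It assumes `db_dir` is the directory where the iPHoP database was extracted and that `db_infos` sits directly inside it. The exact layout may differ between database releases. `tree_status` is a hypothetical helper.

```python
from pathlib import Path

def tree_status(db_dir: str, name: str = "gtdbtk.bac120.decorated.tree") -> str:
    """Return 'missing', 'empty', or 'ok' for <db_dir>/db_infos/<name> (layout assumed)."""
    tree = Path(db_dir) / "db_infos" / name
    if not tree.is_file():
        return "missing"
    if tree.stat().st_size == 0:
        return "empty"
    return "ok"
```

For example, `tree_status("/path/to/iphop_db")` can be called with a placeholder path replaced by the real one. An `"ok"` result only means the file exists and is non-empty. It does not validate the tree's content.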

  4. zhangshizhe reporter

    It’s “iPHoP_db_Sept21.tar.gz”, and the download is complete. By the way, I haven't had any problems with the test data before…

    “gtdbtk.bac120.decorated.tree” is in the folder “db_infos”, and RS_GCF_008727735.1 is also included.

    Is there perhaps a way to skip this sequence, so that the final results can still be exported?

  5. Simon Roux repo owner

    Is 3300006428_5 also found in the file “gtdbtk.bac120.decorated.tree”? If not, that’s probably an issue with this specific file.
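You can answer this without scrolling through a large Newick file. One option is a regex search for each genome ID bounded by Newick delimiters (`(`, `,`, `)`, `:`, `;`, quotes, whitespace). This bounding keeps `3300006428_5` from falsely matching `3300006428_50`. `ids_in_newick` is a hypothetical helper. It assumes the leaf labels in the tree are the bare genome IDs, and the tiny example tree is made up.

```python
import re
from typing import Dict, Iterable

# Characters that can sit immediately before / after a label in Newick text.
_BEFORE = r"[(,'\s]"
_AFTER = r"[:,)'\s;]"

def ids_in_newick(newick_text: str, ids: Iterable[str]) -> Dict[str, bool]:
    """Map each genome ID to whether it appears as a whole label in the tree text."""
    return {
        gid: re.search(_BEFORE + re.escape(gid) + _AFTER, newick_text) is not None
        for gid in ids
    }

# Made-up example tree (not from the iPHoP database).
tree = "((3300006428_5:0.1,RS_GCF_008727735.1:0.2):0.05,RS_GCF_000005845.2:0.3);"
print(ids_in_newick(tree, ["3300006428_5", "RS_GCF_008727735.1", "RS_GCF_999"]))
# prints: {'3300006428_5': True, 'RS_GCF_008727735.1': True, 'RS_GCF_999': False}
```

To check the real database, pass the full text of “gtdbtk.bac120.decorated.tree” (for example, read with `open(path).read()`) as `newick_text`.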

    You can’t “abandon” (skip) a reference sequence. However, it is possible (likely, even) that the issue is linked to a single specific sequence in your input file, so you may want to split the input file into smaller groups and see whether some of these groups finish successfully.
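Splitting a multi-FASTA input into fixed-size batches is straightforward. Below is a minimal sketch. `parse_fasta`, `chunk`, `write_chunks`, the batch size, and the output file names are all illustrative and not part of iPHoP. Each batch file can then be run through iPHoP separately with the same options as the original run. Whichever batch fails narrows down the problematic sequence.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)

def parse_fasta(lines: Iterable[str]) -> List[Record]:
    """Parse FASTA lines into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header, seq = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        elif line and header is not None:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records

def chunk(records: List[Record], size: int) -> List[List[Record]]:
    """Split records into consecutive batches of at most `size` records."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [records[i:i + size] for i in range(0, len(records), size)]

def write_chunks(records: List[Record], size: int, out_dir: str, prefix: str = "batch") -> List[Path]:
    """Write each batch to <out_dir>/<prefix>_<n>.fna and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(chunk(records, size), start=1):
        path = out / f"{prefix}_{n}.fna"
        path.write_text("".join(f">{h}\n{s}\n" for h, s in batch))
        paths.append(path)
    return paths
```

For example, reading a placeholder file `input_viruses.fna` with `open()` and passing its lines to `parse_fasta`, then calling `write_chunks(records, 50, "batches")`, would write batches of at most 50 sequences each.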

  6. zhangshizhe reporter

    3300006428_5 is also in the file “gtdbtk.bac120.decorated.tree”.

    I’ll follow your suggestion and try to find the specific sequence, thanks!

    If that works, we can then look for the underlying cause.

  7. Simon Roux repo owner

    This was an unexpected error in “database_prep_rf”. It should be fixed now in 1.3.1. Thanks for reporting!
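To confirm that an environment already has the release containing the fix, compare the installed version against 1.3.1. Below is a minimal sketch using `importlib.metadata`, which has been in the standard library since Python 3.8 (the version shown in the traceback's environment). It assumes the installed distribution is named `iphop`. Its simple numeric parsing treats a pre-release such as `1.3.1rc1` as `1.3.1`.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

def parse_version(v: str) -> Tuple[int, ...]:
    """Leading integer of each dot-separated field; non-numeric fields count as 0."""
    parts = []
    for field in v.split("."):
        m = re.match(r"\d+", field)
        parts.append(int(m.group()) if m else 0)
    return tuple(parts)

def has_fix(installed: str, fixed: str = "1.3.1") -> bool:
    """True if `installed` >= `fixed`, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(fixed)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b

def installed_version(dist: str = "iphop") -> Optional[str]:
    """Installed version of `dist` (distribution name assumed), or None if not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None

current = installed_version()
if current is None:
    print("iphop is not installed in this environment")
else:
    print(current, "includes the 1.3.1 fix" if has_fix(current) else "predates 1.3.1")
```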
