Adding MAGs to standard database - [6] Add new genomes to VHM database...
Hello,
Thanks so much for this tool.
I am adding 1765 MAGs from 37 samples to the standard database. Everything looks OK (I appreciate the detailed instructions), but it has been running for three days at step 6 ([6] Add new genomes to VHM database...), so I would like to ask whether that is expected.
Thank you very much.
Regards.
Comments (9)
-
repo owner -
reporter Hi. Thank you for your answer.
In the end, I stopped it. The only log file that I can see is in the rewish_tmp folder, called wish.log:
Processing /media/ddbb/iphop_db/Sept_2021_pub_rw/db/wish_data/Decoy_db/Decoy_phages.fna -- compiling kmers and matching to hosts in 1 batches
processing batch 1
loading virus kmers for 1 to 460
Processing all host packages in /media/ddbb/iphop_db/Sept_2021_pub_rw_hotspring/db/rewish_models_extra/
Compiling individual batches results from /media/ddbb/iphop_db/Sept_2021_pub_rw_hotspring/db/rewish_tmp/wish_results into /media/ddbb/iphop_db/Sept_2021_pub_rw_hotspring/db/rewish_tmp/llikelihood.matrix
wish.log (END)
Thinking that the number of MAGs could be the problem, I will try splitting them into two groups, adding the second group to the database that results from the first. I appreciate your feedback.
Best regards.
-
repo owner Sounds good, I think it’s definitely worth starting with a few MAGs just to make sure the pipeline works. If it does, then it’s a number issue indeed, unfortunately.
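A minimal sketch of how such a small test set could be assembled, assuming the MAGs are FASTA files sitting in a single directory. The function name `make_test_subset`, the `*.fna` pattern and the paths in the usage comment are hypothetical and not part of iPHoP:

```python
import shutil
from pathlib import Path


def make_test_subset(mag_dir, test_dir, n=10, pattern="*.fna"):
    """Copy the first `n` MAG FASTA files (sorted by name) into `test_dir`.

    Returns the list of copied file names so the selection can be logged.
    """
    src = Path(mag_dir)
    dst = Path(test_dir)
    dst.mkdir(parents=True, exist_ok=True)

    # Sorting makes the selection reproducible between runs.
    chosen = sorted(src.glob(pattern))[:n]
    for fasta in chosen:
        shutil.copy2(fasta, dst / fasta.name)
    return [fasta.name for fasta in chosen]


# Hypothetical usage; adjust the paths to your own layout:
# make_test_subset("all_mags/", "test_mags/", n=10)
```

If the database build finishes quickly on this subset, the slowdown with the full set is more likely a matter of scale than of setup.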
The problem with the option of adding one custom database on top of another is that it may not work (this has never been tested). I think that if your test with a few MAGs works, your best option is to dedicate more threads to the “add_to_db” script (ideally run it with 32 or even 64 threads); this should speed things up.
-
reporter Dear Simon,
I finally ran the script successfully and added the MAGs. I changed two things: I shortened the paths and removed files with any other extension from the working directory.
Also, I can run the standard database without problems. However, when I try to run the extended database (iphop predict --fa_file /media/oscarwd/hotspring_vmags/vrhyme_results/concatenated_vmags_vcontigs/cat_vmags_vcontigs_37mg.fasta --db_dir /media/ddbb/refineM_MAGS_hotsprings/Sept_2021_pub_rw_37mg_1.3.1/ --out_dir iphop_out_37mg_db_v1.3.1_intento2/ --num_threads 32 --debug), the process says:
[3/1/Run] Running WIsH extra database...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'M65_SRR5580902_DOE_057_rm'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 208, in process_batch
rewish_results = add_pvalues(rewish_results,ref_file)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 221, in add_pvalues
rewish_results["normalized"] = rewish_results.apply(lambda x: transform(x['LL'],x['Host'],ref_mat), axis=1)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/frame.py", line 8740, in apply
return op.apply()
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/apply.py", line 688, in apply
return self.apply_standard()
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/apply.py", line 812, in apply_standard
results, res_index = self.apply_series_generator()
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/apply.py", line 828, in apply_series_generator
results[i] = self.f(v)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 221, in <lambda>
rewish_results["normalized"] = rewish_results.apply(lambda x: transform(x['LL'],x['Host'],ref_mat), axis=1)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 227, in transform
ref_row = ref_mat.loc[host,['Average','Stdev']]
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexing.py", line 925, in __getitem__
return self._getitem_tuple(key)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexing.py", line 1100, in _getitem_tuple
return self._getitem_lowerdim(tup)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexing.py", line 838, in _getitem_lowerdim
section = self._getitem_axis(key, axis=i)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexing.py", line 1164, in _getitem_axis
return self._get_label(key, axis=axis)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexing.py", line 1113, in _get_label
return self.obj.xs(label, axis=axis)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/generic.py", line 3776, in xs
loc = index.get_loc(key)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'M65_SRR5580902_DOE_057_rm'
"""

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/bin/iphop", line 10, in <module>
sys.exit(cli())
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
args["func"](args)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 87, in main
wish.run_and_parse_wish(args)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 48, in run_and_parse_wish
run_rewish(args["fasta_file"],extra_raw_results,args["wish_db_dir_extra"],extra_negfit,extra_out_tmpdir,threads_tmp)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 156, in run_rewish
async_parallel(process_batch, args_list, threads)
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 245, in async_parallel
return [r.get() for r in results]
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/site-packages/iphop/modules/wish.py", line 245, in <listcomp>
return [r.get() for r in results]
File "/home/osalgado/anaconda3/envs/iphop_1.3.1/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
KeyError: 'M65_SRR5580902_DOE_057_rm'
I think removing that MAG could be a quick fix, but I really don't know.
I appreciate your help.
Regards.
-
repo owner Hi Oscar,
You don’t need to remove any MAG for now. This looks like the same bug that was reported a few days ago for custom databases. I have a fix in a new version (1.3.2) that is being uploaded to bioconda right now. I will let you know as soon as it’s available for you to download and test.
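For anyone hitting the same traceback: the KeyError is raised by a label-based lookup of the host name in the reference table (the `ref_mat.loc[host,['Average','Stdev']]` line). The sketch below uses toy data, not iPHoP's real files, to show how such a lookup fails and how one could list the host labels that have no reference row. The helper names `missing_hosts` and `lookup_fails`, the `host_A`/`host_B` labels and all numeric values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for the reference matrix, indexed by host name (made-up values).
ref_mat = pd.DataFrame(
    {"Average": [-1.40, -1.35], "Stdev": [0.05, 0.04]},
    index=["host_A", "host_B"],
)

# Toy stand-in for the results table; one host has no row in ref_mat.
rewish_results = pd.DataFrame(
    {
        "Virus": ["v1", "v1", "v2"],
        "Host": ["host_A", "M65_SRR5580902_DOE_057_rm", "host_B"],
        "LL": [-1.38, -1.30, -1.36],
    }
)


def missing_hosts(results, ref):
    """Return the sorted host labels in `results` that have no row in `ref`."""
    return sorted(set(results["Host"]) - set(ref.index))


def lookup_fails(ref, host):
    """Return True if the .loc lookup seen in the traceback raises KeyError."""
    try:
        ref.loc[host, ["Average", "Stdev"]]
    except KeyError:
        return True
    return False


missing = missing_hosts(rewish_results, ref_mat)
print(missing)
```

A check like this only reports which labels are absent; it does not say why they are absent, which is what the fix in the new version addresses.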
Best,
Simon
-
repo owner Hi Oscar,
There is a new version on bioconda (iPHoP v1.3.2) in which this bug should be fixed. Please update your iPHoP install (“conda install iphop=1.3.2”), and rebuild your custom database (you will unfortunately need to start from scratch here, i.e. re-run the “add_to_db” part). With the new custom database built with iPHoP v1.3.2, you should not see this error anymore.
Let me know if it works !
Thanks,
Simon
-
reporter Hi Simon,
I followed your advice and everything is OK now. I obtained the GTDB files with gtdbtk-2.3.0 (R214) and added the 1696 MAGs to the iPHoP database. I now have results for both the standard and custom databases.
Thank you very much for your work here and for this great tool.
Best regards.
-
repo owner Awesome, thanks for confirming, and glad that it worked !
-
repo owner - changed status to closed
bug seems to be fixed in latest version
Hi Oscar,
1,765 MAGs is a lot to add; however, 3 days also seems relatively long for the VHM step. It is more likely that the log has not (yet) been updated, and that the long database creation step is actually the one for WIsH. You should certainly keep an eye on this, and copy the log over to this issue if the program never finishes.
Best,
Simon