Program still running but throwing a MemoryError during or after the WIsH run
Hi Simon!
I am currently running a ~90MB fasta through iPHoP, and have not gotten the program to complete with this dataset: while the program is calculating distances in Step 2 (I think), it throws a memory error (see below). I am running the program with 180GB of memory and 2 nodes (96 cores total) and still getting this error. I’ve checked my fasta file and nothing seems weird about the headers or spacing, for example:
>M16W0_k127_154605_1
ACATATGGCGACGTCATCCCGGAGAACCATGAGGGCAGCGGGATGACGTTTGACGTCGAT
GCGGAAATCTTCGCTGGCAGGACACTGGTGGTGTACGAGCGGATGTACCTCGAAAATGGC
TACGGCGCAGGAAGCATCTTGTGGCGGAGCATCAGGTCCTTCTGGACGAGGACCAGACCA
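(For anyone checking a FASTA the same way, a quick sanity check along these lines can confirm that headers and sequence lines are well formed. This is a minimal sketch, not part of iPHoP; the function name and file path are mine.)

```python
# Sketch: verify that every record has a '>' header followed by at least one
# sequence line, and that sequence lines contain only IUPAC nucleotide codes.
import re

SEQ_LINE = re.compile(r"^[ACGTURYSWKMBDHVNacgturyswkmbdhvn]+$")

def check_fasta(path):
    n_records, current_has_seq = 0, True
    with open(path) as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\n")
            if not line:
                continue  # tolerate blank lines between records
            if line.startswith(">"):
                if not current_has_seq:
                    raise ValueError(f"header with no sequence before line {line_no}")
                n_records += 1
                current_has_seq = False
            elif SEQ_LINE.match(line):
                current_has_seq = True
            else:
                raise ValueError(f"unexpected characters on line {line_no}: {line[:40]}")
    if n_records and not current_has_seq:
        raise ValueError("file ends with a header that has no sequence")
    return n_records
```

Running it on the input file returns the record count, or raises with the first offending line number.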
This was my submission command:
sbatch --partition defq -D ./ --mem=180G --time 72:0:0 --nodes=2 --wrap 'iphop predict --fa_file vOTUs_numbered.fna --db_dir /home/hhallow1/scratch4-jsuez1/shared_databases/iphop_db/Sept_2021_pub_rw --out_dir ./iphop_number2 -t 96' -o iphop.log
And here is the full log with the error:
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
Process ForkPoolWorker-46:
Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/queues.py", line 358, in get
return _ForkingPickler.loads(res)
MemoryError
Process ForkPoolWorker-47:
Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/queues.py", line 358, in get
return _ForkingPickler.loads(res)
MemoryError
Process ForkPoolWorker-48:
Traceback (most recent call last):
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/queues.py", line 356, in get
res = self._reader.recv_bytes()
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/connection.py", line 421, in _recv_bytes
return self._recv(size)
File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/connection.py", line 386, in _recv
buf.write(chunk)
MemoryError
The program is still running, which I thought was strange. Any insight into what is going on? This is probably my 3rd or 4th try, continually increasing memory and cores as I go. Thanks!!
Comments (14)
-
repo owner -
reporter Thanks! Will try that. I also tried splitting the fasta into 2, hoping that subsetting gives pickle enough memory to create the dictionary. I will try --single_thread_wish if this doesn't work!
-
reporter Do you think a fresh install would help?
-
repo owner I also wonder whether running over multiple nodes is even possible with this library, so if you have not already tried, you may want to check what happens when you run on a single node?
-
reporter I tried a single node with the original large file, but it was stuck on calculating distances for ~3 days and ran out of the maximum time I can allot on the node (72 hours). I’m a little surprised, as I only have about ~25k sequences in the file.
-
reporter Trying a single node with the split files might be a good option; I'll submit that as well and report back!
-
repo owner Right, so looking back at the memory errors you see, I’m more and more convinced they come from the job being split over multiple nodes. ~25k can also take a while :-) I typically process batches of ~2 to 3k to make sure the job runs in a reasonable time. So I would try running smaller batches on individual nodes (without “--single_thread_wish”) and see if that fixes everything.
-
reporter Great suggestion! I'll cancel the jobs and try that. Thank you for the help!
-
reporter Hey Simon! I had a chance to separate my fasta files into 2000 sequence chunks using this script:
from Bio import SeqIO

# Define the input .fna file and the number of sequences per split file
input_file = 'vOTUs_numbered.fna'
sequences_per_file = 2000  # You can adjust this as needed

# Initialize variables
sequence_count = 0
file_counter = 1
output_file = None  # Initialize the output_file variable

with open(input_file, "r") as f:
    records = SeqIO.parse(f, "fasta")
    for record in records:
        sequence_count += 1
        if sequence_count == 1 or sequence_count > sequences_per_file:
            # Close the previous split file and open a new one
            if output_file:
                output_file.close()
            output_file = open(f'vOTUs_numbered_split_{file_counter}.fna', 'w')
            file_counter += 1
            sequence_count = 1
        # Write the current sequence to the output file
        SeqIO.write(record, output_file, "fasta")

# Close the last output file
if output_file:
    output_file.close()
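(A quick way to double-check a split like this is to confirm that the chunks together contain every record from the original and that no chunk exceeds the intended size. A dependency-free sketch, counting '>' header lines; the file-name pattern is assumed to match the script above.)

```python
# Sketch: verify a FASTA split preserved all records and respected the cap.
import glob

def count_records(path):
    # A FASTA record count is simply the number of '>' header lines.
    with open(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))

def verify_split(original, pattern, max_per_file):
    total = count_records(original)
    split_total = 0
    for chunk in sorted(glob.glob(pattern)):
        n = count_records(chunk)
        assert n <= max_per_file, f"{chunk} has {n} records (> {max_per_file})"
        split_total += n
    assert split_total == total, f"{split_total} records in chunks vs {total} in original"
    return total
```

For example, verify_split('vOTUs_numbered.fna', 'vOTUs_numbered_split_*.fna', 2000) should return the original record count if the split is clean.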
And then ran a test sample to make sure things were running smoothly. The error message I received was quite different, but also a lengthy one (sorry!). I’ve attached it below. I have a feeling this might result from my splitting the files? Something funky with headers? I double checked the line count to make sure it was even, and head/tailed a few files to make sure that no sequence was being cut off in the middle. I also double checked the WIsH output and there is a column titled ‘Normalized’. Any thoughts on what might be causing this? Let me know if any additional files or outputs might be helpful here.
Looks like everything is now set up, we will first clean up the input file, and then we will start the host prediction steps themselves
[1/1/Run] Running blastn against genomes...
[1/3/Run] Get relevant blast matches...
[2/1/Run] Running blastn against CRISPR...
[2/2/Run] Get relevant crispr matches...
[3/1/Run] Running (recoded)WIsH...
### Welcome to iPHoP ###
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'normalized'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3751, in _set_item_mgr
    loc = self._info_axis.get_loc(key)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'normalized'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 214, in process_batch
    rewish_results = add_pvalues(rewish_results,ref_file)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 227, in add_pvalues
    rewish_results["normalized"] = rewish_results.apply(lambda x: transform(x['LL'],x['Host'],ref_mat), axis=1)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3742, in _set_item_frame_value
    self._set_item_mgr(key, arraylike)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/frame.py", line 3754, in _set_item_mgr
    self._mgr.insert(len(self._info_axis), key, value)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1162, in insert
    block = new_block(values=value, ndim=self.ndim, placement=slice(loc, loc + 1))
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1937, in new_block
    check_ndim(values, placement, ndim)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 1979, in check_ndim
    raise ValueError(
ValueError: Wrong number of items passed 3, placement implies 1
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hhallow1/.conda/envs/iphop_env/bin/iphop", line 10, in <module>
    sys.exit(cli())
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/iphop.py", line 128, in cli
    args["func"](args)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/master_predict.py", line 87, in main
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 44, in run_and_parse_wish
    run_rewish(args["fasta_file"],args["wishrawresult"],args["rewish_db_dir"],args["wish_negfit"],args["tmp"],threads_tmp,n_host_by_phage)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 159, in run_rewish
    async_parallel(process_batch, args_list, threads)
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 251, in async_parallel
    return [r.get() for r in results]
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/site-packages/iphop/modules/wish.py", line 251, in <listcomp>
    return [r.get() for r in results]
  File "/home/hhallow1/.conda/envs/iphop_env/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
ValueError: Wrong number of items passed 3, placement implies 1
-
repo owner Hi !
Sorry, it’s a known bug we recently fixed but have not yet released on conda. It’s an easy fix though: the problem only happens when a batch has exactly 1,000 sequences or an exact multiple of 1,000 (like 2,000 :-) ). So the fix is to use another batch size (e.g. 1,500 or 2,500), and the error should disappear.
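(For later readers: a dependency-free splitter that takes the batch size as a parameter makes it easy to pick a size that is not an exact multiple of 1,000, per the comment above. A sketch only; the output file names are assumptions.)

```python
# Sketch: split a FASTA into chunks of at most `batch_size` records using
# only the standard library. Choose a batch_size that is NOT an exact
# multiple of 1,000 (e.g. 2,500) to avoid the batching bug described above.
def split_fasta(input_file, batch_size=2500, prefix="chunk"):
    out, n_in_file, file_no, written = None, 0, 0, []
    with open(input_file) as fh:
        for line in fh:
            if line.startswith(">"):
                # Start a new chunk at the first record and whenever full.
                if out is None or n_in_file == batch_size:
                    if out:
                        out.close()
                    file_no += 1
                    name = f"{prefix}_{file_no}.fna"
                    out = open(name, "w")
                    written.append(name)
                    n_in_file = 0
                n_in_file += 1
            if out:
                out.write(line)
    if out:
        out.close()
    return written
```

Because it copies lines verbatim, record headers and line wrapping come through unchanged, so the chunks stay byte-identical to the corresponding stretches of the original file.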
-
reporter ha ha!! okay, sounds great. I will go ahead and regenerate the files and run. Thank you!
-
reporter Just wanted to update that this fixed the issue, thank you for all the help! Ticket can be closed
-
repo owner Great, thanks for the update !
-
repo owner - changed status to closed
Fixed
-
repo owner (first reply) Yikes, this is not great, and unfortunately it seems to be an issue in one of the underlying libraries, so it is not the easiest to fix. Could you try running with “--single_thread_wish” (just add this option to “iphop predict …”)? That should bypass the “multiprocessing” step; it will be much slower, but hopefully it will complete?
Best,
Simon
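(Applied to the submission command from the original post, the flag would be added roughly like this. A sketch of a job-submission command, not a tested run; the log and output-directory names are assumptions to avoid clobbering the earlier attempt.)

```shell
sbatch --partition defq -D ./ --mem=180G --time 72:0:0 --nodes=2 -o iphop_retry.log \
  --wrap 'iphop predict --fa_file vOTUs_numbered.fna \
    --db_dir /home/hhallow1/scratch4-jsuez1/shared_databases/iphop_db/Sept_2021_pub_rw \
    --out_dir ./iphop_single_thread --single_thread_wish -t 96'
```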