error during AlignSets
This happened in my run during an AlignSets step, after getting through ~75% of the method. Two other nodes running the script got through it fine.
I've seen this 'Error in sibling process detected.' previously, also during the AlignSets call.
I am running this on Linux on AWS EC2. I attached the core file that it put out after the error. Happy to provide more information. brianbelmont@abvitro.com
stdin: is not a tty
Error processing sequence set with ID: TCCTTGCAATTAATTC_ACTGCT.
PID 27950: Error in sibling process detected. Cleaning up.
Process Process-9:
PID 27941: Error in sibling process detected. Cleaning up.
PID 27946: Error in sibling process detected. Cleaning up.
PID 27943: Error in sibling process detected. Cleaning up.
PID 27944: Error in sibling process detected. Cleaning up.
PID 27947: Error in sibling process detected. Cleaning up.
PID 27948: Error in sibling process detected. Cleaning up.
PID 27945: Error in sibling process detected. Cleaning up.
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/ebsdata/scripts/presto//AlignSets.py", line 265, in processASQueue
align_list = align_func(seq_list, **align_args)
File "/ebsdata/scripts/presto//AlignSets.py", line 75, in alignSeqSet
align = AlignIO.read(stdout_handle, 'fasta')
File "/usr/local/lib/python2.7/dist-packages/biopython-1.64-py2.7-linux-x86_64.egg/Bio/AlignIO/__init__.py", line 427, in read
raise ValueError("No records found in handle")
ValueError: No records found in handle
PID 27942: Error in sibling process detected. Cleaning up.
Comments (20)
-
-
Account Deleted I have since deleted the exact file I was analyzing. I uploaded another similar one that I was processing in parallel, but it did not produce the error (I doubt it matters, since I just restarted the analysis on the original file and it went fine the second time, so it may not be file-dependent). File: https://s3.amazonaws.com/abvitro-abpair/abpair_analysis/150330_BB/150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq
call was: /usr/bin/time -o $RUNTIME -a -f '%C\t%E\t%P\t%Mkb' nice AlignSets.py muscle -s 150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq --exec $MUSCLE_PATH --bf DB_MB --nproc 8
-
Thanks. If it's not a reliable error it may take a bit to fix, as I need to reproduce it. I'll start some tests and get back to you when I track it down.
-
Account Deleted Luckily, since it is not a reliable error, there is always the option of just rerunning the exact same script, and it'll likely get through. So no immediate fix is needed; I mainly wanted to bring it to your attention. I can still keep you posted if/when this happens again.
-
Account Deleted I've seen it a number of times too over the last few months, but never reproducibly.
-
My suspicion is that it's some sort of pipe timing issue with EC2, as what appears to be happening is that the output from MUSCLE is empty. I'll test though.
-
I'm also getting this error. Mine is during AssemblePairs, but the issue still seems to be when using muscle.
+ /usr/bin/time -o Runtime.log -a -f ''\''%C\t%E\t%P\t%Mkb'\''' nice AssemblePairs.py reference --exec /usr/bin/muscle --maxhits 100 --minident 0.5 --evalue 1e-5 -1 141124AbV_D14-8159_R2_sequence_subsampled_10000_fusionprimers-pass_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq -2 141124AbV_D14-8159_R1_sequence_subsampled_10000_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq --1f CONSCOUNT --2f CONSCOUNT PRCONS -r /home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta --log AssemblePairs-reference.log --nproc 2 --failed
Error processing sequence with ID: ATTTTCAGATGTCT_GTGTTG.
Process Process-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
PID 20432: Error in sibling process detected. Cleaning up.
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
PID 20428: Error in sibling process detected. Cleaning up.
result = process_func(data, **process_args)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
stitch = assemble_func(head_seq, tail_seq, **assemble_args)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
usearch_exec=usearch_exec)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmp9p5Sj8', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpjMGMZP', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
Error processing sequence with ID: GGACTATAGGTAACTAA_TGATAT.
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
result = process_func(data, **process_args)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
stitch = assemble_func(head_seq, tail_seq, **assemble_args)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
usearch_exec=usearch_exec)
File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmpWzJqgu', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpKNiUSd', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
-
This should be usearch instead of muscle. Can you retry with "--exec /usr/bin/usearch"? (Or wherever you have usearch installed.)
-
Whoops, I must have mistyped that somewhere. Thank you for the clarification, it runs now. I think we've still been seeing the other errors in sibling processes though, and will continue to report them when they come up. Cheers
-
Thanks. I've been using Biopython's muscle interface. I'll probably just have to write my own.
-
-
Haven't had any runs of AlignSets show the error on our cluster yet. I might just write a new muscle wrapper anyway and have y'all test that.
-
-
I haven't been able to reproduce the problem on my end, but I just made some changes to how muscle is called in AlignSets which (in my imagination) might help (removed the shell invocation and changed the buffering).
Let me know if you still encounter this error.
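For readers following along, the kind of change described above (no shell, explicit pipe handling) can be sketched roughly as follows. This is not the actual pRESTO code; `run_aligner` is a hypothetical helper name, and the sketch only assumes the aligner reads FASTA on stdin and writes its alignment to stdout. It also rejects empty output up front, so the caller gets a descriptive error instead of "No records found in handle".

```python
import subprocess


def run_aligner(cmd, fasta_text):
    """Run an external aligner without a shell and return its stdout.

    cmd is an argument list, e.g. [path_to_aligner]. Raises RuntimeError
    if the tool exits non-zero or prints nothing, so an empty result is
    never handed to the alignment parser. (Hypothetical helper.)
    """
    proc = subprocess.Popen(cmd,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    # communicate() writes the input, closes stdin, and drains both output
    # pipes, which avoids deadlocks caused by a full pipe buffer.
    stdout_str, stderr_str = proc.communicate(fasta_text)
    if proc.returncode != 0:
        raise RuntimeError('%s exited with status %d: %s'
                           % (cmd[0], proc.returncode, stderr_str.strip()))
    if not stdout_str.strip():
        raise RuntimeError('%s produced no output: %s'
                           % (cmd[0], stderr_str.strip()))
    return stdout_str
```

The returned string could then be wrapped in a `StringIO` and passed to `AlignIO.read(handle, 'fasta')`, as in the traceback at the top of this issue.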
-
- changed status to resolved
Assuming no news is good news. Please reopen if the issue crops up again.
-
Hi, just wanted to say that we ran into this again recently. I think it's a memory issue in the child processes, since the affected barcodes were by far the most common in the pair-pass file. We're currently applying a band-aid by reserving a whole node for the job; we'll see if that helps. If this is the case, we probably haven't run into it recently because we've mostly been doing AbPair, which downsamples prior to this step, whereas we do not downsample prior to AbSeq.
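If very large barcode groups are indeed the trigger, one possible workaround is to cap the number of reads per set before alignment, similar in spirit to the downsampling mentioned above. The following is only a sketch under that assumption; `cap_set_size`, `max_size`, and `seed` are hypothetical names, not pRESTO or AbPair options.

```python
import random


def cap_set_size(records, max_size, seed=42):
    """Return at most max_size records, sampled without replacement.

    A fixed seed keeps the subsample reproducible between runs.
    (Hypothetical helper; not part of pRESTO.)
    """
    records = list(records)
    if len(records) <= max_size:
        return records
    rng = random.Random(seed)
    return rng.sample(records, max_size)
```

Sets at or below the cap are passed through unchanged, so only the unusually abundant barcodes are affected.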
-
- changed status to open
Popped up again. Reopening.
-
Just to add to my previous comment, the reason it may have been sporadic before is that the AlignSets job may or may not have been sharing a node with another high-memory job at the time. Just speculation at this point though.
-
Hey @dkoppstein, thanks. Are you using the 32-bit or 64-bit version of muscle? I'll take a look at the memory usage, and fix anything I can within the Python parts. If the memory limit is being hit within muscle, then I suspect the only solution will be to add another wrapper for CD-HIT, or swarm, or something. I'd really like to start porting bits and pieces to SeqAn soon, so that might actually be the best solution if it contains a suitable algorithm.
Please keep me posted. And I'll try to look at this soon. This week and next will be a little tough though.
-
It'd be really nice if there could be some sort of built-in checkpoint mechanism so that if anything happens one doesn't have to start all over again. I'm at 90% done with AlignSets after almost a day, but it looks like I'm gonna hit the wall time limit (should have set that to be longer too...) and will have to start all over again :/
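A checkpoint of the kind requested above could, in its simplest form, record the ID of each finished sequence set and skip those IDs on restart. The sketch below is purely illustrative and assumes nothing about pRESTO's internals; `load_done`, `process_with_checkpoint`, and the checkpoint file layout (one ID per line) are all hypothetical.

```python
import os


def load_done(path):
    """Return the set of IDs already recorded in the checkpoint file."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return set(line.strip() for line in handle if line.strip())


def process_with_checkpoint(items, process_func, path):
    """Process (set_id, data) pairs, skipping IDs found in the checkpoint.

    An ID is appended and flushed only after process_func returns, so an
    interrupted run repeats at most the set that was in progress.
    """
    done = load_done(path)
    results = {}
    with open(path, 'a') as handle:
        for set_id, data in items:
            if set_id in done:
                continue
            results[set_id] = process_func(data)
            handle.write(set_id + '\n')
            handle.flush()
    return results
```

In a real tool the aligned output for a set would also have to be written to disk before its ID is recorded; otherwise a restart would skip sets whose results were lost.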
-
- changed status to resolved
As this seems to be a child process memory issue, I'm closing this in favor of #6.
-
Hi, I ran into this error too and resolved it by using MUSCLE version 3.8.31 (https://wiki.anunna.wur.nl/index.php/Muscle_3.8.31).
I’ll debug and try to fix the issue Monday. Can you share the input file with me (via s3)? And the command line arguments to AlignSets?