Error during AlignSets

Issue #15 resolved
Former user created an issue

This happened in my run during an AlignSets step, after getting through ~75% of the method. Two other nodes running the script got through it fine.

I've seen this 'Error in sibling process detected.' message previously, also during the AlignSets call.

I am running this on Linux on AWS EC2. I've attached the core file that it produced after the error. Happy to provide more information. brianbelmont@abvitro.com

stdin: is not a tty
Error processing sequence set with ID: TCCTTGCAATTAATTC_ACTGCT.
PID 27950:  Error in sibling process detected. Cleaning up.
Process Process-9:
PID 27941:  Error in sibling process detected. Cleaning up.
PID 27946:  Error in sibling process detected. Cleaning up.
PID 27943:  Error in sibling process detected. Cleaning up.
PID 27944:  Error in sibling process detected. Cleaning up.
PID 27947:  Error in sibling process detected. Cleaning up.
PID 27948:  Error in sibling process detected. Cleaning up.
PID 27945:  Error in sibling process detected. Cleaning up.
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/ebsdata/scripts/presto//AlignSets.py", line 265, in processASQueue
    align_list = align_func(seq_list, **align_args)
  File "/ebsdata/scripts/presto//AlignSets.py", line 75, in alignSeqSet
    align = AlignIO.read(stdout_handle, 'fasta')
  File "/usr/local/lib/python2.7/dist-packages/biopython-1.64-py2.7-linux-x86_64.egg/Bio/AlignIO/__init__.py", line 427, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle
PID 27942:  Error in sibling process detected. Cleaning up.
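Since the error is sporadic and a plain rerun usually succeeds, one hypothetical band-aid (not part of pRESTO; `align_with_retry` is an illustrative name) is to retry the alignment call when it raises the "No records found in handle" ValueError instead of letting the worker die:

```python
# Hypothetical retry wrapper (not part of pRESTO): retry the alignment call a
# few times when it raises ValueError, which is what AlignIO.read() throws
# when the aligner's output stream comes back empty.
import time

def align_with_retry(align_func, seq_list, retries=3, delay=0.0, **align_args):
    """Call align_func(seq_list), retrying on ValueError (empty aligner output)."""
    for attempt in range(retries):
        try:
            return align_func(seq_list, **align_args)
        except ValueError:
            if attempt == retries - 1:
                raise  # out of retries; propagate the original error
            time.sleep(delay)  # brief pause before retrying the aligner
```

Dropped in around the `align_func(seq_list, **align_args)` call in `processASQueue`, a transient empty pipe would then cost a retry rather than the whole run.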

Comments (20)

  1. Jason Vander Heiden

    I’ll debug and try to fix the issue Monday. Can you share the input file with me (via s3)? And the command line arguments to AlignSets?

  2. Former user Account Deleted

    I have since deleted the exact file I was analyzing. I uploaded a similar one that I was processing in parallel, which did not produce the error (this probably doesn't matter: I restarted the analysis on the original file and it completed fine the second time, so the error may not be file-dependent). File: https://s3.amazonaws.com/abvitro-abpair/abpair_analysis/150330_BB/150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq

    call was: /usr/bin/time -o $RUNTIME -a -f '%C\t%E\t%P\t%Mkb' nice AlignSets.py muscle -s 150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq --exec $MUSCLE_PATH --bf DB_MB --nproc 8

  3. Jason Vander Heiden

    Thanks. If it's not a reliably reproducible error it may take me a bit to fix, as I need to reproduce it first. I'll start some tests and get back to you when I track it down.

  4. Former user Account Deleted

    Luckily, since the error is intermittent, there is always the option of just rerunning the exact same script, which will likely get through. So no immediate fix is needed; I mainly wanted to bring it to your attention. I'll keep you posted if/when this happens again.

  5. Former user Account Deleted

    I've seen it a number of times too over the last few months, but never reproducibly.

  6. Jason Vander Heiden

    My suspicion is that it's some sort of pipe timing issue on EC2, as what appears to be happening is that the output from MUSCLE is empty. I'll test, though.
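If the empty-MUSCLE-output hypothesis is right, one way to surface it cleanly is to check the captured output before handing it to the FASTA parser. A sketch under that assumption (modern `subprocess` API shown for brevity; `run_aligner` and the error messages are illustrative, not pRESTO code):

```python
# Sketch: run the aligner without a shell, capture its output, and fail
# loudly when stdout comes back empty rather than passing an empty stream
# to Bio.AlignIO.read().
import subprocess

def run_aligner(cmd, fasta_text):
    """Run an aligner command on FASTA text; raise if it emits no output."""
    proc = subprocess.run(cmd, input=fasta_text,
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("aligner failed: %s" % proc.stderr.strip())
    if not proc.stdout.strip():
        # Distinguish "aligner produced nothing" from a downstream parse error.
        raise RuntimeError("aligner returned no alignment output")
    return proc.stdout
```

With `cmd` set to the real MUSCLE invocation, an empty pipe would show up here as a clear error message instead of the `AlignIO.read` traceback above.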

  7. David Koppstein

    I'm also getting this error. Mine is during AssemblePairs, but the issue still seems to be when using muscle.

    + /usr/bin/time -o Runtime.log -a -f ''\''%C\t%E\t%P\t%Mkb'\''' nice AssemblePairs.py reference --exec /usr/bin/muscle --maxhits 100 --minident 0.5 --evalue 1e-5 -1 141124AbV_D14-8159_R2_sequence_subsampled_10000_fusionprimers-pass_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq -2 141124AbV_D14-8159_R1_sequence_subsampled_10000_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq --1f CONSCOUNT --2f CONSCOUNT PRCONS -r /home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta --log AssemblePairs-reference.log --nproc 2 --failed
    Error processing sequence with ID: ATTTTCAGATGTCT_GTGTTG.
    Process Process-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    PID 20432:  Error in sibling process detected. Cleaning up.
        self.run()
      File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
    PID 20428:  Error in sibling process detected. Cleaning up.
        result = process_func(data, **process_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
        stitch = assemble_func(head_seq, tail_seq, **assemble_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
        usearch_exec=usearch_exec)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
      File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
        raise CalledProcessError(retcode, cmd, output=output)
    CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmp9p5Sj8', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpjMGMZP', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
    Error processing sequence with ID: GGACTATAGGTAACTAA_TGATAT.
    Process Process-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
        result = process_func(data, **process_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
        stitch = assemble_func(head_seq, tail_seq, **assemble_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
        usearch_exec=usearch_exec)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
      File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
        raise CalledProcessError(retcode, cmd, output=output)
    CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmpWzJqgu', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpKNiUSd', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
    
  8. Jason Vander Heiden

    This should be usearch instead of muscle. Can you retry with "--exec /usr/bin/usearch"? (Or wherever you have usearch installed.)

  9. David Koppstein

    Whoops, I must have mistyped that somewhere. Thank you for the clarification, it runs now. I think we've still been seeing the other errors in sibling processes though, and will continue to report them when they come up. Cheers

  10. Jason Vander Heiden

    Thanks. I've been using Biopython's muscle interface. I'll probably just have to write my own.

  11. Jason Vander Heiden

    I haven't been able to reproduce the problem on my end, but I just made some changes to how muscle is called in AlignSets which (in my imagination) might help (removed the shell invocation and changed the buffering).

    Let me know if you still encounter this error?
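The shape of the change described above might look something like this (an assumed illustration, not the actual pRESTO diff): launching the aligner from an argument list avoids an intermediate `/bin/sh`, and `bufsize=0` keeps the pipes unbuffered so output isn't held back in a stdio buffer when the child exits.

```python
# Illustrative only: no-shell, unbuffered subprocess launch.
from subprocess import Popen, PIPE

def open_aligner(cmd):
    """Start an aligner child process with no shell and unbuffered pipes."""
    # Before: Popen("muscle ...", shell=True)  -> spawns /bin/sh first
    # After: argument list, bufsize=0          -> direct exec, no pipe buffering
    return Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, bufsize=0)
```

For example, `open_aligner(["cat"])` round-trips its input unchanged via `communicate()`; with MUSCLE in place of `cat`, the same call would carry the FASTA in and the alignment out.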

  12. David Koppstein

    Hi, just wanted to say that we ran into this again recently. I think it's a memory issue in the child processes, since the affected barcodes were by far the most common in the pair-pass file. We're currently applying a band-aid by reserving a whole node for the job; we'll see if that helps. If this is the case, the reason we probably haven't run into it recently is that we've been mostly running AbPair, which downsamples prior to this step, whereas we do not downsample prior to AbSeq.

  13. David Koppstein

    To add to my previous comment: the error may have been sporadic before because the AlignSets job may or may not have been sharing a node with another high-memory job at the time. Just speculation at this point, though.

  14. Jason Vander Heiden

    Hey @dkoppstein, thanks. Are you using the 32 bit or 64 bit version of muscle? I'll take a look at the memory usage, and fix anything I can within the python parts. If the memory limit is being hit within muscle, then I suspect the only solution will be to add another wrapper for CD-HIT, or swarm, or something. I'd really like to start porting bits and pieces to SeqAn soon, so that might actually be the best solution if it contains a suitable algorithm.

    Please keep me posted. And I'll try to look at this soon. This week and next will be a little tough though.

  15. Julian Zhou

    It'd be really nice if there were some sort of built-in checkpoint mechanism so that if anything happens one doesn't have to start all over again. I'm 90% done with AlignSets after almost a day, but it looks like I'm going to hit the wall time limit (should have set that longer, too..) and will have to start all over again :/
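A minimal checkpoint scheme along those lines (hypothetical sketch, not an existing pRESTO feature; `process_sets` and the file layout are illustrative) would append each finished barcode-set ID to a file so a rerun after a crash or wall-time kill can skip completed work:

```python
# Hypothetical checkpointing sketch: record each finished set ID so a
# restarted run can skip IDs that were already processed.
import os

def load_done(ckpt_path):
    """Return the set of IDs already recorded in the checkpoint file."""
    if not os.path.exists(ckpt_path):
        return set()
    with open(ckpt_path) as handle:
        return {line.strip() for line in handle}

def process_sets(set_ids, work_func, ckpt_path):
    """Run work_func on each unprocessed ID, checkpointing as we go."""
    done = load_done(ckpt_path)
    with open(ckpt_path, "a") as ckpt:
        for set_id in set_ids:
            if set_id in done:
                continue  # finished in a previous run; skip
            work_func(set_id)
            ckpt.write(set_id + "\n")
            ckpt.flush()  # persist progress immediately, not at exit
```

On a rerun with the same checkpoint file, only the IDs added since the previous run get processed; the cost is one small append per set.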
