Error during AlignSets

Issue #15 resolved
Former user created an issue

This happened in my run during an AlignSets step, after getting through ~75% of the method. Two other nodes running the script got through it fine.

I've seen this 'Error in sibling process detected.' message previously, also during the AlignSets call.

I am running this on Linux on AWS EC2. I've attached the core file that it produced after the error. Happy to provide more information. brianbelmont@abvitro.com

stdin: is not a tty
Error processing sequence set with ID: TCCTTGCAATTAATTC_ACTGCT.
PID 27950:  Error in sibling process detected. Cleaning up.
Process Process-9:
PID 27941:  Error in sibling process detected. Cleaning up.
PID 27946:  Error in sibling process detected. Cleaning up.
PID 27943:  Error in sibling process detected. Cleaning up.
PID 27944:  Error in sibling process detected. Cleaning up.
PID 27947:  Error in sibling process detected. Cleaning up.
PID 27948:  Error in sibling process detected. Cleaning up.
PID 27945:  Error in sibling process detected. Cleaning up.
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/ebsdata/scripts/presto//AlignSets.py", line 265, in processASQueue
    align_list = align_func(seq_list, **align_args)
  File "/ebsdata/scripts/presto//AlignSets.py", line 75, in alignSeqSet
    align = AlignIO.read(stdout_handle, 'fasta')
  File "/usr/local/lib/python2.7/dist-packages/biopython-1.64-py2.7-linux-x86_64.egg/Bio/AlignIO/__init__.py", line 427, in read
    raise ValueError("No records found in handle")
ValueError: No records found in handle
PID 27942:  Error in sibling process detected. Cleaning up.
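Since the error is sporadic and a plain rerun usually succeeds, one hypothetical band-aid (not part of pRESTO; `align_with_retry` is an illustrative name) is to retry the alignment call when it raises the "No records found in handle" ValueError instead of letting the worker die:

```python
# Hypothetical retry wrapper (not part of pRESTO): retry the alignment call a
# few times when it raises ValueError, which is what AlignIO.read() throws
# when the aligner's output stream comes back empty.
import time

def align_with_retry(align_func, seq_list, retries=3, delay=0.0, **align_args):
    """Call align_func(seq_list), retrying on ValueError (empty aligner output)."""
    for attempt in range(retries):
        try:
            return align_func(seq_list, **align_args)
        except ValueError:
            if attempt == retries - 1:
                raise  # out of retries; propagate the original error
            time.sleep(delay)  # brief pause before retrying the aligner
```

Dropped in around the `align_func(seq_list, **align_args)` call in `processASQueue`, a transient empty pipe would then cost a retry rather than the whole run.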

Comments (20)

  1. Jason Vander Heiden

    I’ll debug and try to fix the issue Monday. Can you share the input file with me (via s3)? And the command line arguments to AlignSets?

  2. Former user Account Deleted

    I have since deleted the exact file I was analyzing. I uploaded a similar one that I was processing in parallel, which did not produce the error (this probably doesn't matter: I restarted the analysis on the original file and it completed fine the second time, so the error may not be file-dependent). File: https://s3.amazonaws.com/abvitro-abpair/abpair_analysis/150330_BB/150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq

    call was: /usr/bin/time -o $RUNTIME -a -f '%C\t%E\t%P\t%Mkb' nice AlignSets.py muscle -s 150318-1-F6_S9_L001_R2_001_fusionprimers-pass_primers-pass_pair-pass.fastq --exec $MUSCLE_PATH --bf DB_MB --nproc 8

  3. Jason Vander Heiden

    Thanks. If it's not a reliably reproducible error it may take me a bit to fix, as I need to reproduce it first. I'll start some tests and get back to you when I track it down.

  4. Former user Account Deleted

    Luckily, since the error is intermittent, there is always the option of just rerunning the exact same script, which will likely get through. So no immediate fix is needed; I mainly wanted to bring it to your attention. I'll keep you posted if/when this happens again.

  5. Former user Account Deleted

    I've seen it a number of times too over the last few months, but never reproducibly.

  6. Jason Vander Heiden

    My suspicion is that it's some sort of pipe timing issue on EC2, as what appears to be happening is that the output from MUSCLE is empty. I'll test, though.
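If the empty-MUSCLE-output hypothesis is right, one way to surface it cleanly is to check the captured output before handing it to the FASTA parser. A sketch under that assumption (modern `subprocess` API shown for brevity; `run_aligner` and the error messages are illustrative, not pRESTO code):

```python
# Sketch: run the aligner without a shell, capture its output, and fail
# loudly when stdout comes back empty rather than passing an empty stream
# to Bio.AlignIO.read().
import subprocess

def run_aligner(cmd, fasta_text):
    """Run an aligner command on FASTA text; raise if it emits no output."""
    proc = subprocess.run(cmd, input=fasta_text,
                          capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("aligner failed: %s" % proc.stderr.strip())
    if not proc.stdout.strip():
        # Distinguish "aligner produced nothing" from a downstream parse error.
        raise RuntimeError("aligner returned no alignment output")
    return proc.stdout
```

With `cmd` set to the real MUSCLE invocation, an empty pipe would show up here as a clear error message instead of the `AlignIO.read` traceback above.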

  7. David Koppstein

    I'm also getting this error. Mine is during AssemblePairs, but the issue still seems to be when using muscle.

    + /usr/bin/time -o Runtime.log -a -f ''\''%C\t%E\t%P\t%Mkb'\''' nice AssemblePairs.py reference --exec /usr/bin/muscle --maxhits 100 --minident 0.5 --evalue 1e-5 -1 141124AbV_D14-8159_R2_sequence_subsampled_10000_fusionprimers-pass_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq -2 141124AbV_D14-8159_R1_sequence_subsampled_10000_primers-pass_pair-pass_align-pass_consensus-pass_pair-pass_assemblealign-fail.fastq --1f CONSCOUNT --2f CONSCOUNT PRCONS -r /home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta --log AssemblePairs-reference.log --nproc 2 --failed
    Error processing sequence with ID: ATTTTCAGATGTCT_GTGTTG.
    Process Process-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    PID 20432:  Error in sibling process detected. Cleaning up.
        self.run()
      File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
    PID 20428:  Error in sibling process detected. Cleaning up.
        result = process_func(data, **process_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
        stitch = assemble_func(head_seq, tail_seq, **assemble_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
        usearch_exec=usearch_exec)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
      File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
        raise CalledProcessError(retcode, cmd, output=output)
    CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmp9p5Sj8', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpjMGMZP', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
    Error processing sequence with ID: GGACTATAGGTAACTAA_TGATAT.
    Process Process-2:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/home/dkoppstein/src/bitbucket.org/javh/presto/IgCore.py", line 1253, in processSeqQueue
        result = process_func(data, **process_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 692, in processAssembly
        stitch = assemble_func(head_seq, tail_seq, **assemble_args)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 293, in referenceAssembly
        usearch_exec=usearch_exec)
      File "/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/../../javh/presto/AssemblePairs.py", line 243, in getUblastAlignment
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False)
      File "/usr/lib/python2.7/subprocess.py", line 573, in check_output
        raise CalledProcessError(retcode, cmd, output=output)
    CalledProcessError: Command '['/usr/bin/muscle', '-ublast', '/tmp/95.1.all.q/tmpWzJqgu', '-db', '/home/dkoppstein/src/bitbucket.org/abvitro/abpair_pipeline/db/IMGT.IG_TR.V.human.F+ORF+infrP.ungapped.fasta', '-strand', 'plus', '-evalue', '1e-05', '-maxhits', '100', '-userout', '/tmp/95.1.all.q/tmpKNiUSd', '-userfields', 'query+target+qlo+qhi+tlo+thi+alnlen+evalue+id', '-threads', '1']' returned non-zero exit status 1
    
  8. Jason Vander Heiden

    This should be usearch instead of muscle. Can you retry with "--exec /usr/bin/usearch"? (Or wherever you have usearch installed.)

  9. David Koppstein

    Whoops, I must have mistyped that somewhere. Thank you for the clarification, it runs now. I think we've still been seeing the other errors in sibling processes though, and will continue to report them when they come up. Cheers

  10. Jason Vander Heiden

    Thanks. I've been using Biopython's muscle interface. I'll probably just have to write my own.

  11. Jason Vander Heiden

    I haven't been able to reproduce the problem on my end, but I just made some changes to how muscle is called in AlignSets which (in my imagination) might help (removed the shell invocation and changed the buffering).

    Let me know if you still encounter this error?
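The shape of the change described above might look something like this (an assumed illustration, not the actual pRESTO diff): launching the aligner from an argument list avoids an intermediate `/bin/sh`, and `bufsize=0` keeps the pipes unbuffered so output isn't held back in a stdio buffer when the child exits.

```python
# Illustrative only: no-shell, unbuffered subprocess launch.
from subprocess import Popen, PIPE

def open_aligner(cmd):
    """Start an aligner child process with no shell and unbuffered pipes."""
    # Before: Popen("muscle ...", shell=True)  -> spawns /bin/sh first
    # After: argument list, bufsize=0          -> direct exec, no pipe buffering
    return Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, bufsize=0)
```

For example, `open_aligner(["cat"])` round-trips its input unchanged via `communicate()`; with MUSCLE in place of `cat`, the same call would carry the FASTA in and the alignment out.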

  12. David Koppstein

    Hi, just wanted to say that we ran into this again recently. I think it's a memory issue in the child processes, since the affected barcodes were by far the most common in the pair-pass file. We're currently applying a band-aid by reserving a whole node for the job; we'll see if that helps. If this is the case, the reason we probably haven't run into it recently is that we've been mostly running AbPair, which downsamples prior to this step, whereas we do not downsample prior to AbSeq.

  13. David Koppstein

    To add to my previous comment: the error may have been sporadic before because the AlignSets job may or may not have been sharing a node with another high-memory job at the time. Just speculation at this point, though.

  14. Jason Vander Heiden

    Hey @dkoppstein, thanks. Are you using the 32 bit or 64 bit version of muscle? I'll take a look at the memory usage, and fix anything I can within the python parts. If the memory limit is being hit within muscle, then I suspect the only solution will be to add another wrapper for CD-HIT, or swarm, or something. I'd really like to start porting bits and pieces to SeqAn soon, so that might actually be the best solution if it contains a suitable algorithm.

    Please keep me posted. And I'll try to look at this soon. This week and next will be a little tough though.

  15. Julian Zhou

    It'd be really nice if there were some sort of built-in checkpoint mechanism so that if anything happens one doesn't have to start all over again. I'm 90% done with AlignSets after almost a day, but it looks like I'm going to hit the wall time limit (should have set that longer, too..) and will have to start all over again :/
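A minimal checkpoint scheme along those lines (hypothetical sketch, not an existing pRESTO feature; `process_sets` and the file layout are illustrative) would append each finished barcode-set ID to a file so a rerun after a crash or wall-time kill can skip completed work:

```python
# Hypothetical checkpointing sketch: record each finished set ID so a
# restarted run can skip IDs that were already processed.
import os

def load_done(ckpt_path):
    """Return the set of IDs already recorded in the checkpoint file."""
    if not os.path.exists(ckpt_path):
        return set()
    with open(ckpt_path) as handle:
        return {line.strip() for line in handle}

def process_sets(set_ids, work_func, ckpt_path):
    """Run work_func on each unprocessed ID, checkpointing as we go."""
    done = load_done(ckpt_path)
    with open(ckpt_path, "a") as ckpt:
        for set_id in set_ids:
            if set_id in done:
                continue  # finished in a previous run; skip
            work_func(set_id)
            ckpt.write(set_id + "\n")
            ckpt.flush()  # persist progress immediately, not at exit
```

On a rerun with the same checkpoint file, only the IDs added since the previous run get processed; the cost is one small append per set.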
