Huge processing times for CollapseSeq
Hi there!
I had been running pRESTO on MiSeq samples successfully. Lately, I've had a few NextSeq runs to put through the pipeline and, although most of the samples were processed without any issues, for a few of them the CollapseSeq step alone took a really long time (33-62 hours).
What puzzled me most was that three samples took a similar time (33-37 hours) while one took over 62 hours. I looked at the numbers to figure out whether there was a pattern in which samples take longer (e.g., more raw reads, longer times), but I couldn't find one. Here are a few numbers I collected:
| Sample  | Running time | raw_reads  | contributing_reads | unique_sequences | unique_cdr3 |
|---------|--------------|------------|--------------------|------------------|-------------|
| Sample1 | 33:41:44     | 6,256,670  | 4,779,720          | 737,838          | 581,965     |
| Sample2 | 34:29:56     | 3,418,984  | 2,797,692          | 638,508          | 452,911     |
| Sample3 | 37:34:06     | 10,758,170 | 8,810,811          | 715,579          | 497,400     |
| Sample4 | 62:36:16     | 3,501,513  | 2,783,129          | 885,801          | 691,839     |
The only thing that makes Sample4 stand out is that it has more unique sequences than the others, although the difference is not proportional to the extra time it took to process them.
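My rough mental model for why unique-sequence count could matter so much (just a simplified sketch of exact vs. N-tolerant deduplication in general, not CollapseSeq's actual implementation) is that exact duplicate removal is about linear in the number of reads via hashing, whereas matching that tolerates ambiguous `N` characters can't use a plain hash and may need comparisons against previously kept unique reads, so work grows closer to the square of the unique-sequence count:

```python
from collections import defaultdict

def collapse_exact(seqs):
    """Exact duplicate removal: one hash lookup per read, roughly O(reads)."""
    counts = defaultdict(int)
    for s in seqs:
        counts[s] += 1
    return counts

def matches_with_n(a, b, max_n=5):
    """Toy rule: equal-length sequences match if they differ only where one
    side has an 'N', with at most max_n such positions (simplified model)."""
    if len(a) != len(b):
        return False
    n_count = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if x == 'N' or y == 'N':
            n_count += 1
            if n_count > max_n:
                return False
        else:
            return False
    return True

def collapse_fuzzy(seqs, max_n=5):
    """N-tolerant removal: each read is checked against every kept unique
    read, so work grows roughly with (unique reads)^2."""
    unique = []
    for s in seqs:
        if not any(matches_with_n(s, u, max_n) for u in unique):
            unique.append(s)
    return unique

reads = ["ACGT", "ACGT", "ACNT", "AGGT"]
print(len(collapse_exact(reads)))  # 3 distinct strings
print(collapse_fuzzy(reads))       # ['ACGT', 'AGGT'] -- 'ACNT' folds into 'ACGT'
```

If something like this is in play, a ~20% increase in unique sequences could plausibly cost much more than 20% extra runtime, but I'd love confirmation of what the real scaling behaviour is.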
I would really appreciate any tips or advice on what could be going on here, so that in the future I can anticipate when this might happen, or at least explain why it did.
Here is some additional info:
# pRESTO version: 0.6.0 (from the Docker Hub image immcantation/suite:4.0.0)
# Command used
CollapseSeq.py \
-s "Sample4_consensus-pass.fasta" \
-n 5 \
--uf BARCODE C_CALL \
--cf CONSCOUNT \
--act sum \
--inner \
--outname "Sample4"
Thank you very much in advance.
Cheers!