Read quality = 0 stops FilterSeq

Issue #70 resolved
Carolina Monzó created an issue

Hi,

I’m running FilterSeq on some very poor-quality .fastq files. When a read has failed completely (all N), FilterSeq crashes, apparently because the quality of the whole read is 0.

Example read:

@SN863:625:H5M7YBCX3:1:1101:1036:5108 2:N:0:CGATGTTTATCT

NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

+

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Error:

$ python3.7 presto-0.5.13/bin/FilterSeq.py quality --inner -q 25 --failed --outdir ./data/fastq_trimmed/ -s ./data/fastq_raw/4256_A_run624_CGATGTTTGGGG_S4_L001_R2_001.fastq

START> FilterSeq

COMMAND> quality

FILE> 4256_A_run624_CGATGTTTGGGG_S4_L001_R2_001.fastq

INNER> True

MIN_QUAL> 25.0

NPROC> 12

PROGRESS> 11:47:42 | | 0% ( 0) 0.0 min

PID 92134> Error in sibling process detected. Cleaning up.

ERROR> Error processing sequence with ID: SN863:625:H5M7YBCX3:1:1101:1036:5108.

PID 92121> Error in sibling process detected. Cleaning up.

Process Process-8:

Traceback (most recent call last):

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap

self.run()

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/multiprocessing/process.py", line 99, in run

self._target(*self._args, **self._kwargs)

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/site-packages/presto/Multiprocessing.py", line 402, in processSeqQueue

result = process_func(data, **process_args)

File "/Users/CMonzo/.conda/envs/MPI/lib/python3.7/site-packages/presto/Sequence.py", line 1289, in filterQuality

q = sum(quals) / len(quals)

ZeroDivisionError: division by zero
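
The failing line in the traceback, `sum(quals) / len(quals)`, divides by zero whenever `quals` is empty. The sketch below reproduces that outside pRESTO. It assumes that `--inner` drops the quality scores at leading and trailing N characters, which would leave nothing for an all-N read. The helpers `inner_quals` and `mean_quality` are hypothetical names used for illustration and are not part of pRESTO.

```python
def inner_quals(seq, quals, missing="N"):
    """Drop scores at leading/trailing missing characters (assumed --inner behavior)."""
    start = len(seq) - len(seq.lstrip(missing))
    end = len(seq.rstrip(missing))
    return quals[start:end]


def mean_quality(quals):
    """Mean Phred score; an empty list yields 0.0 instead of raising."""
    return sum(quals) / len(quals) if quals else 0.0


# Decode a Phred+33 quality string such as the "!!!!" line in the example read.
seq = "NNNNNNNN"
quals = [ord(c) - 33 for c in "!!!!!!!!"]

trimmed = inner_quals(seq, quals)  # empty list for an all-N read

try:
    q = sum(trimmed) / len(trimmed)  # same expression as in the traceback
except ZeroDivisionError:
    q = mean_quality(trimmed)  # guarded version falls back to 0.0
```

With a guard like this, an all-N read would score 0.0 and fail a `-q 25` threshold in the normal way rather than stopping the run.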

Comments (4)

  1. Jason Vander Heiden

    Thanks for reporting this. We’ll take a look. This looks easy to fix.

    As a workaround until we post a fix, I suspect you can get these files through by first running them through FilterSeq.py missing to remove every read that is mostly (or entirely) Ns.
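
    If a quick stand-in for that pre-filtering step is useful, the idea can be sketched in plain Python. This is not how FilterSeq.py missing is implemented; it assumes a simple four-lines-per-record FASTQ file, and `read_fastq` and `drop_all_missing` are hypothetical helper names.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:  # end of file (or trailing blank line)
            return
        yield tuple(record)


def drop_all_missing(records, missing="N"):
    """Keep only records whose sequence has at least one non-missing character."""
    for header, seq, plus, qual in records:
        if seq.strip(missing):
            yield header, seq, plus, qual


# Tiny in-memory example: r1 is all N and is dropped, r2 is kept.
fastq = "@r1\nNNNN\n+\n!!!!\n@r2\nACGN\n+\nIII!\n"
kept = list(drop_all_missing(read_fastq(io.StringIO(fastq))))
kept_ids = [record[0] for record in kept]
```

    Reads that survive this step always have at least one called base, so the quality filter no longer sees a read made up entirely of Ns.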
