Add check to AssemblePairs for reads shorter than --minlen
Issue #57
resolved
From Gildas:
For some reason, a read is shorter than --minlen, which causes problems during AssemblePairs align. The minimum length for alignment is 8, but when a read is shorter (in my case 7 nt), it raises an error that freezes the whole script. I guess one could just skip those sequences while printing a warning like "sequence with ID xxxxxxx is too short for processing (length 7) with option minlen=8", or something like that.
Here is the error I get, and the details of the read causing the problem.
START> AssemblePairs
COMMAND> align
FILE1> CONS.q20.n1.e0.1.g1.f0.6-1_pair-pass.fastq
FILE2> CONS.q20.n1.e0.1.g1.f0.6-2_pair-pass.fastq
COORD_TYPE> illumina
ALPHA> 1e-05
MAX_ERROR> 0.3
MIN_LEN> 8
MAX_LEN> 400
SCAN_REVERSE> False
NPROC> 16
PROGRESS> 16:40:49 [ ] 0% ( 0) 0.0 min
Error processing sequence with ID: CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1.
PID 20038: Error in sibling process detected. Cleaning up.
PID 20020: Error in sibling process detected. Cleaning up
Checking this particular read shows that it is too short:
grep CTACTGCA CONS.q20.n1.e0.1.g1.f0.6-1_pair-pass.fastq -A 3 -n
1717:@CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1
1718-CCTCTGT
1719-+
1720-GGGGGGG
grep CTACTGCA CONS.q20.n1.e0.1.g1.f0.6-2_pair-pass.fastq -A 3 -n
1717:@CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1
1718-CCTCTGT
1719-+
1720-GGGGGGG
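The same check can be run over a whole file rather than one ID at a time. The following is only a rough sketch, assuming plain four-line FASTQ records (no wrapped sequence lines); `find_short_reads` is a hypothetical helper name and not part of pRESTO.

```python
import io


def find_short_reads(handle, min_len=8):
    """Yield (read_id, length) for each 4-line FASTQ record shorter than min_len.

    Hypothetical helper for illustration; assumes unwrapped 4-line records.
    """
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        if len(seq) < min_len:
            # Drop the leading '@' from the header to get the read ID
            yield header.strip()[1:], len(seq)


# Example using the record from this report, held in memory
example = io.StringIO(
    "@CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1\nCCTCTGT\n+\nGGGGGGG\n"
)
for read_id, length in find_short_reads(example, min_len=8):
    print(read_id, length)
```

To scan a real file, pass an open file handle instead of the in-memory example.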
The relevant option of AssemblePairs.py align:
--minlen MIN_LEN Minimum sequence length to scan for overlap in de novo
assembly. (default: 8)
Apparently, the sequence is only 7 nt, and since this is less than the option value, it raises an error that is not caught anywhere.
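A guard along these lines might be enough to skip such pairs instead of crashing. This is only a sketch of the suggested behaviour, not the actual fix and not pRESTO's code; the function name `check_min_length` and its parameters are made up for illustration.

```python
import sys


def check_min_length(seq_id, head_seq, tail_seq, min_len=8):
    """Return True if both reads have at least min_len bases.

    Hypothetical guard: otherwise writes a warning to stderr and returns
    False so the caller can skip the pair instead of raising an error.
    """
    shortest = min(len(head_seq), len(tail_seq))
    if shortest < min_len:
        sys.stderr.write(
            "Warning: sequence with ID %s is too short for processing "
            "(length %d) with option minlen=%d\n" % (seq_id, shortest, min_len)
        )
        return False
    return True


# Example with the read pair from this report: both reads are 7 nt
if not check_min_length("CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1",
                        "CCTCTGT", "CCTCTGT", min_len=8):
    pass  # the caller would skip this pair and continue with the next one
```

Returning a flag rather than raising keeps the decision with the caller, which could, for example, log the pair as failed and move on.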
Comments (2)
reporter - changed status to resolved
Fixed in ca6f108.