Add check to AssemblePairs for reads shorter than --minlen

Issue #57 resolved
Jason Vander Heiden created an issue

From Gildas:

For some reason, a read is shorter than --minlen, which causes problems during AssemblePairs align. The minimum length for alignment is 8, but when the read is shorter (in my case 7 nt), it raises an error that freezes the whole script. I guess one could just skip those sequences while printing a warning like "sequence with ID xxxxxxx is too short for processing (length 7) with option minlen=8", or something like that.
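The skip-and-warn behavior suggested above could look roughly like the sketch below. The helper name `check_min_length` and its signature are hypothetical and are not part of pRESTO. The idea is to check both reads of a pair against the minimum overlap length before any alignment is attempted, and to log a warning and skip the pair instead of raising.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("assemble_pairs_sketch")


def check_min_length(seq_id: str, seq1: str, seq2: str, min_len: int) -> bool:
    """Return True if both reads are long enough to scan for overlap.

    Hypothetical helper (not pRESTO code): for a pair whose shorter read is
    below min_len, log a warning and return False so the caller can skip
    the pair instead of failing the whole run.
    """
    shortest = min(len(seq1), len(seq2))
    if shortest < min_len:
        log.warning(
            "sequence with ID %s is too short for processing (length %d) with option minlen=%d",
            seq_id, shortest, min_len,
        )
        return False
    return True
```

A caller would then only pass pairs for which `check_min_length(...)` returns `True` on to the overlap search, so a single 7 nt read produces one warning line rather than stopping every worker process.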

Here is the error I get, and the details of the read causing the problem:

       START> AssemblePairs
     COMMAND> align
       FILE1> CONS.q20.n1.e0.1.g1.f0.6-1_pair-pass.fastq
       FILE2> CONS.q20.n1.e0.1.g1.f0.6-2_pair-pass.fastq
  COORD_TYPE> illumina
       ALPHA> 1e-05
   MAX_ERROR> 0.3
     MIN_LEN> 8
     MAX_LEN> 400
SCAN_REVERSE> False
       NPROC> 16

PROGRESS> 16:40:49 [                    ]   0% (     0) 0.0 min
Error processing sequence with ID: CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1.
PID 20038:  Error in sibling process detected. Cleaning up.
PID 20020:  Error in sibling process detected. Cleaning up

Checking this particular read shows that it is too short:

grep CTACTGCA CONS.q20.n1.e0.1.g1.f0.6-1_pair-pass.fastq -A 3 -n
1717:@CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1
1718-CCTCTGT
1719-+
1720-GGGGGGG

grep CTACTGCA CONS.q20.n1.e0.1.g1.f0.6-2_pair-pass.fastq -A 3 -n
1717:@CTACTGCA|CONSCOUNT=1|PRIMER=IgM|PRCOUNT=1
1718-CCTCTGT
1719-+
1720-GGGGGGG
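Grepping one ID at a time only finds the read that crashed the run. To list every read in a file that falls below a given minimum length, a small scan like the following could help. It is a minimal sketch that assumes plain, uncompressed FASTQ with exactly four lines per record (no wrapped sequence lines). The function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, sequence) from 4-line FASTQ records.

    Assumes no line wrapping; a trailing partial record is silently dropped.
    """
    it = (line.rstrip("\n") for line in lines)
    # zip() over the same iterator four times groups lines into records
    for header, seq, _plus, _qual in zip(it, it, it, it):
        yield header[1:], seq  # strip the leading '@'


def find_short_reads(lines: Iterable[str], min_len: int) -> List[Tuple[str, int]]:
    """Return (read_id, length) for every read shorter than min_len."""
    return [(rid, len(seq)) for rid, seq in iter_fastq(lines) if len(seq) < min_len]
```

Usage, with the file from this report: `with open("CONS.q20.n1.e0.1.g1.f0.6-1_pair-pass.fastq") as fh: print(find_short_reads(fh, 8))`.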

The relevant option of AssemblePairs.py align:
--minlen MIN_LEN      Minimum sequence length to scan for overlap in de novo
                        assembly. (default: 8)

Apparently, the sequence is only 7 nt long, and since this is less than the --minlen value, it raises an error that is not caught anywhere.
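One plausible way a read shorter than --minlen can break a de novo overlap search is that the range of candidate overlap lengths becomes empty. The sketch below is a simplified, hypothetical scan and is not pRESTO's implementation: it scores overlaps by plain mismatch rate, whereas AssemblePairs align also uses a statistical test (the ALPHA setting in the log above). It shows the guard that avoids calling `min()` on an empty candidate list.

```python
from typing import Optional, Tuple


def mismatch_rate(a: str, b: str) -> float:
    """Fraction of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_overlap(seq1: str, seq2: str, min_len: int = 8, max_len: int = 400,
                 max_error: float = 0.3) -> Optional[Tuple[int, float]]:
    """Hypothetical overlap scan (not pRESTO code).

    Tries every overlap length from min_len up to the shorter read length
    (capped at max_len), comparing the 3' end of seq1 with the 5' start of
    seq2. Returns (overlap_length, error) for the best overlap, or None.
    """
    upper = min(len(seq1), len(seq2), max_len)
    if upper < min_len:
        # Guard: for a 7 nt read with min_len=8 the range below is empty,
        # and min() over an empty list would raise ValueError.
        return None
    candidates = [(n, mismatch_rate(seq1[-n:], seq2[:n]))
                  for n in range(min_len, upper + 1)]
    # Lowest error wins; ties go to the longer overlap.
    n, err = min(candidates, key=lambda c: (c[1], -c[0]))
    return (n, err) if err <= max_error else None
```

With the guard in place, `best_overlap("CCTCTGT", "CCTCTGT")` simply returns `None`, which a caller can report as a failed assembly instead of an unhandled exception in a worker process.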
