Add subcommands to MaskPrimers to deal with data for which primers are unavailable
We could use a subcommand in MaskPrimers to deal with data that do not have primer sequences, such as masking X bases from a given start position. The same mode should probably be able to extract UMIs both as part of the masking process and without any masking.
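A rough sketch of what fixed-position handling might look like, for discussion only. The function name `extract_fixed`, its parameters, and the 0-based `start` are assumptions made for illustration and are not part of MaskPrimers. The bases before `start` are treated as the UMI, and masking is optional so the same call covers both cases described above.

```python
def extract_fixed(seq, start, length, mask=True, mask_char="N"):
    """Return (umi, sequence) for a region defined by position alone.

    Hypothetical helper: `start` is assumed to be 0-based, and the bases
    preceding `start` are assumed to be the UMI.
    """
    # Clamp the region to the sequence so short reads do not raise errors.
    start = max(0, min(start, len(seq)))
    end = min(start + length, len(seq))
    umi = seq[:start]
    if mask:
        # Replace the region with mask characters, keeping the read length.
        seq = seq[:start] + mask_char * (end - start) + seq[end:]
    return umi, seq


print(extract_fixed("AAGGGGTTTT", start=2, length=4))
print(extract_fixed("AAGGGGTTTT", start=2, length=4, mask=False))
```

With `mask=False` the sequence comes back unchanged and only the UMI is pulled out.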
Comments (9)
-
-
reporter Yeah, that's how we've been doing it to date. You could also do it with any sequence using --maxerror 1. It's unintuitive though. You need a pointless file that can just be replaced by a --length argument.
-
reporter I added command line arguments and a skeleton for this subcommand (extract), but didn't do any of the actual implementation. Let's take a look at it whenever you have time and see how we want to handle the task.
-
reporter Should also accommodate this case:
>sequence
NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
Where N = cell barcode and X = UMI. And what you want is:
>sequence|CELL=NNNNNNNNNNNNNNNN|UMI=XXXXXXXXXX
NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
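For reference, the single-step version of this case could be sketched as below. `annotate_barcodes` and the `(name, length)` field list are hypothetical and only illustrate the idea. The sequence is returned untouched, as in the example above.

```python
def annotate_barcodes(header, seq, fields):
    """Append consecutive fixed-length 5' regions to the header as annotations.

    Hypothetical helper: `fields` is a list of (name, length) pairs read
    left to right from the start of the sequence.
    """
    parts = [header]
    pos = 0
    for name, length in fields:
        parts.append(f"{name}={seq[pos:pos + length]}")
        pos += length
    # The sequence itself is not modified, only the header is extended.
    return "|".join(parts), seq


read = "ACGTACGTACGTACGT" + "TTTTTTTTTT" + "ATGTCGATAGCTACGTCACTG"
new_header, new_seq = annotate_barcodes("sequence", read, [("CELL", 16), ("UMI", 10)])
print(">" + new_header)
print(new_seq)
```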
-
I think we'd want to do that in 2 stages.... no?
-
reporter Currently, that's how we would do it. I was cleaning out my email and it's an old user request. It could be accomplished in a single step using the same --barcode approach as in align/score.
-
reporter Done in 6f327e0, but MaskPrimers needs a lot of testing now.
-
reporter extract mode probably needs the --revpr argument as well, so you can extract from the tail of different length sequences.
-
reporter - changed status to resolved
Testing done. No difference in the output of MaskPrimers-align and MaskPrimers-score between tip and v0.5.6.
This can be done by creating a primer file like this:
MaskPrimers.py score \
    -s ${SEQ} -p ${PRIMERS} \
    --mode cut/trim/tag/mask \
    --start 2 \
    --barcode \
    --maxerror 0.2 #irrelevant...
But it would be useful to have another mode where only the primer is removed, as opposed to cut and trim (which remove either the preceding nts or both the preceding nts and the primer). It would also help to change the barcode specification so that the cut-out chunk is placed in the annotation field.
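To make the difference between the modes concrete, here is a minimal sketch on a primer occupying `seq[start:end]`. The function `apply_mode` is hypothetical, and so is the mode name `excise` for the proposed primer-only removal. The behavior shown for `mask` (primer replaced by Ns, preceding bases dropped) and `tag` (sequence left unchanged) is an assumption about the existing modes, not taken from this thread.

```python
def apply_mode(seq, start, end, mode):
    """Return the sequence after applying `mode` to the primer at seq[start:end]."""
    preceding, primer, rest = seq[:start], seq[start:end], seq[end:]
    if mode == "cut":
        # Remove both the preceding nts and the primer.
        return rest
    if mode == "trim":
        # Remove the preceding nts, keep the primer.
        return primer + rest
    if mode == "mask":
        # Assumed: replace the primer with Ns and drop the preceding nts.
        return "N" * len(primer) + rest
    if mode == "tag":
        # Assumed: leave the sequence unchanged.
        return seq
    if mode == "excise":
        # Proposed (hypothetical name): remove only the primer.
        return preceding + rest
    raise ValueError(f"unknown mode: {mode}")


for mode in ("cut", "trim", "mask", "tag", "excise"):
    print(mode, apply_mode("AACCCCGGGG", 2, 6, mode))
```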