Add subcommands to MaskPrimers to deal with data for which primers are unavailable

Issue #55 resolved
Jason Vander Heiden created an issue

We could use a subcommand in MaskPrimers to deal with data that do not have primer sequences, such as masking X bases from a given start position. The same mode should probably be able to extract UMIs both as part of the masking process and without any masking.

Comments (9)

  1. Roy Jiang

    This can be done by creating a primer file like this:

    >BARCODE
    NNNNNNNNNNN
    

    MaskPrimers.py score \
        -s ${SEQ} -p ${PRIMERS} \
        --mode cut/trim/tag/mask \
        --start 2 \
        --barcode \
        --maxerror 0.2 #irrelevant...

    But it would help to have another mode where only the primer is removed, as opposed to cut and trim (which remove either the preceding nts or both the preceding nts and the primer). The barcode specification could also be changed so that the cut-out chunk is placed in the annotation field.

  2. Jason Vander Heiden reporter

    Yeah, that's how we've been doing it to date. You could also do it with any sequence using --maxerror 1.

    It's unintuitive though. You need a pointless file that could just be replaced by a --length argument.
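
To make the idea concrete, here is a minimal sketch of masking a fixed-length region without any primer file. It is not the MaskPrimers implementation: `mask_region` and its parameters are hypothetical names, and `start` is assumed to be a 0-based offset, which may not match how `--start` is counted in the tool.

```python
def mask_region(sequence, start, length, mask_char="N"):
    """Replace `length` bases beginning at 0-based `start` with `mask_char`.

    Hypothetical helper for illustration only; the 0-based offset is an
    assumption and is not taken from MaskPrimers.
    """
    if start < 0 or length < 0 or start + length > len(sequence):
        raise ValueError("region falls outside the sequence")
    end = start + length
    # Keep everything before and after the region, mask the region itself.
    return sequence[:start] + mask_char * length + sequence[end:]


masked = mask_region("ACGTACGTAC", start=2, length=4)
print(masked)  # ACNNNNGTAC
```

Only a start position and a length are needed, which is the information the placeholder primer file currently stands in for.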

  3. Jason Vander Heiden reporter

    I added command line arguments and a skeleton for this subcommand (extract), but didn't do any of the actual implementation. Let's take a look at it whenever you have time and see how we want to handle the task.

  4. Jason Vander Heiden reporter

    Should also accommodate this case:

    >sequence
    NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
    
    Where N = cell barcode and X = UMI. And what you want is:
    
    >sequence|CELL=NNNNNNNNNNNNNNNN|UMI=XXXXXXXXXX
    NNNNNNNNNNNNNNNNXXXXXXXXXXATGTCGATAGCTACGTCACTG
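
The case above could be sketched roughly as follows. This is an illustration under assumptions, not the extract subcommand itself: `annotate_regions` is a hypothetical function, the regions are assumed to be contiguous and to start at the first base, and the example read uses made-up bases in place of the N and X placeholders.

```python
def annotate_regions(header, sequence, regions):
    """Copy consecutive fixed-length regions into the header as NAME=VALUE fields.

    `regions` is a list of (field_name, length) pairs read left to right
    from the start of the sequence. The sequence is returned unmodified.
    Hypothetical helper for illustration only.
    """
    fields = []
    position = 0
    for name, length in regions:
        end = position + length
        if end > len(sequence):
            raise ValueError(f"sequence too short for field {name}")
        fields.append(f"{name}={sequence[position:end]}")
        position = end
    return "|".join([header] + fields), sequence


# 16-base cell barcode, 10-base UMI, then the rest of the read (made-up bases).
read = "ACGTACGTACGTACGT" + "TTGGCCAATT" + "ATGTCGATAGCTACGTCACTG"
new_header, new_sequence = annotate_regions("sequence", read, [("CELL", 16), ("UMI", 10)])
print(">" + new_header)
print(new_sequence)
```

The read itself is left untouched, matching the desired output shown above, where only the header gains the CELL and UMI fields.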
    
  5. Jason Vander Heiden reporter

    Currently, that's how we would do it. I'm cleaning out my email and this is an old user request. It could be accomplished in a single step using the same --barcode approach as in align/score.

  6. Jason Vander Heiden reporter

    extract mode probably needs the --revpr argument as well, so you can extract from the tail of different length sequences.
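
A minimal sketch of why a tail-anchored option matters for reads of different lengths. `extract_region` and its `from_tail` flag are hypothetical and only loosely modeled on the idea behind `--revpr`; the sketch returns the region in read orientation and does not reverse-complement anything, which is an assumption.

```python
def extract_region(sequence, start, length, from_tail=False):
    """Return `length` bases located `start` bases in from the chosen end.

    With from_tail=False the offset is counted from the first base; with
    from_tail=True it is counted back from the last base. Hypothetical
    helper for illustration only.
    """
    if start < 0 or length < 0 or start + length > len(sequence):
        raise ValueError("region falls outside the sequence")
    if from_tail:
        end = len(sequence) - start
        begin = end - length
    else:
        begin = start
        end = begin + length
    return sequence[begin:end]


# Two reads of different lengths that share the same 4-base tail.
for read in ("AAAACCCCGGTT", "AACCGGTT"):
    print(extract_region(read, start=0, length=4, from_tail=True))  # GGTT
```

Counting from the tail gives the same region for both reads, whereas a fixed offset from the head would land on different bases in each.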

  7. Jason Vander Heiden reporter

    Testing done. No difference in the output of MaskPrimers-align and MaskPrimers-score between tip and v0.5.6.
