UMI is known but primers are not

Issue #85 resolved
Anton Kulaga created an issue

We are using library preparation kits from https://irepertoire.com/ . Unfortunately, they declined to give us their primer sequences, so we only know the UMI. The problem is that pResto assumes the primers are known and makes the primer parameters mandatory. How can we work around this and mask the UMI without masking the primers (which we simply do not know)?

Comments (3)

  1. Jason Vander Heiden

    You can skip the primer identification steps. These are largely for QC and isotype annotation. You can pull the UMI out of the sequences using the MaskPrimers extract subcommand by specifying the length (--len) and start position (--start) of the UMI. E.g.:

    MaskPrimers.py extract -s in.fastq --start 0 --len 15 --pf UMI -o out.fastq
    

    This will put the first 15 bp in the UMI field. Or:

    MaskPrimers.py extract -s in.fastq --start 15 --len 25 --bf UMI --pf PRIMER --barcode -o out.fastq
    

    This will put the first 15 bp in the UMI field and the next 25 bp in the PRIMER field, which is useful for dual-barcode setups or if you want to guess what their primers are.
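
    To make the coordinates concrete, here is a minimal Python sketch of the slicing that the two commands above describe. It only illustrates the --start/--len arithmetic on a toy read and is not pResto's implementation; the function name extract_region and the 50 bp read are made up for illustration.

```python
def extract_region(seq, start, length):
    """Split a read into (preceding, region, remainder) by fixed coordinates.

    Hypothetical helper, not part of pResto: it only mirrors the
    --start / --len arithmetic described above.
    """
    end = start + length
    if start < 0 or length <= 0 or end > len(seq):
        raise ValueError("region falls outside the read")
    return seq[:start], seq[start:end], seq[end:]


# Toy 50 bp read: 15 bp UMI, 25 bp putative primer, 10 bp insert (made-up data).
read = "A" * 15 + "C" * 25 + "G" * 10

# First command: --start 0 --len 15, so the region itself is the UMI.
_, umi, rest = extract_region(read, 0, 15)

# Second command: --start 15 --len 25 with --barcode, so the 15 bp before
# the region become the UMI and the region becomes the PRIMER.
umi2, primer, insert = extract_region(read, 15, 25)

print(umi, primer, insert)
```

    Both calls give the same 15 bp UMI; the second one additionally separates the next 25 bp so they can be inspected as a candidate primer.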

    For annotating the C-region without the official primers, you can just align against the C-region references:

    https://presto.readthedocs.io/en/stable/examples/primers.html

    More C-region substrings, based on the TakaraBio/ClonTech protocol, are compiled here:

    https://bitbucket.org/kleinstein/immcantation/src/master/protocols/Universal/
