Trypsin digestion is not quite correct

Issue #29 resolved
Ian Castleden
created an issue

According to Expasy, trypsin digestion has some exceptions:

It would be good to fold the exceptions into the RE but this is beyond me....

This is possibly more correct in the _cleave code:

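import re

# Expasy exceptions: contexts in which trypsin does not cleave after K/R
exc = re.compile(r'((?<=[CD])K(?=D))|((?<=C)K(?=[HY]))|((?<=C)R(?=K))|((?<=R)R(?=[HR]))')

def trypsin_exception(i, seq):
    # i is the end of a candidate cleavage match; search only the window around it
    m = exc.search(seq, max(0, i - 2), i + 1)
    return bool(m)

# trypsin is the existing trypsin cleavage pattern, seq the protein sequence
[x.end() for x in re.finditer(trypsin, seq) if not trypsin_exception(x.end(), seq)]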
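As for folding the exceptions into the RE itself, one possible approach is a negative lookahead placed in front of the ordinary rule, so that a position is rejected whenever an exception pattern would match there. The following is only a sketch under assumptions: the `TRYPSIN` pattern is an Expasy-style trypsin rule written out here for illustration, and `TRYPSIN_FOLDED` and `cleavage_sites` are hypothetical names, not part of any library.

```python
import re

# Assumed for illustration: an Expasy-style trypsin rule and the
# exception pattern proposed in this issue.
TRYPSIN = r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'
EXC = r'((?<=[CD])K(?=D))|((?<=C)K(?=[HY]))|((?<=C)R(?=K))|((?<=R)R(?=[HR]))'

# The negative lookahead rejects any position where an exception matches;
# only then is the ordinary rule tried at that same position.
TRYPSIN_FOLDED = re.compile(r'(?!{})(?:{})'.format(EXC, TRYPSIN))

def cleavage_sites(seq):
    """Return the indices after which the folded rule cleaves (hypothetical helper)."""
    return [m.end() for m in TRYPSIN_FOLDED.finditer(seq)]

print(cleavage_sites('PEPTIDKDRE'))
```

Every alternative matches a single residue with zero-width context on either side, so the lookahead and the rule are always evaluated at the same K or R.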

Comments (2)

  1. Lev Levitsky repo owner

    Hello Ian,

    in 1774532 I added a new "exception" arg to cleave which you can set to your pattern to get the desired behavior. The pattern was also added to the expasy_rules dict as 'trypsin_exception':

    In [1]: from pyteomics import parser
    In [2]: parser.cleave('PEPTIDKDRE', parser.expasy_rules['trypsin'], exception=parser.expasy_rules['trypsin_exception'])
    Out[2]: {'E', 'PEPTIDKDR'}
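To illustrate what an exception pattern does to the result, here is a minimal standalone sketch that uses only the standard `re` module. It is not the pyteomics implementation: `cleave_sketch` is a hypothetical helper, it ignores missed cleavages, and both patterns are written out as assumptions rather than read from `expasy_rules`.

```python
import re

# Assumed patterns, written out for illustration only.
TRYPSIN = r'([KR](?=[^P]))|((?<=W)K(?=P))|((?<=M)R(?=P))'
TRYPSIN_EXCEPTION = r'((?<=[CD])K(?=D))|((?<=C)K(?=[HY]))|((?<=C)R(?=K))|((?<=R)R(?=[HR]))'

def cleave_sketch(seq, rule, exception=None):
    """Split seq at every rule match whose end is not also the end of an exception match."""
    blocked = set()
    if exception is not None:
        blocked = {m.end() for m in re.finditer(exception, seq)}
    sites = [m.end() for m in re.finditer(rule, seq) if m.end() not in blocked]
    bounds = [0] + sites + [len(seq)]
    # Pair consecutive boundaries and drop empty pieces
    return {seq[a:b] for a, b in zip(bounds, bounds[1:]) if a < b}

print(cleave_sketch('PEPTIDKDRE', TRYPSIN))
print(cleave_sketch('PEPTIDKDRE', TRYPSIN, exception=TRYPSIN_EXCEPTION))
```

Without the exception, the K in the D-K-D context is treated as a cleavage site; with it, that site is skipped and the sequence splits only after the R.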