`pept_neighbors` may contain invalid termini when a peptide maps to more than one protein

Issue #9 new
Joshua Klein
created an issue

`pept_neighbors` maps each exact peptide sequence to its neighboring amino acids in the last protein in the database that contains that peptide:

Marked lines: https://bitbucket.org/levitsky/identipy/src/0e13db2a691b04486c537a1015519fc207235795/identipy/utils.py?at=default&fileviewer=file-view-default#utils.py-995:996,1007,1008,1012,1013

This means the recorded termini may be incorrect whenever the peptide maps to more than one protein. To fix this, the mapping would need to change from sequence --> termini to (sequence, protein_id) --> termini.

This "solution" would cause cascading failures in all output-generation functions.
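To make the overwrite concrete, here is a minimal sketch of the two keying schemes. It is not the code from `identipy/utils.py`: the toy proteins, the helper `find_neighbors`, and the dictionary names `by_sequence` and `by_pair` are all hypothetical, and `-` is assumed to mark a protein terminus.

```python
# Hypothetical toy database: protein id -> sequence (not real entries).
proteins = {
    "P1": "MKRAEFVEVTKLVTDLTK",
    "P2": "MGGAEFVEVTKPST",
}
peptide = "AEFVEVTK"


def find_neighbors(protein_seq, pep):
    """Yield (prev_aa, next_aa) for each occurrence of pep in protein_seq.

    '-' stands for a protein terminus (an assumption of this sketch).
    """
    start = protein_seq.find(pep)
    while start != -1:
        end = start + len(pep)
        prev_aa = protein_seq[start - 1] if start > 0 else "-"
        next_aa = protein_seq[end] if end < len(protein_seq) else "-"
        yield prev_aa, next_aa
        start = protein_seq.find(pep, start + 1)


by_sequence = {}  # sequence -> termini: later proteins overwrite earlier ones
by_pair = {}      # (sequence, protein_id) -> termini: one entry per protein

for protein_id, protein_seq in proteins.items():
    for neighbors in find_neighbors(protein_seq, peptide):
        by_sequence[peptide] = neighbors
        by_pair[(peptide, protein_id)] = neighbors

print(by_sequence)
print(by_pair)
```

With the sequence-only key, the termini from `P1` are lost as soon as `P2` is processed. The composite key keeps both, at the cost that every consumer of the mapping has to supply a protein id as well.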

Comments (2)

  1. Lev Levitsky repo owner

    We have thought about this. If I am reading the pepXML schema correctly, the prev_aa and next_aa attributes are not supposed to go inside alternative_protein elements. If that's the case, I'm not sure where the additional information can be written.

    Right now the conflicts are resolved arbitrarily (with preference given to fully tryptic peptides over semi-tryptic ones).
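A rough sketch of what such a preference could look like follows. It is not the project's actual logic: `tryptic_termini` and `pick_neighbors` are hypothetical names, the cleavage rule is simplified to "after K or R" (ignoring the proline exception), and `-` is assumed to mark a protein terminus.

```python
def tryptic_termini(peptide, prev_aa, next_aa):
    """Count termini consistent with tryptic cleavage (0, 1 or 2).

    Simplified rule (assumption): trypsin cleaves after K or R;
    a protein terminus, written '-', also counts as a valid end.
    """
    n_term_ok = prev_aa in {"K", "R", "-"}
    c_term_ok = peptide[-1] in {"K", "R"} or next_aa == "-"
    return int(n_term_ok) + int(c_term_ok)


def pick_neighbors(peptide, candidates):
    """Choose one (prev_aa, next_aa) pair, preferring more tryptic termini.

    Ties are broken by position in `candidates`, i.e. arbitrarily.
    """
    return max(candidates, key=lambda pair: tryptic_termini(peptide, *pair))


# Two hypothetical contexts for the same peptide in different proteins.
candidates = [("G", "P"), ("R", "L")]
chosen = pick_neighbors("AEFVEVTK", candidates)
print(chosen)
```

Here the second context is chosen because both of its termini are tryptic, while the first has a non-tryptic N-terminal side. Whichever pair is written out, the termini for the other proteins are still discarded.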