nma.pdbs() error message clarity

Issue #40 resolved
Barry Grant created an issue

I keep finding myself hunting down modified residues and trying to fix their lack of mass assignment. Unhelpful error messages make this process more annoying...

#+ download, cache=TRUE, eval=FALSE
# GPCR representatives
ids <- c("3UON_A", "4DAJ_A", "2RH1_A", "3EML_A", "3ODU_A", "1F88_A")
raw.files <- get.pdb(ids, path = "full_pdbs")
files <- pdbsplit(raw.files, ids, path = "chain_pdbs")

#+ Analysis, cache=TRUE
# Alignment
pdbs <- pdbaln(files)

# NMA
all.modes <- nma.pdbs(pdbs, fit=TRUE)
 Building Hessian...        Done in 1.511 seconds.
 Diagonalizing Hessian...   Done in 1.03 seconds.
 Building Hessian...        Done in 1.266 seconds.
 Diagonalizing Hessian...   Done in 0.938 seconds.
## Error in aa2mass(pdb = list(atom = c("1", "2", "3", "4", "5", "6", "7",  :
##  Unknown residue type: UNK

I am assuming nma.pdbs() failed due to issues with the third input structure but I would like to know more details so I can fix this. To test I can look at the 3rd file and try to run nma(), which actually gives me more info...

files[3]
## [1] "chain_pdbs/2RH1_A.pdb"

n <- nma(read.pdb(files[3]))
## Error in aa2mass(pdb = c("ASP", "GLU", "VAL", "TRP", "VAL", "VAL", "GLY",  :
##  Unknown residue type: PLM

So my question is: why can't I get this info from nma.pdbs()? And should we keep adding to the w and aa vectors within aa2mass() as we find these things?

aa321() has followed this additive approach with these guys now in there and being mapped to their corresponding standard residues: "SEP", "TPO", "MLY", "MSE", "IAS", "ABA", "CSO", "CSD", "CYM", "CME", "CSX", "CMT", "CYX", "HIE", "HIP", "HID", "HSD", "HSE", "HSP", "DDE", "MHO", "ASX", "CIR", "PFF"

We could do the same in aa2mass and just issue a warning about the mapping...
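A minimal sketch of what such a mapping with a warning could look like. This is not the aa2mass() implementation: `mass_with_fallback`, `std_mass` and `parent` are hypothetical names, the masses are the usual average residue masses for a handful of standard residues, and the mapping is a subset of the aa321() list above.

```r
# Illustrative sketch only: not the bio3d aa2mass() implementation.
# Average residue masses (Da) for a few standard residues.
std_mass <- c(SER = 87.08, THR = 101.10, LYS = 128.17,
              MET = 131.19, CYS = 103.14, HIS = 137.14)

# Modified residue -> standard parent (subset of the aa321() list).
parent <- c(SEP = "SER", TPO = "THR", MLY = "LYS",
            MSE = "MET", CSO = "CYS", HIE = "HIS")

# Hypothetical helper: look up masses, falling back to the parent residue
# for known modified residues and warning about every substitution made.
mass_with_fallback <- function(resnames) {
  mapped <- resnames
  is_mod <- resnames %in% names(parent)
  if (any(is_mod)) {
    mapped[is_mod] <- parent[resnames[is_mod]]
    warning("Mapped modified residues to standard ones: ",
            paste(unique(resnames[is_mod]), collapse = ", "))
  }
  unknown <- setdiff(mapped, names(std_mass))
  if (length(unknown) > 0) {
    stop("Unknown residue type(s): ", paste(unknown, collapse = ", "))
  }
  unname(std_mass[mapped])
}
```

Note that the parent mass is only an approximation for a modified residue (a phosphoserine is heavier than a serine), so the warning matters: it tells the user which residues were substituted.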

Comments (11)

  1. Barry Grant reporter

    ... once I update (properly) and run pdbsplit again I no longer see this ...

    > n=nma(read.pdb(files[3]))
     Building Hessian...        Done in 0.513 seconds.
     Diagonalizing Hessian...   Done in 4.401 seconds.
    

    There may still be a question/proposal above.

  2. Lars Skjærven

    I will see what I can do with error messages on these unknown residues.

    We should absolutely add more masses to aa2mass. I've been meaning to do so, but we should be consistent and calculate the residue masses with hydrogens.

  3. Barry Grant reporter

    Good, I will document these residues as I come across them. Obviously we can start with the entries in aa321() listed above. I am not sure if it is helpful but the hic-up site has lots of info on ideal coordinates (including hydrogens) for these residues.

    e.g. http://xray.bmc.uu.se/hicup/SEP/

    Just replace the last three letters in the above URL with the resname you want to look up. From the hic-up page you can link to various sites that list molecular weight in Daltons, but perhaps some other info there can be useful...

  4. Barry Grant reporter

    BTW: I was attempting to reproduce the analysis described here:

    Elastic network normal mode dynamics reveal the GPCR activation mechanism. http://www.ncbi.nlm.nih.gov/pubmed/24123518?dopt=Abstract

    I thought this could be easily done with nma.pdbs(). However, their sequences in Table S1 appear to be different to those we extract with the workflow in the first post above...

  5. Lars Skjærven

    Added a few extra non-standard residues to aa2mass. The hicup site was useful, but not out of the box: each residue still needs to be gone through atom by atom, e.g. because some of the entries are amino acids that are not part of a peptide. Hydrogen atoms also had to be added manually before calculating the mass. Related to this, the two vectors in aa2mass are messy, and perhaps we should transfer them to a better data structure.

    Error messages in nma.pdbs related to unknown residues: this is a problem because pdbaln returns residue name X for anything it does not know. Inside nma.pdbs I use aa123, and the X is then translated to UNK. Thus, even providing mass.custom, or the updated masses in aa2mass, will not solve this problem. I've added a workaround for this issue. It involves re-reading the PDB files rather than re-building the structures purely from the $ali and $xyz records of the pdbs object. I have also added a better check and error messaging. Let me know what you think.
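    To illustrate the kind of check described here, a rough sketch under stated assumptions: it works on a toy character matrix standing in for an alignment (rows are structures, columns are alignment positions), not on a real pdbs object, and `find_unknown` is a hypothetical helper rather than the code added to nma.pdbs.

```r
# Illustrative sketch only: a made-up alignment, not a real pdbs object.
ali <- rbind(c("D", "E", "V", "-", "W"),
             c("D", "X", "V", "G", "W"),
             c("-", "E", "X", "G", "X"))
rownames(ali) <- c("structA", "structB", "structC")

# Hypothetical helper: list every unknown ("X") entry by structure and column.
find_unknown <- function(ali, unknown = "X") {
  hits <- which(ali == unknown, arr.ind = TRUE)
  hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]
  data.frame(id = rownames(ali)[hits[, "row"]],
             column = unname(hits[, "col"]),
             stringsAsFactors = FALSE)
}

unk <- find_unknown(ali)
if (nrow(unk) > 0) {
  message("Unknown residues found in: ",
          paste(sprintf("%s (column %d)", unk$id, unk$column), collapse = "; "))
}
```

    Reporting the structure and position up front answers the original question of which input file needs fixing, before any Hessian is built.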

  6. Barry Grant reporter

    The two vector approach in aa123(), aa321() and aa2mass() is a little messy. Should we consider returning to a template file with all this information?

    I see the issue with X and UNK in nma.pdbs(). Should we consider adding a matrix of 3-letter codes to "3dalign" class pdbs objects? Do you have a feel for how often the re-reading of PDB files will be needed as currently implemented?

  7. Lars Skjærven

    I think it's ok enough for the smallish vectors in aa123/321, but it gets very messy in aa2mass as I also noted yesterday. However, merging all this info in one data file would be good I guess.

    Also agree on the 3-letter code in 3dalign! I wasn't sure about changing that data structure / object too much, so I did this workaround.
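    As a rough sketch of the single-table idea discussed in the last few comments: one data frame holding the 3-letter code, 1-letter code, parent residue and mass, with the lookups derived from it. The names `aa_table`, `three_to_one` and `one_to_three` are hypothetical, only a few rows are shown, and masses for the modified residues are left as NA rather than guessed.

```r
# Illustrative sketch only: a few rows, not the full bio3d residue set.
aa_table <- data.frame(
  aa3    = c("ALA", "GLY", "SER", "SEP", "MET", "MSE"),
  aa1    = c("A",   "G",   "S",   "S",   "M",   "M"),
  parent = c("ALA", "GLY", "SER", "SER", "MET", "MET"),
  mass   = c(71.08, 57.05, 87.08, NA,    131.19, NA),  # NA = still to be filled in
  stringsAsFactors = FALSE
)

# Hypothetical lookups built on the one table.
# 3-letter -> 1-letter; unknown codes come back as NA.
three_to_one <- function(x) aa_table$aa1[match(x, aa_table$aa3)]

# 1-letter -> 3-letter, using only rows where the residue is its own parent
# so that "S" maps to SER and not SEP.
one_to_three <- function(x) {
  std <- aa_table[aa_table$aa3 == aa_table$parent, ]
  std$aa3[match(x, std$aa1)]
}

three_to_one(c("SEP", "ALA"))
one_to_three(c("S", "M"))
```

    Keeping everything in one table means a newly encountered residue is added in one place, and aa123-, aa321- and aa2mass-style lookups cannot drift out of sync.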
