BUG: read.pdb() or maybe other related functions

Issue #173 resolved
Xinqiu Yao created an issue
library(bio3d)
read.pdb("1a7l")

  Note: Accessing on-line PDB file
  HEADER    TRANSPORT                               16-MAR-98   1A7L               

 Call:  read.pdb(file = "1a7l")

   Total Models#: 1
     Total Atoms#: 8750,  XYZs#: 26250  Chains#: 3  (values: A B C)

     Protein Atoms#: 8634  (residues/Calpha atoms#: 1114)
     Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)

     Non-protein/nucleic Atoms#: 116  (residues: 51)
     Non-protein/nucleic resid values: [HOH (48), MAL (3) ]

Error in rep(pdb$helix$chain, (pdb$helix$end - pdb$helix$start + 1)) : 
  invalid 'times' argument

Note that this error occurs only for specific PDB entries, e.g. the one used above. The bug has also been present since the 2.1 release.
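
For reference, the failing call expands each helix's chain ID once per residue, using `end - start + 1` as the repeat count. The following is a minimal base-R sketch of how that pattern can fail. The `helix` list and its values are invented for illustration and are not the actual 1a7l records; the sketch only assumes that one computed length ends up non-positive (an `NA` length fails the same way).

```r
# Hypothetical helix annotation with the same fields used in the failing call.
# The second helix has end < start, so its computed length is negative.
helix <- list(chain = c("A", "B"),
              start = c(10, 52),
              end   = c(20, 50))

# Same arithmetic as in the error message: residues per helix.
lens <- helix$end - helix$start + 1

# rep() rejects negative (or NA) repeat counts; capture the message
# instead of stopping the script.
msg <- tryCatch(rep(helix$chain, lens),
                error = function(e) conditionMessage(e))
print(lens)
print(msg)
```

Checking `lens` for non-positive or `NA` values before calling `rep()` is one way to locate the offending helix or sheet record in a problematic file.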

Comments (5)

  1. Xinqiu Yao reporter

    Okay, I found that it is related to residues with insert codes. It is the same problem we discussed before (See issue): in some PDB files, residues are distinguished not only by "chain ID" and "resno" but also by "insert".

    In the current read.pdb(), the atom records store all of these fields, so we can renumber residues in this situation (e.g. with the new clean.pdb() function). SSEs, however, are annotated by chain and resno only.

    One suggestion: we add 'insert' to pdb$helix and pdb$sheet. Then clean.pdb() can renumber these annotations, too.

    What do you think?

  2. Barry Grant

    I guess we need to add the 'insert' (if present) to the SSE records as an extra vector. Thanks for catching this!

  3. Xinqiu Yao reporter

    Or maybe just a 'names' attribute on the $start and $end vectors? Otherwise, we need two additional "insert" vectors, one for the start residue and one for the end.

    Another question: I found that the internal function pdb2sse() is used in many places in almost identical form (e.g. plot.bio3d(), plot.cmap(), read.fasta.pdb(), and also the new clean.pdb()). It would be handy to make it public and call it where needed instead of copying it many times. It would also help a lot with the debugging above, which is related to the SSE trimming in trim.pdb().

    Does it make sense?
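
The 'names' idea from the comments above can be sketched in base R as follows. This is an illustration under stated assumptions, not bio3d's implementation: the residue table, the `helix` list, and the `key()` helper are all hypothetical, and insert codes are assumed to be stored as the names of `$start` and `$end` (an empty name meaning no insert code). The SSE range is found by matching residue keys rather than by resno arithmetic.

```r
# Toy residue table, one row per residue; all values are invented.
res <- data.frame(chain  = c("A", "A", "A", "A", "A"),
                  resno  = c(10, 11, 11, 11, 12),
                  insert = c("", "", "A", "B", ""),
                  stringsAsFactors = FALSE)

# Hypothetical helix record running from residue 11A to residue 12,
# with the insert codes carried as names of start and end.
helix <- list(chain = "A",
              start = setNames(11, "A"),
              end   = setNames(12, ""))

# Hypothetical helper: a residue key combining chain, resno and insert.
key <- function(chain, resno, insert) paste(chain, resno, insert, sep = "_")

res_key <- key(res$chain, res$resno, res$insert)

# chain + resno alone is not unique here; the full key is.
dup_without_insert <- anyDuplicated(paste(res$chain, res$resno)) > 0
dup_with_insert    <- anyDuplicated(res_key) > 0

# Locate the start and end residues by key, then annotate the index range.
i0 <- match(key(helix$chain, helix$start, names(helix$start)), res_key)
i1 <- match(key(helix$chain, helix$end,   names(helix$end)),   res_key)

sse <- rep(" ", nrow(res))
for (k in seq_along(i0)) sse[i0[k]:i1[k]] <- "H"
print(sse)
```

In this toy case the helix covers three residues (11A, 11B, 12), whereas `end - start + 1` would give two, which is why matching on the full key is used instead of subtracting residue numbers.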
