Rcpp read.pdb

Issue #306 resolved
Lars Skjærven created an issue

problem with coral generated pdb files

> pdb = read.pdb("test.pdb")
Error: basic_string::substr: __pos (which is 72) > this->size() (which is 66)

Old R version of read.pdb works fine

Comments (14)

  1. Xinqiu Yao

    Did you use the Rcpp version on the 'feature_shiny_report' branch and the 'test.pdb' under bio3d/inst/example? I tried but got it passed:

    > pdb = read.pdb("test.pdb")
       PDB has ALT records, taking A only, rm.alt=TRUE
    > pdb
     Call:  read.pdb(file = "bio3d/inst/examples/test.pdb")
    
       Total Models#: 1
         Total Atoms#: 175,  XYZs#: 525  Chains#: 2  (values: A B)
    
         Protein Atoms#: 103  (residues/Calpha atoms#: 7)
         Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
    
         Non-protein/nucleic Atoms#: 72  (residues: 9)
         Non-protein/nucleic resid values: [ CA (1), GDP (1), HOH (5), TES (2) ]
    
       Protein sequence:
          GGESGSK
    
    + attr: atom, xyz, calpha, call
    
  2. Lars Skjærven reporter

    No, the output pdb is from coral. A program for rigid body modelling for fitting to saxs data... I think the problem is an additional column or two

  3. Barry Grant

    What do you want us to do with this error - make an Rcpp read.pdb() run mode that can cope with nonofficial PDB format files just like the old read.pdb()? I nominate Lars for this task ;-)

  4. Lars Skjærven reporter

    it is yes. same funky pdb file format from ATSAS (for saxs based modelling) reads now both with new and old version of raed.pdb:

    > library(bio3d)
    > pdb = read.pdb("test.pdb")
    > pdb2 = read.pdb2("test.pdb")
    
    >  expect_identical(pdb$atom, pdb2$atom)
    Error: pdb$atom not identical to pdb2$atom.
    Objects equal but not identical
    
    > typeof(pdb$atom$resno)
    [1] "integer"
    > typeof(pdb2$atom$resno)
    [1] "double"
    

    Let me know if you think resno should be "double" and not "integer"... ?

    With respect to issue #327 and hexadecimal atom numbers - it's implemented in the cpp version of read.pdb but commented out. Do you think its needed?

  5. Xinqiu Yao

    Resno should be 'integer' and there is no particular reason that we keep it as 'double'...

    I think read.pdb(hex=TURE) can be useful for those users dealing with large "PDB" files generated from VMD. Note that hexdecimal eleno/resno is NOT a standard PDB format. For large official structural files, e.g. '5dat', we turn to the mmCIF format and use Lars' function read.cif(). For self-made structures, we recommend save them as Amber topology files and read them with read.prmtop().

    What do you think?

  6. Lars Skjærven reporter

    Functionality enabled in commit bf0c87a

    > pdb = read.pdb("test.pdb")
    Error in eval(substitute(expr), envir, enclos) : stringToInt("a000")
    
    > pdb = read.pdb("test.pdb", hex=TRUE)
    > range(pdb$atom$resno)
    [1]     1 46005
    > range(pdb$atom$eleno)
    [1]      1 160615
    

    We could potentially avoid that first error message with a try catch and set hex to TRUE if the try fails. Hmm...

  7. Barry Grant

    This looks good. We could have the try catch and then a warning before proceeding. Good job with these btw.

  8. Log in to comment