Rcpp read.pdb

Xinqiu Yao

Did you use the Rcpp version on the 'feature_shiny_report' branch and the 'test.pdb' under bio3d/inst/example? I tried but got it passed:

> pdb = read.pdb("test.pdb")
   PDB has ALT records, taking A only, rm.alt=TRUE
> pdb
 Call:  read.pdb(file = "bio3d/inst/examples/test.pdb")

   Total Models#: 1
     Total Atoms#: 175,  XYZs#: 525  Chains#: 2  (values: A B)

     Protein Atoms#: 103  (residues/Calpha atoms#: 7)
     Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)

     Non-protein/nucleic Atoms#: 72  (residues: 9)
     Non-protein/nucleic resid values: [ CA (1), GDP (1), HOH (5), TES (2) ]

   Protein sequence:
      GGESGSK

+ attr: atom, xyz, calpha, call

2015-11-18T15:37:28+00:00

Lars Skjærven reporter

No, the output pdb is from coral. A program for rigid body modelling for fitting to saxs data... I think the problem is an additional column or two

2015-11-18T17:06:59+00:00

Barry Grant

What do you want us to do with this error - make an Rcpp read.pdb() run mode that can cope with nonofficial PDB format files just like the old read.pdb()? I nominate Lars for this task ;-)

2016-02-12T20:51:05+00:00

Lars Skjærven reporter

Absolutely:)

2016-02-13T12:29:13+00:00

Barry Grant

See also issue ~~#327~~

2016-03-14T18:59:29+00:00

Barry Grant

Is this still an open issue?

2016-09-14T00:01:28+00:00

Lars Skjærven reporter

it is yes. same funky pdb file format from ATSAS (for saxs based modelling) reads now both with new and old version of raed.pdb:

> library(bio3d)
> pdb = read.pdb("test.pdb")
> pdb2 = read.pdb2("test.pdb")

>  expect_identical(pdb$atom, pdb2$atom)
Error: pdb$atom not identical to pdb2$atom.
Objects equal but not identical

> typeof(pdb$atom$resno)
[1] "integer"
> typeof(pdb2$atom$resno)
[1] "double"

Let me know if you think resno should be "double" and not "integer"... ?

With respect to issue ~~#327~~ and hexadecimal atom numbers - it's implemented in the cpp version of read.pdb but commented out. Do you think its needed?

2016-09-14T07:06:18+00:00

Xinqiu Yao

Resno should be 'integer' and there is no particular reason that we keep it as 'double'...

I think read.pdb(hex=TURE) can be useful for those users dealing with large "PDB" files generated from VMD. Note that hexdecimal eleno/resno is NOT a standard PDB format. For large official structural files, e.g. '5dat', we turn to the mmCIF format and use Lars' function read.cif(). For self-made structures, we recommend save them as Amber topology files and read them with read.prmtop().

What do you think?

2016-09-14T21:01:08+00:00

Barry Grant

All sounds sensible to me. Thanks guys!

2016-09-14T23:13:24+00:00

Lars Skjærven reporter

Functionality enabled in commit bf0c87a

> pdb = read.pdb("test.pdb")
Error in eval(substitute(expr), envir, enclos) : stringToInt("a000")

> pdb = read.pdb("test.pdb", hex=TRUE)
> range(pdb$atom$resno)
[1]     1 46005
> range(pdb$atom$eleno)
[1]      1 160615