Notation for more than 100,000 atoms

Issue #327 resolved
Former user created an issue

The atom ID / "eleno" (element number) field (2nd column in a PDB file, columns 7-11 of an ATOM/HETATM record) is only 5 digits wide. Thus, with "conventional numbering" (e.g. 1, 2, 3, etc.) it can accommodate at most 99,999 atoms.

But PDB files of large molecules (e.g. 28S ribosomal RNA) can contain more than 99,999 atoms. In these cases, the PDB files resort to alphanumeric (hexadecimal) numbering: from atom 100,000 onward, the atom number / "eleno" continues with, for example, 186a0, 186a1, 186a2, etc. (hexadecimal 186a0 = 100,000).
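A minimal Python sketch of how such serials could be decoded, assuming (as the example values suggest) that the writer uses plain decimal up to 99,999 and hexadecimal from 186a0 onward. The function name `decode_serials` is hypothetical and not part of bio3d. Once the first non-decimal serial is seen, every later serial is read as hex, because some hex serials (e.g. 18700 = 100,096) contain no letters and would otherwise be misread as decimal.

```python
DECIMAL_DIGITS = frozenset("0123456789")


def decode_serials(fields):
    """Convert atom serial strings to integers (hypothetical helper).

    Assumes decimal numbering up to 99999, then hexadecimal starting at
    "186a0" (0x186a0 == 100000), as in the PDB files described above.
    """
    serials = []
    hex_mode = False
    for raw in fields:
        s = raw.strip()
        # The first serial containing a non-decimal character (e.g. "186a0")
        # marks the switch to hexadecimal numbering.
        if not hex_mode and not set(s) <= DECIMAL_DIGITS:
            hex_mode = True
        # In hex mode every serial is parsed in base 16, including
        # letter-free ones such as "18700".
        serials.append(int(s, 16) if hex_mode else int(s))
    return serials


print(decode_serials(["99998", "99999", "186a0", "186a1", "18700"]))
# prints [99998, 99999, 100000, 100001, 100096]
```

The stateful switch is the design choice here: deciding per value ("does it contain a letter?") would decode 186a0 correctly but silently turn 18700 into 18,700.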

The read.pdb() function fails on these non-numeric "eleno" entries with the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '186a0'
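The error arises because the serial column is read as a number. As a hypothetical diagnostic (not bio3d code), the Python sketch below takes the serial from its fixed columns 7-11 of ATOM/HETATM records and lists the values a strictly decimal reader would reject. The helper names `serial_field` and `non_decimal_serials` are illustrative only.

```python
DECIMAL_DIGITS = frozenset("0123456789")


def serial_field(line):
    """Return the raw atom serial: PDB columns 7-11 (1-based) of a record."""
    return line[6:11].strip()


def non_decimal_serials(pdb_lines):
    """List ATOM/HETATM serials that are not plain decimal integers."""
    bad = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            field = serial_field(line)
            # A blank field or any non-digit character (e.g. "186a0") is flagged.
            if not field or not set(field) <= DECIMAL_DIGITS:
                bad.append(field)
    return bad


records = [
    "ATOM  99999  P     A A   1",
    "ATOM  186a0  OP1   A A   1",
]
print(non_decimal_serials(records))  # prints ['186a0']
```

A check based on `float()` instead of a digit test would be weaker: Python's `float("186e0")` returns `186.0`, so some hexadecimal serials would be silently misread rather than rejected.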

Comments

  1. Xinqiu Yao

    Hi,

    Your question is related to this issue. The answer seems to be either to convert the PDB file into an Amber topology file and use read.prmtop(), or to use an alternative PDB reading function, read.pdb2(), with the option hex=TRUE. Note that read.pdb2() only exists in the feature_cpp branch. See the issue for more details.

    [To developers] Btw, should we merge feature_cpp ASAP or leave it for longer testing?

  2. Lars Skjærven

    We should also finish the implementation of read.cif(). The old PDB format will be obsolete at some point anyway.

  3. Barry Grant

    We need better tests in general for the functions in this branch (as well as others). I am in favor of getting these into master once we have these tests in place.
