TER records missing from multi-chain PDB

Issue #37 resolved
Barry Grant created an issue

Message from Martin Ballaschk:

Dear Mr. Grant,

thank you for that really great bio3d package.

I noticed that the write.pdb function of bio3d generates PDB files that do not fully conform to the PDB format [1]. With the "chainter" option, write.pdb writes TER lines between chains, but these lines carry no atom serial number, chain identifier, residue name or residue number.

Best regards, Martin Ballaschk

[1] http://www.wwpdb.org/documentation/format33/sect9.html#TER

The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.
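As a rough illustration of the rule quoted above, here is a minimal base R sketch that builds a TER line from the last atom of a chain. The function name `format_ter` and its arguments are hypothetical and not part of bio3d. The column positions follow the TER record layout in the v3.3 format description (serial in columns 7-11, resName in 18-20, chainID in 22, resSeq in 23-26, iCode in 27).

```r
# Hypothetical helper (not a bio3d function): format a TER record from the
# fields of the ATOM/HETATM record that precedes it.
format_ter <- function(prev_serial, resname, chain, resno, icode = "") {
  sprintf("%-6s%5d      %3s %1s%4d%1s",
          "TER",                          # columns 1-6: record name
          as.integer(prev_serial) + 1L,   # columns 7-11: previous serial + 1
          resname,                        # columns 18-20: residue name
          chain,                          # column 22: chain identifier
          as.integer(resno),              # columns 23-26: residue number
          icode)                          # column 27: insertion code
}

# Example: last atom of chain A has serial 1243 and belongs to ALA 166.
ter_line <- format_ter(1243, "ALA", "A", 166)
```

The serial number is passed as that of the preceding atom so that the "one greater" rule lives in a single place.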

Comments (9)

  1. Barry Grant reporter

    Also Guido noticed that we don't include segid records in the output of write.pdb() even though we read them and store them with read.pdb().

    I have a vague memory of purposefully excluding segid from write.pdb() as they can cause problems with other tools including transcomp.

    I suggest we modify write.pdb() to enable it to output segid records only if present in the input pdb object. We should also add a flag to explicitly exclude these from output perhaps by default?

    Let me know if you think this might cause problems elsewhere.

  2. Barry Grant reporter

    Checking past emails from users also highlighted this annoying feature of write.pdb()

    > p<-read.pdb("4q21")
    > write.pdb(p, "deleteme.pdb")
    ## Error in write.pdb(p, "deleteme.pdb") :
    ## write.pdb: please provide a 'pdb' object or numeric 'xyz' coordinates
    

    Whereas this will work

    write.pdb(p, file="deleteme.pdb")
    

    I suggest moving the file = "R.pdb" option to second in the argument list, so that an unnamed second argument is matched to file.
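    The behaviour above comes from R's positional argument matching. The sketch below uses two toy functions, `old_style` and `new_style`, which are hypothetical stand-ins and not the real write.pdb signatures. It assumes the second formal argument was something other than file (here a numeric `xyz`), which is consistent with the error message but is not confirmed in this thread.

```r
# Hypothetical stand-ins for illustration only; not the bio3d signatures.
check_xyz <- function(xyz) {
  if (!is.null(xyz) && !is.numeric(xyz)) stop("'xyz' must be numeric")
}

# 'file' is NOT the second formal, so an unnamed string lands in 'xyz'.
old_style <- function(pdb = NULL, xyz = NULL, file = "R.pdb") {
  check_xyz(xyz)
  file
}

# 'file' IS the second formal, so an unnamed string is taken as the file name.
new_style <- function(pdb = NULL, file = "R.pdb", xyz = NULL) {
  check_xyz(xyz)
  file
}

p <- list()  # placeholder for a pdb object

res_old   <- tryCatch(old_style(p, "deleteme.pdb"),
                      error = function(e) conditionMessage(e))
res_named <- old_style(p, file = "deleteme.pdb")  # naming the argument works
res_new   <- new_style(p, "deleteme.pdb")         # so does the reordered form
```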

  3. Xinqiu Yao

    This is a necessary update. We usually don't use segid information or TER records, but it is better to conform to the standard PDB format (V3.3). I will check the online format description carefully and correct any inconsistencies. Moving the file="R.pdb" option to the second position is also a good idea and saves some typing.

  4. Xinqiu Yao

    Hi, I checked the PDB format online (http://www.wwpdb.org/documentation/format33/sect9.html). It seems the ATOM record no longer has a segid field. Instead, it has two other fields, element and charge:

    COLUMNS        DATA  TYPE    FIELD        DEFINITION
    -------------------------------------------------------------------------------------
     1 -  6        Record name   "ATOM  "
     7 - 11        Integer       serial       Atom  serial number.
    13 - 16        Atom          name         Atom name.
    17             Character     altLoc       Alternate location indicator.
    18 - 20        Residue name  resName      Residue name.
    22             Character     chainID      Chain identifier.
    23 - 26        Integer       resSeq       Residue sequence number.
    27             AChar         iCode        Code for insertion of residues.
    31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
    39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
    47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
    55 - 60        Real(6.2)     occupancy    Occupancy.
    61 - 66        Real(6.2)     tempFactor   Temperature  factor.
    77 - 78        LString(2)    element      Element symbol, right-justified.
    79 - 80        LString(2)    charge       Charge  on the atom.
    

    Do we still need the segid record?
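    To make the column layout concrete, here is a minimal base R sketch that slices an ATOM line by fixed columns. The names `atom_cols` and `parse_atom` are made up for this example. The segid position (columns 73-76) comes from the older v2.x format, not from the v3.3 table above, and the example line is synthetic.

```r
# Fixed-width fields of an ATOM record as c(start, end) column pairs.
atom_cols <- list(
  serial  = c(7, 11),  name    = c(13, 16), resName = c(18, 20),
  chainID = c(22, 22), resSeq  = c(23, 26),
  x = c(31, 38), y = c(39, 46), z = c(47, 54),
  segid   = c(73, 76),  # legacy v2.x field, absent from v3.3
  element = c(77, 78)
)

# Hypothetical helper: return each field as a trimmed string.
parse_atom <- function(line) {
  lapply(atom_cols, function(cc) trimws(substr(line, cc[1], cc[2])))
}

# Synthetic 80-column ATOM line, built with sprintf so the columns are exact.
atom_line <- sprintf(
  "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f      %-4s%2s%2s",
  "ATOM", 1L, " N", "", "MET", "A", 1L, "",
  64.080, 50.529, 32.509, 1.00, 28.66, "SEG1", "N", "")

fields <- parse_atom(atom_line)
```

    Lines shorter than 80 columns still parse, because substr() returns an empty string for columns past the end of the line.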

  5. Barry Grant reporter

    Good to know. However, I think the "segid" could still be useful and that we should have an option to write it (but have this writing turned off by default).

    Do VMD and PyMOL still support it? That is, can we use it to label portions of a structure and then apply a display operation to them in these packages, such as a distinct color or representation? Having four characters to represent distinct portions of a structure is more useful than a single-character chain ID.

  6. Xinqiu Yao

    I have updated write.pdb to output segid if it is provided via the "segid" option or is present in the PDB object. Segid can be removed from the output completely with print.segid=FALSE. Please test it and let me know if there are any problems.
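    One way to read the rule described in this comment is sketched below in base R. The helper `resolve_segid` is hypothetical and is not the code committed to bio3d. It assumes that an explicitly supplied segid takes precedence over the values stored in the object, which the comment does not state. The print.segid argument is left without a default because the thread does not settle what the default is.

```r
# Hypothetical sketch of the selection rule; not the bio3d implementation.
# pdb_segid:   segid values stored in the pdb object (may contain NA)
# segid:       optional user-supplied value(s), recycled to the atom count
# print.segid: if FALSE, blank the field for every atom
resolve_segid <- function(pdb_segid, segid = NULL, print.segid) {
  n <- length(pdb_segid)
  if (!print.segid) return(rep("", n))             # suppress segid entirely
  if (!is.null(segid)) return(rep_len(segid, n))   # assumed: option wins
  ifelse(is.na(pdb_segid), "", pdb_segid)          # fall back to the object
}

from_object <- resolve_segid(c("A", NA), print.segid = TRUE)
from_option <- resolve_segid(c("A", "B"), segid = "X", print.segid = TRUE)
suppressed  <- resolve_segid(c("A", "B"), print.segid = FALSE)
```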
