TER records missing from multi-chain PDB
Message from Martin Ballaschk:
Dear Mr. Grant,
thank you for that really great bio3d package.
I noticed that the write.pdb function of bio3d generates PDB files that do not fully conform to the PDB format [1]. When invoking the "chainter" option, write.pdb writes out a PDB file with TER lines between chains, but these lines have no atom serial number, chain identifier, residue name or residue number assigned to them.
Best regards, Martin Ballaschk
[1] http://www.wwpdb.org/documentation/format33/sect9.html#TER
The TER record has the same residue name, chain identifier, sequence number and insertion code as the terminal residue. The serial number of the TER record is one number greater than the serial number of the ATOM/HETATM preceding the TER.
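As a rough illustration of the rule quoted above, here is a minimal R sketch that formats a TER line from the last atom of a chain. The helper name `format_ter` and the example values are hypothetical and are not part of bio3d; the column positions (serial in 7-11, resName in 18-20, chainID in 22, resSeq in 23-26, iCode in 27) follow the v3.3 TER record description.

```r
# Hypothetical helper, not the bio3d implementation.
# Builds a TER line from the terminal ATOM/HETATM of a chain.
format_ter <- function(last_serial, resname, chain, resno, icode = "") {
  sprintf("%-6s%5d%6s%3s %1s%4d%1s",
          "TER",                       # columns 1-6: record name
          as.integer(last_serial) + 1L, # columns 7-11: preceding serial + 1
          "",                          # columns 12-17: blank
          resname,                     # columns 18-20: residue name
          chain,                       # column 22: chain identifier
          as.integer(resno),           # columns 23-26: residue number
          icode)                       # column 27: insertion code
}

# Example values are made up for illustration.
ter_line <- format_ter(1339, "LEU", "A", 166)
print(ter_line)
```

The record carries the residue name, chain and residue number of the terminal residue, and its serial is one greater than that of the preceding atom.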
Comments (9)
-
reporter Checking past emails from users also highlighted this annoying feature of write.pdb()
> p <- read.pdb("4q21")
> write.pdb(p, "deleteme.pdb")
## Error in write.pdb(p, "deleteme.pdb") :
##   write.pdb: please provide a 'pdb' object or numeric 'xyz' coordinates
Whereas this will work
write.pdb(p, file="deleteme.pdb")
I suggest moving the file = "R.pdb" argument to the second position in the argument list to address this.
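The behaviour comes from R's positional argument matching. The sketch below uses two hypothetical stand-in functions (`toy_write_old` and `toy_write_new`, not the bio3d source) and assumes, as the error message suggests, that the second formal argument is `xyz`, so an unnamed file name ends up being treated as coordinates.

```r
# Hypothetical stand-ins for illustration only; not the bio3d source.
# Assumption: the second formal is 'xyz', so a positional file name lands there.
toy_write_old <- function(pdb = NULL, xyz = pdb$xyz, file = "R.pdb") {
  if (!is.numeric(xyz))
    stop("please provide a 'pdb' object or numeric 'xyz' coordinates")
  file  # return the file name that would be written
}

# Same function with 'file' moved to the second position.
toy_write_new <- function(pdb = NULL, file = "R.pdb", xyz = pdb$xyz) {
  if (!is.numeric(xyz))
    stop("please provide a 'pdb' object or numeric 'xyz' coordinates")
  file
}

p <- list(xyz = c(1.0, 2.0, 3.0))  # minimal stand-in for a pdb object

# Positional file name is matched to 'xyz' and fails the numeric check.
res_old_positional <- tryCatch(toy_write_old(p, "deleteme.pdb"),
                               error = function(e) "error")
# Naming the argument avoids the problem.
res_old_named <- toy_write_old(p, file = "deleteme.pdb")
# With the reordered signature the positional call works.
res_new_positional <- toy_write_new(p, "deleteme.pdb")
```

One consequence of such a reordering is that any existing code passing `xyz` positionally as the second argument would need to name it.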
-
This is a necessary update. We usually don't use the segid information or TER records, but it is better to conform to the standard PDB format (v3.3). I will check the online format description carefully and correct any inconsistencies. Moving the file="R.pdb" option to the second position is also a good idea and saves some typing.
-
Hi, I checked the PDB format online (http://www.wwpdb.org/documentation/format33/sect9.html). It seems there is no segid field anymore. Instead, they have two other fields, element and charge:
COLUMNS        DATA TYPE     FIELD        DEFINITION
-------------------------------------------------------------------------------------
 1 -  6        Record name   "ATOM  "
 7 - 11        Integer       serial       Atom serial number.
13 - 16        Atom          name         Atom name.
17             Character     altLoc       Alternate location indicator.
18 - 20        Residue name  resName      Residue name.
22             Character     chainID      Chain identifier.
23 - 26        Integer       resSeq       Residue sequence number.
27             AChar         iCode        Code for insertion of residues.
31 - 38        Real(8.3)     x            Orthogonal coordinates for X in Angstroms.
39 - 46        Real(8.3)     y            Orthogonal coordinates for Y in Angstroms.
47 - 54        Real(8.3)     z            Orthogonal coordinates for Z in Angstroms.
55 - 60        Real(6.2)     occupancy    Occupancy.
61 - 66        Real(6.2)     tempFactor   Temperature factor.
77 - 78        LString(2)    element      Element symbol, right-justified.
79 - 80        LString(2)    charge       Charge on the atom.
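To make the column layout concrete, here is a small R sketch that reads the fields of one ATOM line by fixed column positions. The function `parse_atom` and the example coordinates are hypothetical and only illustrate the layout; this is not how read.pdb is implemented.

```r
# Hypothetical fixed-width parser for a single ATOM line (illustration only).
parse_atom <- function(line) {
  field <- function(from, to) trimws(substr(line, from, to))
  list(
    serial     = as.integer(field(7, 11)),
    name       = field(13, 16),
    altLoc     = field(17, 17),
    resName    = field(18, 20),
    chainID    = field(22, 22),
    resSeq     = as.integer(field(23, 26)),
    iCode      = field(27, 27),
    x          = as.numeric(field(31, 38)),
    y          = as.numeric(field(39, 46)),
    z          = as.numeric(field(47, 54)),
    occupancy  = as.numeric(field(55, 60)),
    tempFactor = as.numeric(field(61, 66)),
    element    = field(77, 78),
    charge     = field(79, 80)
  )
}

# Build an 80-column example line piece by piece (values are made up).
atom_line <- paste0(
  "ATOM  ",                            # columns 1-6
  "    1",                             # columns 7-11
  " ",                                 # column 12
  " N  ",                              # columns 13-16
  " ",                                 # column 17
  "MET",                               # columns 18-20
  " ",                                 # column 21
  "A",                                 # column 22
  "   1",                              # columns 23-26
  " ",                                 # column 27
  "   ",                               # columns 28-30
  "  64.080", "  50.529", "  32.509",  # columns 31-54
  "  1.00", " 28.66",                  # columns 55-66
  strrep(" ", 10),                     # columns 67-76 (unused in v3.3)
  " N", "  "                           # columns 77-80
)

atom <- parse_atom(atom_line)
```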
Do we still need the segid record?
-
reporter Good to know. However, I think the "segid" could still be useful and that we should have an option to write it (but have this writing turned off by default).
Do VMD and PyMOL still support it? That is, can we use it to label portions of a structure so that a display operation, such as a distinct color or representation, can be applied to them in these packages? Having four characters to represent distinct portions of a structure is more useful than a single-character chain ID.
-
reporter It looks like segid occupied columns 73-76 in PDB format v2.0. That is now just wasted space.
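For reference, a minimal R sketch of placing a segid into columns 73-76 of a coordinate line and reading it back. The helpers `set_segid` and `get_segid` are hypothetical, and left-justifying the four characters is an assumption based on the v2.x field description; this is not the write.pdb code.

```r
# Hypothetical helpers for illustration; not part of bio3d.
set_segid <- function(line, segid) {
  line <- sprintf("%-80s", line)  # pad the line to the full 80 columns
  # Assumption: segid is left-justified and at most 4 characters wide.
  substr(line, 73, 76) <- sprintf("%-4s", substr(segid, 1, 4))
  line
}

get_segid <- function(line) trimws(substr(line, 73, 76))

# A truncated ATOM line is enough to show the column placement.
seg_line <- set_segid("ATOM      1  N   MET A   1", "PROA")
seg_back <- get_segid(seg_line)
```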
-
Yes, you can use it in PyMOL (http://www.pymolwiki.org/index.php/Property_Selectors)
-
I have updated write.pdb to output segid if it is provided through the "segid" option or is present in the PDB object. We can completely remove segid from the output by setting print.segid=FALSE. Please test it and let me know if there are any problems.
-
- changed status to resolved
Also Guido noticed that we don't include segid records in the output of write.pdb() even though we read them and store them with read.pdb().
I have a vague memory of purposefully excluding segid from write.pdb() as it can cause problems with other tools, including transcomp.
I suggest we modify write.pdb() to enable it to output segid records only if present in the input pdb object. We should also add a flag to explicitly exclude these from output perhaps by default?
Let me know if you think this might cause problems elsewhere...