Suggestions for cat.pdb()
Hi,
I have some suggestions for updating the function cat.pdb() and would like to discuss with you guys.
- Currently it requires at least two pdb objects as input. Can we modify to allow one pdb input also (simply return the pdb)? This is a trivial change but could be helpful to tidy up codes in scripts in which we usually don't know how many inputs there are (e.g. in my new function pdb.biounit.R). Otherwise we need to check the number of pdb objects and so additional lines of codes.
pdb = read.pdb("1tag")
cat.pdb(pdb)
#Error in cat.pdb(pdb) : provide multiple (>1) PDB objects
- It removes SSE annotations completely. Can we work on SSE concatenation also something like a reverse operation of trim.pdb()? This will be useful if we later use the result for SSE annotation with e.g. plot.bio3d(). For example, with the new function pdb.biounit() we can build the normal hemoglobin tetramer based on a dimer pdb coordinates. The identical SSE annotation should repeat crossing new copies.
pdb <- read.pdb("1tag")
cat.pdb(pdb, pdb)$helix
#NULL
- It seems doesn't work for concatenating purely non-protein pdb objects. For example,
pdb <- read.pdb("1tag")
pdb1 <- trim.pdb(pdb, "protein")
pdb2 <- trim.pdb(pdb, "ligand")
cat.pdb(pdb1, pdb2)
#Error in xyz[i, 1] : subscript out of bounds
Is it a bug or a design? Can we change to allow concatenate any atom records?
Let me know what you think. I can also help to update it if all/some of these suggestions are got approved. Thanks a lot!
Comments (18)
-
-
reporter Thanks! Probably a call of clean.pdb() to check e.g. chain breaks, residue numbers, etc. in Q2.
-
Good points Xinqiu.
Q1: should return the same PDB object, IF arguments renumber=FALSE, rechain=FALSE. no?
Q3: Bug was related to inspect.connectivity() and is now fixed (94b09cf).
Thanks !
-
reporter Thanks for the fixing! Right, for Q1 we want it return the same PDB object if both renumber and rechain are FALSE (Should they be default value?).
-
Yes, I believe so.
-
FALSE as default sounds good
-
reporter Updated with this commit. The function returns the same PDB if provided one pdb object and can deal with SSE concatenation for multiple pdb input. The new function clean.pdb() is called to do the renumbering task. Let me know if you find any problem!
# One pdb, just return it pdb1 <- read.pdb("1etl") cat.pdb(pdb1) Call: cat.pdb(pdb1) Total Models#: 1 Total Atoms#: 160, XYZs#: 480 Chains#: 1 (values: A) Protein Atoms#: 138 (residues/Calpha atoms#: 12) Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0) Non-protein/nucleic Atoms#: 22 (residues: 14) Non-protein/nucleic resid values: [HOH (13), MPR (1) ] Protein sequence: CELCCNPGCAGC + attr: atom, helix, sheet, seqres, xyz, calpha, call, clean.log # Three, concatenate all components pdb2 <- read.pdb("1hel") new.pdb <- cat.pdb(pdb1, pdb2, pdb1, rechain=TRUE, renumber=TRUE) new.pdb Call: cat.pdb(pdb1, pdb2, pdb1, renumber = TRUE, rechain = TRUE) Total Models#: 1 Total Atoms#: 1506, XYZs#: 4518 Chains#: 3 (values: A B C) Protein Atoms#: 1277 (residues/Calpha atoms#: 153) Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0) Non-protein/nucleic Atoms#: 229 (residues: 213) Non-protein/nucleic resid values: [HOH (211), MPR (2) ] Protein sequence: CELCCNPGCAGCKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTD GSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAW VAWRNRCKGTDVQAWIRGCRLCELCCNPGCAGC + attr: atom, helix, sheet, seqres, xyz, calpha, call, clean.log new.pdb$helix $start 4 25 79 88 104 109 120 $end 16 34 84 100 108 114 124 $chain [1] "B" "B" "B" "B" "B" "B" "B" $type [1] "5" "1" "1" "5" "1" "5" "1" "5" "5" new.pdb$clean.log [1] "pdb$calpha -> UPDATED " [2] "force.renumber = TRUE -> RENUMBERED " [3] "SSE annotation -> UPDATED " [4] "PDB is clean! "
-
hmm.. I get errors when concatenating your example pdb files:
> pdb1 <- read.pdb("1etl") > pdb2 <- read.pdb("1hel") > new.pdb <- cat.pdb(pdb1, pdb2) Error in .check.residue.ambiguity(pdb) : .check.residue.ambiguity(): Found ambiguous residue labeling Error in cat.pdb(pdb1, pdb2) : cat.pdb(): Bad format pdb generated. Try rechain=TRUE and/or renumber=TRUE > new.pdb <- cat.pdb(pdb1, pdb2, renumber=T, rechain=T) Error in rep(pdb$helix$type, (pdb$helix$end - pdb$helix$start + 1)) : invalid 'times' argument
I like the clean.log attribute, but could we call it just log? An perhaps we can use a data frame with two columns with names data and action (or something in that direction)?
-
reporter Did you update with the latest commit? I tried again and it works fine with options renumber=T and rechain=T.
Thanks for the comments on clean.log! I like your suggestions. Let's put it to To-do list. I will also be very happy if you guys can help test the clean.pdb() function with your favorite PDBs, since we could extensively use it in many other functions if it works properly.
-
I have done a 'git pull' yes. I tried on my mac as well just now and I get the same error for the code
pdb1 <- read.pdb("1etl") pdb2 <- read.pdb("1hel") new.pdb <- cat.pdb(pdb1, pdb2)
hmmm
-
ah.. this error is actually intended. perhaps we could issue a warning instead ? using rechain=T and/or renumber=T works. thanks !
I'll check out clean.pdb more thoroughly.
-
I get the same errors with this PDB (but not with rechain=T)
> pdb1 <- read.pdb("1etl") > cat.pdb(pdb1,pdb1) Error in .check.residue.ambiguity(pdb) : .check.residue.ambiguity(): Found ambiguous residue labeling Error in cat.pdb(pdb1, pdb1) : cat.pdb(): Bad format pdb generated. Try rechain=TRUE and/or renumber=TRUE > > > cat.pdb(pdb1,pdb1, rechain=T) Call: cat.pdb(pdb1, pdb1, rechain = T) Total Models#: 1 Total Atoms#: 320, XYZs#: 960 Chains#: 2 (values: A B) Protein Atoms#: 276 (residues/Calpha atoms#: 24) Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0) Non-protein/nucleic Atoms#: 44 (residues: 28) Non-protein/nucleic resid values: [HOH (26), MPR (2) ] Protein sequence: CELCCNPGCAGCCELCCNPGCAGC + attr: atom, helix, seqres, xyz, calpha, call, sheet, clean.log > > > cat.pdb(pdb1,pdb1, renumber=T) Error in .check.residue.ambiguity(pdb) : .check.residue.ambiguity(): Found ambiguous residue labeling Error in cat.pdb(pdb1, pdb1, renumber = T) : cat.pdb(): Bad format pdb generated. Try rechain=TRUE and/or renumber=TRUE
-
reporter Yes. These errors are intended because otherwise a bad formatted PDB will be generated (i.e. two different chains share the same chain IDs and so confusing on residue labeling and SSE annotation possibly). Should we still proceed but shoot warnings? If so, we could keep the original version that remove SSE but just concatenate atom and xyz records, because clean.pdb() will reject on confused residue labeling :-P
-
I think a warning and a stripping of SSE records is appropriate.
-
reporter Okay, updated with the latest commit https://bitbucket.org/Grantlab/bio3d/commits/269c62f26f7e643c3678cca323370fb7f5081712
cat.pdb(pdb1, pdb2) Call: cat.pdb(pdb1, pdb2) Total Models#: 1 Total Atoms#: 1346, XYZs#: 4038 Chains#: 1 (values: A) Protein Atoms#: 1139 (residues/Calpha atoms#: 12) Nucleic acid Atoms#: 0 (residues/phosphate atoms#: 0) Non-protein/nucleic Atoms#: 207 (residues: 199) Non-protein/nucleic resid values: [HOH (198), MPR (1) ] Protein sequence: CELCCNPGCAGCKVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTD GSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAW VAWRNRCKGTDVQAWIRGCRL + attr: atom, seqres, xyz, calpha, call Warning messages: 1: In cat.pdb(pdb1, pdb2) : cat.pdb(): Bad format pdb generated. Try rechain=TRUE and/or renumber=TRUE 2: In cat.pdb(pdb1, pdb2) : possible chain break in molecule: chain A
-
Looks good.
I would perhaps consider rechain=TRUE as default to avoid the warning. In most practical cases you would need to re-assign the chain identifier to avoid duplicate residues.
-
reporter Yes. rechain=TRUE by default makes sense. Thanks for suggesting!
-
- changed status to resolved
- Log in to comment
My reply to all those questions is 'yes' and for the last one it looks like a bug.
Q1. should return the same PDB object
Q2. we should combine SSE also but be careful about chain ids and residue numbers (perhaps by default forcing different chain ids - but hone again this might be a bad idea)
Q3. This is a bug to be squished.