Clustering ligand
Hi there,
I have few 100s of protein- ligand complexes (different conformations of a protein and multiple ligand poses for each conformation) and would like to cluster them by using different ligand RMSD values.
So, I need the protein structures to be RMSD fit first and then would like to cluster the ligand poses for different RMSD values (eg., 2-4A). Is there a way I can do this (clustering based on different rmsd as threshold) with bio3d?
Sample files attached: four complex files: two protein conformations with two ligand conformations for each. Thanks, Subha
Comments (4)
-
-
library(bio3d)
pdb<-read.pdb("all.pdb", multi=TRUE)
noh<-atom.select(pdb,'noh', value=TRUE)
noh.inds<-atom.select(pdb, 'noh')
lig<-atom.select(pdb, resid='UNK')
lig.inds<-atom.select(pdb, resid='UNK', value=TRUE)
xyz<-fit.xyz(fixed=pdb$xyz[1,], mobile=pdb, fixed.inds=noh.inds$xyz, mobile.inds=noh.inds$xyz)
rd<-rmsd(xyz[,lig$xyz])
hc<-hclust(as.dist(rd))
I don't think, I should use 'fit=TRUE' during the RMSD calculation, Did I get that correct ?
Also, I am writing the frame close to the cluster representative using
gp1.rms.ave <- colMeans(xyz[grps==1,])
gp1.rms.min.ind <- which.min(rmsd(gp1.rms.ave, xyz[grps==1,]))
gp1.rms.min.xyz <- xyz[grps==1,][gp1.rms.min.ind,]
write.pdb(pdb=pdb, xyz=gp1.rms.min.xyz, file="FrameRep_group1_byRMS.pdb")
However, is it possible to write the model number in the pdb file (or the frame no of this structure that is close to representative) ? Say, FrameRep_group1_byRMS.pdb is number 30 of my 500 models in input pdb file. Is it possible to write out this number 30?
-
Yes you are right. You don't need 'fit=TRUE' because you have fitted structures using
fit.xyz()
already. One thing I would suggest is to use 'protein atoms' as the reference for fitting, instead of 'noh.inds'. If the protein is flexible across the ensemble, it is better use a subset of the protein that is relatively rigid (e.g. as obtained from thecore.find()
function). By this, you could have a more reasonable coordinate frame in which ligand binding poses can be characterized.Your codes of finding the representative frame are a bit confusing to me. Are you looking at conformations of whole system or just ligand?
I guess bio3d doesn't support writing arbitrary model number in a pdb file. You may put the number in the file name for future tracking, e.g.
ind <- which(grps==1)[gp1.rms.min.ind] file <- paste("FrameRep_group1_byRMS_", ind, ".pdb", sep=""). write.pdb(pdb=pdb, xyz=gp1.rms.min.xyz, file=file)
-
- changed status to resolved
- Log in to comment
Hi,
It should be straightforward to do such a clustering.
First, fit protein structures using
fit.xyz()
. Then, calculate RMSD values of ligand usingrmsd()
(note that argument a should be the fitted structural coordinates and a.inds to select the ligand atoms). Third, do clustering withhclust()
(Don't forget converting the RMSD values into a distance matrix withas.dist()
).Let me know if you still have questions.