Clustering ligand

Issue #442 resolved
Former user created an issue

Hi there,

I have few 100s of protein- ligand complexes (different conformations of a protein and multiple ligand poses for each conformation) and would like to cluster them by using different ligand RMSD values.

So, I need the protein structures to be RMSD fit first and then would like to cluster the ligand poses for different RMSD values (eg., 2-4A). Is there a way I can do this (clustering based on different rmsd as threshold) with bio3d?

Sample files attached: four complex files: two protein conformations with two ligand conformations for each. Thanks, Subha

Comments (4)

  1. Xinqiu Yao

    Hi,

    It should be straightforward to do such a clustering.

    First, fit protein structures using fit.xyz(). Then, calculate RMSD values of ligand using rmsd() (note that argument a should be the fitted structural coordinates and a.inds to select the ligand atoms). Third, do clustering with hclust() (Don't forget converting the RMSD values into a distance matrix with as.dist()).

    Let me know if you still have questions.

  2. skal24

    library(bio3d)

    pdb<-read.pdb("all.pdb", multi=TRUE)

    noh<-atom.select(pdb,'noh', value=TRUE)

    noh.inds<-atom.select(pdb, 'noh')

    lig<-atom.select(pdb, resid='UNK')

    lig.inds<-atom.select(pdb, resid='UNK', value=TRUE)

    xyz<-fit.xyz(fixed=pdb$xyz[1,], mobile=pdb, fixed.inds=noh.inds$xyz, mobile.inds=noh.inds$xyz)

    rd<-rmsd(xyz[,lig$xyz])

    hc<-hclust(as.dist(rd))

    I don't think, I should use 'fit=TRUE' during the RMSD calculation, Did I get that correct ?

    Also, I am writing the frame close to the cluster representative using

    gp1.rms.ave <- colMeans(xyz[grps==1,])

    gp1.rms.min.ind <- which.min(rmsd(gp1.rms.ave, xyz[grps==1,]))

    gp1.rms.min.xyz <- xyz[grps==1,][gp1.rms.min.ind,]

    write.pdb(pdb=pdb, xyz=gp1.rms.min.xyz, file="FrameRep_group1_byRMS.pdb")

    However, is it possible to write the model number in the pdb file (or the frame no of this structure that is close to representative) ? Say, FrameRep_group1_byRMS.pdb is number 30 of my 500 models in input pdb file. Is it possible to write out this number 30?

  3. Xinqiu Yao

    Yes you are right. You don't need 'fit=TRUE' because you have fitted structures using fit.xyz() already. One thing I would suggest is to use 'protein atoms' as the reference for fitting, instead of 'noh.inds'. If the protein is flexible across the ensemble, it is better use a subset of the protein that is relatively rigid (e.g. as obtained from the core.find() function). By this, you could have a more reasonable coordinate frame in which ligand binding poses can be characterized.

    Your codes of finding the representative frame are a bit confusing to me. Are you looking at conformations of whole system or just ligand?

    I guess bio3d doesn't support writing arbitrary model number in a pdb file. You may put the number in the file name for future tracking, e.g.

    ind <- which(grps==1)[gp1.rms.min.ind]
    file <- paste("FrameRep_group1_byRMS_", ind, ".pdb", sep="").
    write.pdb(pdb=pdb, xyz=gp1.rms.min.xyz, file=file)
    
  4. Log in to comment