Writing the average structure for each cluster after hierarchical clustering

Issue #772 new
Akash Pandya created an issue

Hi,

I am using Bio3d to calculate the backbone RMSD of my protein as function of time. I have managed to do this without a problem. Then, I wanted to cluster (hierarchical) based on the RMSD matrix. I managed to this. However, when I try to follow issue #326 it shows an error message after I type the command:

gp1.ave <- colMeans(xyz[grps==1,])

Error in colMeans(xyz[grps == 1, ]) : 
  'x' must be an array of at least two dimensions

My code is pasted here:

> dcdfile <- system.file("examples/0FabFinal.dcd", package="bio3d")
> pdbfile <- system.file("examples/A33Fab.pdb", package="bio3d")
> dcd <- read.dcd(dcdfile)
> pdb <- read.pdb(pdbfile)
> BB.inds <- atom.select(pdb, "backbone")
> xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd,
+                fixed.inds=BB.inds$xyz,
+                mobile.inds=BB.inds$xyz)
> dim(xyz) == dim(dcd)
[1] TRUE TRUE
> ## [1] TRUE TRUE
> rd <- rmsd(xyz[1,BB.inds$xyz], xyz[,BB.inds$xyz])
> d <- dist(rd)
> hc.rd <- hclust(as.dist(d), "average")
> hclustplot(hc.rd,h=2, ylab="RMSD (Å)", main="RMSD Cluster Dendrogram")
> grps <- cutree(hc.rd, h=2)
> gp1.ave <- colMeans(xyz[grps==1,])

Any guidance would be much appreciated. I tried to troubleshoot this error message but couldn’t find anything.

Thanks,

Akash

Comments (3)

  1. Xinqiu Yao

    The error message seems indicate that the dimension of ‘xyz’ (or after filtering by ‘grps’) is not two. One possibility is that grps has only one member labeled as '1'. Can you check by table(grps)?

  2. Akash Pandya reporter

    Thank you for your reply. Turns out there is only 1 structure (observation) in group 1 (as shown below):

    grps
      1   2   3   4   5   6   7   8   9 
      1 630 401  72 155  56  59 111 119 
    

    How can I obtain a representative structure (.pdb) for each cluster?

    I’ve tried the following but still got an error.

    > gp2.ave <- colMeans(xyz[grps==2,])
    > gp2.min.ind <- which.min(rmsd(gp2.ave, xyz[grps==2,]))
    > gp2.min.xyz <- xyz[grps==2,][gp2.min.ind,]
    > write.pdb(pdb=pdb, xyz=gp2.min.xyz, file="FrameRepresentative_group2.pdb")
    
    Error in write.pdb(pdb = pdb, xyz = gp2.min.xyz, file = "FrameRepresentative_group2.pdb") : 
      write.pdb: the lengths of all input vectors != 'length(xyz)/3'
    

    Akash

  3. Xinqiu Yao

    If only 1 member, use something like colMeans(xyz[grps==1, , drop=FALSE]) to keep the dimension.

    which.min() might return multiple indices, which will cause an error in write.pdb(). A good practice is always checking variables that may cause a problem by simply printing it in R. For example, simply type gp2.min.ind and return and see if 1 or multiple numbers show up.

  4. Log in to comment