Writing the average structure for each cluster after hierarchical clustering
Hi,
I am using Bio3d to calculate the backbone RMSD of my protein as function of time. I have managed to do this without a problem. Then, I wanted to cluster (hierarchical) based on the RMSD matrix. I managed to this. However, when I try to follow issue #326 it shows an error message after I type the command:
gp1.ave <- colMeans(xyz[grps==1,])
Error in colMeans(xyz[grps == 1, ]) :
'x' must be an array of at least two dimensions
My code is pasted here:
> dcdfile <- system.file("examples/0FabFinal.dcd", package="bio3d")
> pdbfile <- system.file("examples/A33Fab.pdb", package="bio3d")
> dcd <- read.dcd(dcdfile)
> pdb <- read.pdb(pdbfile)
> BB.inds <- atom.select(pdb, "backbone")
> xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd,
+ fixed.inds=BB.inds$xyz,
+ mobile.inds=BB.inds$xyz)
> dim(xyz) == dim(dcd)
[1] TRUE TRUE
> ## [1] TRUE TRUE
> rd <- rmsd(xyz[1,BB.inds$xyz], xyz[,BB.inds$xyz])
> d <- dist(rd)
> hc.rd <- hclust(as.dist(d), "average")
> hclustplot(hc.rd,h=2, ylab="RMSD (Å)", main="RMSD Cluster Dendrogram")
> grps <- cutree(hc.rd, h=2)
> gp1.ave <- colMeans(xyz[grps==1,])
Any guidance would be much appreciated. I tried to troubleshoot this error message but couldn’t find anything.
Thanks,
Akash
Comments (3)
-
-
reporter Thank you for your reply. Turns out there is only 1 structure (observation) in group 1 (as shown below):
grps 1 2 3 4 5 6 7 8 9 1 630 401 72 155 56 59 111 119
How can I obtain a representative structure (.pdb) for each cluster?
I’ve tried the following but still got an error.
> gp2.ave <- colMeans(xyz[grps==2,]) > gp2.min.ind <- which.min(rmsd(gp2.ave, xyz[grps==2,])) > gp2.min.xyz <- xyz[grps==2,][gp2.min.ind,] > write.pdb(pdb=pdb, xyz=gp2.min.xyz, file="FrameRepresentative_group2.pdb") Error in write.pdb(pdb = pdb, xyz = gp2.min.xyz, file = "FrameRepresentative_group2.pdb") : write.pdb: the lengths of all input vectors != 'length(xyz)/3'
Akash
-
If only 1 member, use something like
colMeans(xyz[grps==1, , drop=FALSE])
to keep the dimension.
which.min()
might return multiple indices, which will cause an error inwrite.pdb()
. A good practice is always checking variables that may cause a problem by simply printing it in R. For example, simply typegp2.min.ind
and return and see if 1 or multiple numbers show up.
- Log in to comment
The error message seems indicate that the dimension of ‘xyz’ (or after filtering by ‘grps’) is not two. One possibility is that grps has only one member labeled as '1'. Can you check by
table(grps)
?