PCA cluster average
Hello all,
I have PCA and respective clusters. I want to extract a single representative PDB for each cluster.
I tried to replicate the solution given for RMSD (https://bitbucket.org/Grantlab/bio3d/issues/223/how-to-extract-pdb-file). However, I seem to never get the dimensions right regarding inds… Also, I have 5 clusters.
Errors:
Error in PCA_WT_H6N6[inds, inds] : incorrect number of dimensions
or, if PCA_WT_H6N6$z[inds,inds]
Error in PCA_WT_H6N6$z[inds, inds] : subscript out of bounds
My code is as follows:
find representatives
unq <- unique(grps_WT_H6N6)
all.inds <- seq(1, length(grps_WT_H6N6))
rep.inds <- rep(NA, length(unq))
for(i in 1:length(unq)) {
grp <- unq[i]
inds <- which(grps_WT_H6N6==grp)
if(length(inds) > 1) {
tmp <- PCA_WT_H6N6[inds, inds]
m <- rowMeans(tmp)
rep.ind <- which.min(m)
rep.ind <- all.inds[inds][rep.ind]
}
else {
rep.ind <- inds[1]
}
rep.inds[i] <- rep.ind
}
indices of cluser representatives
rep.inds
write.pdb(pdb, xyz=xyz[rep.inds[1], ], file="rep_clust1.pdb")
write.pdb(pdb, xyz=xyz[rep.inds[2], ], file="rep_clust2.pdb")
write.pdb(pdb, xyz=xyz[rep.inds[3], ], file="rep_clust3.pdb")
Comments (2)
-
-
- changed status to resolved
- Log in to comment
The
pca$z
is a matrix with each row representing individual structures (or frames if MD data are analyzed) and each column the PC (e.g., the first column, pc$z[, 1], represents the coordinates of all structures along the PC1 axis).So, code something like
PCA_WT_H6N6$z[inds, inds]
is conceptually wrong.An example to identify the representative structure of a group based on PCA is