PCA cluster average

Issue #818 resolved
Nicia Ferreira created an issue

Hello all,

I have PCA and respective clusters. I want to extract a single representative PDB for each cluster.

I tried to replicate the solution given for RMSD (https://bitbucket.org/Grantlab/bio3d/issues/223/how-to-extract-pdb-file). However, I seem to never get the dimensions right regarding inds… Also, I have 5 clusters.

Errors:

Error in PCA_WT_H6N6[inds, inds] : incorrect number of dimensions

or, if PCA_WT_H6N6$z[inds,inds]

Error in PCA_WT_H6N6$z[inds, inds] : subscript out of bounds

My code is as follows:

find representatives

unq <- unique(grps_WT_H6N6)
all.inds <- seq(1, length(grps_WT_H6N6))
rep.inds <- rep(NA, length(unq))

for(i in 1:length(unq)) {
grp <- unq[i]
inds <- which(grps_WT_H6N6==grp)

if(length(inds) > 1) {
tmp <- PCA_WT_H6N6[inds, inds]

m <- rowMeans(tmp)
rep.ind <- which.min(m)
rep.ind <- all.inds[inds][rep.ind]

}
else {
rep.ind <- inds[1]
}

rep.inds[i] <- rep.ind
}

indices of cluser representatives

rep.inds

write.pdb(pdb, xyz=xyz[rep.inds[1], ], file="rep_clust1.pdb")
write.pdb(pdb, xyz=xyz[rep.inds[2], ], file="rep_clust2.pdb")
write.pdb(pdb, xyz=xyz[rep.inds[3], ], file="rep_clust3.pdb")

Comments (2)

  1. Xinqiu Yao

    The pca$z is a matrix with each row representing individual structures (or frames if MD data are analyzed) and each column the PC (e.g., the first column, pc$z[, 1], represents the coordinates of all structures along the PC1 axis).

    So, code something like PCA_WT_H6N6$z[inds, inds] is conceptually wrong.

    An example to identify the representative structure of a group based on PCA is

    #... 
    
    ## Suppose we only consider PC1 and PC2
    tmp <- PCA_WT_H6N6$z[inds, 1:2]
    m <- colMeans(tmp)
    
    # calculate the distance between each structure 
    #  and the mean
    dd <- dist.xyz(tmp, t(m)) # the transpose of m is required to match the dimension of tmp
    rep.ind <- which.min(dd)
    rep.ind <- inds[rep.ind]
    
    #...
    

  2. Log in to comment