Give up trying to find out how to extract average structure,,

Issue #326 resolved
Former user created an issue

Hi guys, first of all, this project is awesome! Despite of reading and re-reading Issue #223, i still can't figure out how to identify and extract a frame, in pdb format, which would be representative of certain cluster. I really appreciate all possible help,, Best regards, Abe

Comments (5)

  1. Barry Grant

    Don't give up Abe ;-) It is a step learning curve but well worth it in our opinion. Here is an example of a slightly long winded way to do what you are after. There are more efficient, shorter ways but they are probably not as easy to understand. Hope this helps...

    library(bio3d)
    
    ## Load some example Calpha only trajectory data
    trtfile <- system.file("examples/hivp.dcd", package="bio3d")
    trj <- read.dcd(trtfile)
    
    ## Read the starting PDB file to determine atom correspondence
    pdbfile <- system.file("examples/hivp.pdb", package="bio3d")
    pdb <- read.pdb(pdbfile)
    
    ## Superpose (i.e. fit frames to PDB) 
    xyz <- fit.xyz(pdb$xyz, trj)
    
    ## Perform PCA
    pc <- pca(xyz)
    
    ## Cluster analysis in PC1-2 space
    hc <- hclust(dist(pc$z[,1:2]))
    
    ## Divide frames (i.e. rows of 'xyz') into two cluster groups
    grps <- cutree(hc, k=2)
    
    ## Calculate the mean of the first group and write to a PDB file
    gp1.ave <- colMeans(xyz[grps==1,])
    write.pdb(pdb=pdb, xyz=gp1.ave, file="AverageStructure_group1.pdb")
    
    
    ## Find the closest frame (here we take the smallest RMSD) to the group1 average structure 
    gp1.min.ind <- which.min(rmsd(gp1.ave, xyz[grps==1,]))
    gp1.min.xyz <- xyz[grps==1,][gp1.min.ind,]
    
    write.pdb(pdb=pdb, xyz=gp1.min.xyz, file="FrameRepresentative_group1.pdb")
    

    As a sanity check you could view these Calpha only structures as 'tube' in VMD for example. Also you can check where it is in your PC space, for example:

    ## Set up a back (gp1) and red (gp2) color vector with our mean gp1 point in green
    my.colours = grps
    my.colours[grps==1][gp1.min.ind]="yellow"
    
    plot(pc, pc.axes=1:2, col=my.colours)
    

    Note that above we picked the frame closest to the mean group 1 coordinates in terms of RMSD but, depending on your application, you may want to do this in principal component space (which would be a less expensive calculation also).

  2. Abraham Vidal

    Dear Professor Grant, thank yo very much. I was able, with your help, to extract the frame I concern. As you said, these is a curve of learning, and after 2 weeks, i'm much clear about the philosophy and tools you just implemented to R. Greetings from Mexico,, Abe

  3. Log in to comment