contact map clustering

Issue #209 resolved
Former user created an issue

Is it feasible to do hierarchical clustering of md snapshots based on the calpha atoms contact maps within Bio3d?

Comments (4)

  1. Xinqiu Yao

    Hi,

    There is no specific function for it in bio3d, but you can write a simple script to do it by yourself. Following is an example,

    # calculate contact map for each frame
    cm <- lapply(1:nrow(xyz), function(i) cmap(xyz[i, ca.inds$xyz], dcut=10))
    
    # pick up the upper triangular elements from each cmap and put them together to make a matrix,
    # one row per cmap.
    cm.upper <- t(sapply(cm, function(x) x[upper.tri(x)]))
    
    # do the clustering based on 'binary' distance
    hc <- hclust(dist(cm.upper, method='binary'))
    plot(hc, hang=-1)
    
  2. Barry Grant

    It is certainly doable in a number of ways. I will just point out that calculating the contact map for every frame, as in Xinqiu's example above, may take some time for very large trajectories (where nrow(xyz) is large). For testing initially you might want to skip frames by setting the row/frame index i to seq(1,nrow(xyz), by=10) where the by=10 states that we want every tenth frame number.

  3. Xinqiu Yao

    Thanks for the sensible comments! Yes, it should be taken carefully if nrow(xyz) is large, and skipping frames by calling seq(by=10) is a good idea. Also, what I may recommend here is using mclapply to gain speed up (need multiple CPU cores installed in user's box).

    ncore <- setup.ncore(NULL)
    cm <- mclapply(seq(1,nrow(xyz),by=10), function(i) cmap(xyz[i, ca.inds$xyz], dcut=10), mc.cores=ncore)
    
  4. Log in to comment