Very slow cna(cij) calculation

Issue #448 new
Fabrício Bracht created an issue

Hello again. I've been following the tutorial on network analysis and got stuck during the analysis of an MD simulation. I'm using my own simulation: the protein has 302 amino acids, and I've tried both a ~5000-frame DCD file (281 MB) and a ~1000-frame one (57 MB) with the commands

trj <- fit.xyz(fixed=pdb$xyz,mobile=dcd,fixed.inds=inds$xyz,mobile.inds=inds$xyz)
cij <- dccm(trj)
net <- cna(cij)

I've found a similar discussion here https://bitbucket.org/Grantlab/bio3d/issues/162/q-a-question-about-bio3d-cna, where Xin-Qiu Yao suggested using a larger cutoff for the cna command (say 0.5). The user there reported a speed improvement even with the default cutoff of 0.4, and noted that explicitly specifying cutoff.cij in the command was what affected the speed of the cna calculation. In my case, neither of these has worked. Once I type

net <- cna(cij)

The calculation takes forever, and I mean a very long time. I went out for lunch and came back to find it still running. I left it, and by the end of the afternoon it still had not finished. Any suggestions on improving performance? Thank you in advance.

Comments (8)

  1. Lars Skjærven

    Hi Fabrício, sorry for the late reply.

    300 residues shouldn't be a problem. First, make sure you are running the dccm function on a C-alpha-only trajectory. Second, you might want to filter on a "contact map" as well. Here is an example:

    prmtop <- read.prmtop("complex_noWAT.prmtop")
    ca.inds <- atom.select(prmtop, "calpha")
    
    trj <- read.ncdf("prod_nowat.nc", at.sel=ca.inds,
                     first=1, last=100, stride=2)
    
    cm <- cmap.xyz(trj, 
                   dcut=10, scut=0, pcut=0.75, mask.lower=FALSE,
                   ncore=4, gc.first=TRUE)
    
    cij <- dccm(trj)
    net <- cna(cij, cutoff.cij=0.4, cm=cm)
    

    You should also consider running multiple parallel simulations (e.g. 5) for correlation network analysis, i.e. build multiple correlation matrices (cijs) and use the function filter.dccm() to generate a consensus cij matrix as input to cna(). That would be something like this:

    # cij for each sim (trjs is assumed to be a list of xyz trajectories, one per simulation)
    cijs <- lapply(trjs, function(x) {
        cij <- dccm(x)
        cat(".")
        return(cij)
    })
    
    # consensus cij matrix from multiple simulations 
    cij <- filter.dccm(cijs, cutoff.cij = 0.4, cmap = cm)
    
    # correlation network analysis - no cutoff here
    net <- cna(cij, cutoff.cij=0)
    

    Hope it helps! L

  2. Barry Grant

    You should probably dig into your cij matrix, as it sounds like you have lots of strong couplings leading to many, many edges.

  3. Fabrício Bracht reporter

    Lars, your suggestions worked perfectly, but I've noticed one other thing. I converted the DCD trajectory to NetCDF and ran the analysis on the NetCDF file. Things went faster. Is this expected?

  4. Fabrício Bracht reporter

    Sorry, I take that back. The dccm calculation is a bit faster, but cna still takes a long time. The only solution is to filter for CA indices, as Lars said.

    Barry: in case there are too many edges, is there a workaround? I mean, is there some way to filter the analysis other than filtering by CA indices?

    Thanks

  5. Shahryar Alavi

    Hello everybody,

    I have the same problem as mentioned by Fabrício. My MD system has 740 amino acids and the trajectory has 7500 frames. I tried to run CNA on the backbone atoms.

    I now want to try the solution suggested by Lars. However, I'd like to ask whether it's possible to add parallel computing to the cna.dccm() function (i.e. adding an ncore argument to cna.dccm()). I made an unsuccessful attempt to speed up cna.dccm() using CPU parallelisation (library(multidplyr)).

    It would be appreciated if you could do so.

  6. Xinqiu Yao

    Hi,

    The main time-consuming part in cna() is the community detection, which calls a function from the igraph R package. We will keep tracking the package and see if they provide options for parallel or even GPU computing. Thanks for the suggestion.

  7. Mohd Athar

    I am also stuck on the same problem: the cna job takes forever to complete. It has been running for over a week on 2000 CA atoms with 100 frames and still hasn't finished.
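
A minimal sketch of the kind of check Barry Grant suggests in comment 2: count how many residue pairs would survive a given correlation cutoff before handing the matrix to cna(). The matrix here is synthetic (a hypothetical stand-in for a real dccm() result), the helper name count_edges is made up for illustration, and the sketch assumes that a pair becomes an edge when |cij| is at or above the cutoff, which should be checked against ?cna.

```r
# Synthetic stand-in for a DCCM matrix (hypothetical data, not from the thread).
# A shared signal is added to every column so that many pairs are correlated.
set.seed(1)
n <- 50                                    # number of "residues"
shared <- rnorm(200)                       # common motion across all columns
x <- matrix(rnorm(200 * n), ncol = n) + 0.8 * shared
cij <- cor(x)                              # symmetric, values in [-1, 1]

# Hypothetical helper: count pairs with |cij| >= cutoff (upper triangle only,
# so each pair is counted once and the diagonal is ignored).
count_edges <- function(cij, cutoffs) {
  vals <- abs(cij[upper.tri(cij)])
  sapply(cutoffs, function(co) sum(vals >= co))
}

cutoffs <- c(0.2, 0.4, 0.5, 0.6)
n.edges <- count_edges(cij, cutoffs)
n.pairs <- n * (n - 1) / 2

# Edge count and fraction of all possible pairs at each cutoff.
print(data.frame(cutoff = cutoffs,
                 edges = n.edges,
                 fraction = n.edges / n.pairs))
```

If the fraction of surviving pairs stays high at the cutoff you intend to use, the resulting network is dense, and a higher cutoff.cij or a contact-map filter (as in comment 1) is probably worth trying before running cna().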

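
Comment 6 attributes most of the run time to community detection in igraph. The sketch below, which is not taken from the thread, times two igraph community-detection functions on a small random graph to show how much the choice of algorithm can matter. The graph is a hypothetical stand-in for a residue network. Whether cna() uses edge betweenness by default, and whether it lets you switch methods through a cluster.method argument, is an assumption here that should be confirmed with ?cna.

```r
library(igraph)

set.seed(42)
# Hypothetical random graph standing in for a residue correlation network.
g <- sample_gnp(n = 100, p = 0.1)

# Edge-betweenness (Girvan-Newman) clustering: cost grows quickly with
# the number of edges.
t.btwn <- system.time(cl.btwn <- cluster_edge_betweenness(g))

# Random-walk based clustering: typically much cheaper on dense graphs.
t.walk <- system.time(cl.walk <- cluster_walktrap(g))

# Compare timings (user, system, elapsed) and the number of communities found.
print(rbind(edge_betweenness = t.btwn[1:3], walktrap = t.walk[1:3]))
cat("communities:", length(cl.btwn), "vs", length(cl.walk), "\n")
```

The two methods generally give different partitions, so a faster method is a trade-off rather than a drop-in replacement. Reducing the number of edges first (a higher cutoff or a contact-map filter) helps whichever algorithm is used.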