Very slow cna(cij) calculation
Hello again. I've been following the tutorial on network analysis and got stuck during the analysis of an MD simulation. I've been using my own MD simulation. The protein has 302 aa, and I've tried using a ~5000-frame simulation dcd file (281 MB) and a ~1000-frame one (57 MB) with the commands
trj <- fit.xyz(fixed=pdb$xyz,mobile=dcd,fixed.inds=inds$xyz,mobile.inds=inds$xyz)
cij <- dccm(trj)
net <- cna(cij)
I've found a similar discussion here https://bitbucket.org/Grantlab/bio3d/issues/162/q-a-question-about-bio3d-cna, and there Xin-Qiu Yao suggested using a larger cutoff for the cna command (say 0.5). The poster commented that he had seen a speed improvement using the default cutoff of 0.4, but that specifying cutoff.cij on the command line had affected the speed of the cna calculation. In my case, none of this has worked. Once I type
net <- cna(cij)
the calculation takes forever. And I mean a very long time. I went out for lunch and came back to find that it was still running. I left it there, and by the end of the afternoon it still had not finished. Any suggestions on improving performance? Thank you in advance.
Comments (8)
-
-
You should probably dig into (i.e. investigate further) your cij matrix, as it sounds like you have lots of strong couplings leading to many, many edges.
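One way to follow up on this is to count how many residue pairs would become edges at a given cutoff. The sketch below is plain R on a small made-up matrix; `count_edges` is a hypothetical helper, not a bio3d function, and it assumes an edge is kept wherever |cij| is at or above the cutoff.

```r
# Hypothetical helper (not part of bio3d): count residue pairs whose
# absolute correlation is at or above a cutoff.
# Assumption: a pair becomes a network edge when |cij| >= cutoff.
count_edges <- function(cij, cutoff = 0.4) {
  strong <- abs(cij) >= cutoff
  # Each pair appears twice in a symmetric matrix, so count the upper
  # triangle only (this also skips the diagonal).
  sum(strong[upper.tri(strong)])
}

# Toy 4 x 4 correlation matrix, made up for illustration only.
cij.toy <- diag(4)
cij.toy[1, 2] <- cij.toy[2, 1] <- 0.9
cij.toy[3, 4] <- cij.toy[4, 3] <- -0.5
cij.toy[1, 3] <- cij.toy[3, 1] <- 0.2

n.pairs <- choose(nrow(cij.toy), 2)           # all possible pairs
edges.04 <- count_edges(cij.toy, cutoff = 0.4)
edges.01 <- count_edges(cij.toy, cutoff = 0.1)

# Fraction of possible pairs that would be edges at each cutoff.
density.04 <- edges.04 / n.pairs
density.01 <- edges.01 / n.pairs
```

Running `count_edges(cij)` on your own matrix for a few cutoffs should show whether the network is unusually dense before you commit to a long cna run.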
-
reporter Lars, your suggestions worked perfectly, but I've noticed one other thing. I converted the dcd trajectory to netcdf and did the analysis with the ncdf one. Things went faster. Is this expected?
-
reporter Sorry, I take that back. The dccm calculation is a bit faster, but the cna still takes a long time. The only solution is to filter for CA indices, like Lars said.
Barry: In case there are too many edges, is there a workaround? I mean, is there some way I can filter the analysis other than filtering by CA indices?
Thanks
-
Hello everybody,
I have the same problem as mentioned by Fabrício. My MD has 740 amino acids and 7500 frames. I tried to do CNA for the backbone atoms.
Now, I want to solve my problem by using the solution suggested by Lars. However, I want to ask if it's possible to add parallel computing ability to the cna.dccm() function (i.e. adding the ncore argument to cna.dccm()). I made an unsuccessful attempt to increase cna.dccm() performance using CPU parallelisation (library(multidplyr)). It would be appreciated if you could do so.
-
Hi,
The main time-consuming part in cna() is the community detection, which calls a function from the igraph R package. We will keep tracking the package and see if they provide options for parallel or even GPU computing. Thanks for the suggestion.
-
- changed component to ToDo
- changed version to v2.4/3.0 [future]
- marked as enhancement
Keep tracking igraph to see if parallel or GPU computing is available for community detection (to accelerate cna()).
-
I am also stuck on the same problem; the cna job takes forever to complete. After one week, a run with 2000 CA atoms and 100 frames hasn't finished yet.
Hi Fabrício, sorry for the late reply.
300 residues shouldn't be a problem. First, make sure you are running the dccm function on a C-alpha trajectory only. Second, you might want to filter on a "contact map" as well. Here is an example:
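The original example did not survive on this page. A minimal sketch of the two steps, assuming the example HIV-1 protease files shipped with bio3d stand in for your own PDB and DCD, and with illustrative (not recommended) `dcut` and `pcut` values:

```r
library(bio3d)

# Example files shipped with bio3d; substitute your own PDB and DCD.
pdb <- read.pdb(system.file("examples/hivp.pdb", package = "bio3d"))
dcd <- read.dcd(system.file("examples/hivp.dcd", package = "bio3d"))

# Step 1: restrict everything to C-alpha atoms, so that cij has one
# row/column per residue rather than one per atom.
ca.inds <- atom.select(pdb, "calpha")
trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,
               fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
trj.ca <- trj[, ca.inds$xyz]
cij <- dccm(trj.ca)

# Step 2: contact map over the trajectory. A pair counts as "in contact"
# if it is within dcut Angstrom in at least a fraction pcut of frames.
# The values 10 and 0.75 are assumptions chosen for illustration.
cm <- cmap(trj.ca, dcut = 10, scut = 0, pcut = 0.75, mask.lower = FALSE)

# Build the network, keeping only correlations between contacting residues.
net <- cna(cij, cm = cm)
```

Passing `cm` removes edges between residues that are rarely in contact, which should leave far fewer edges for the community detection step to process.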
You should also consider running multiple parallel (e.g. 5) simulations for correlation network analysis, i.e. build multiple correlation matrices (cijs) and use the function filter.dccm() to generate a consensus cij matrix for input to cna(). That would be something like this:
Hope it helps! L
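The consensus example is also missing from this page. As a rough sketch of the idea, the code below splits the bio3d example trajectory into three blocks as a stand-in for independent simulations (an assumption made only so the example runs; with real data you would load one trajectory per replica). The cutoff value is illustrative.

```r
library(bio3d)

# Example files shipped with bio3d; substitute your own PDB and DCDs.
pdb <- read.pdb(system.file("examples/hivp.pdb", package = "bio3d"))
dcd <- read.dcd(system.file("examples/hivp.dcd", package = "bio3d"))

ca.inds <- atom.select(pdb, "calpha")
trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,
               fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
trj.ca <- trj[, ca.inds$xyz]

# Stand-in for replicate simulations: three consecutive blocks of frames.
frames <- seq_len(nrow(trj.ca))
blocks <- split(frames, cut(frames, 3, labels = FALSE))

# One correlation matrix per "simulation".
cijs <- lapply(blocks, function(i) dccm(trj.ca[i, ]))

# Consensus matrix: keep couplings that reach the cutoff across the
# replicates (0.4 is an illustrative value).
cij.cons <- filter.dccm(cijs, cutoff.cij = 0.4)

# The matrix is already filtered, so no further cutoff is applied here.
net <- cna(cij.cons, cutoff.cij = 0)
```

Because only couplings that persist across replicates are kept, the consensus matrix is usually sparser than any single cij, which should also shorten the cna step.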