Very slow cna(cij) calculation

Issue #448 new

Fabrício Bracht created an issue 2017-03-17

Hello again. I've been following the tutorial on network analysis and got stuck during the analysis of the MD simulation. I'm using my own MD simulation: the protein has 302 amino acids, and I've tried both a ~5000-frame DCD trajectory (281 MB) and a ~1000-frame one (57 MB) with the commands:

trj <- fit.xyz(fixed=pdb$xyz,mobile=dcd,fixed.inds=inds$xyz,mobile.inds=inds$xyz)
cij <- dccm(trj)
net <- cna(cij)

I've found a similar discussion here: https://bitbucket.org/Grantlab/bio3d/issues/162/q-a-question-about-bio3d-cna. There, Xin-Qiu Yao suggested using a larger cutoff for the cna command (say 0.5). The poster in that thread commented that he had seen a speed improvement even with the default cutoff of 0.4, and that explicitly specifying the cutoff argument (cutoff.cij) in the call had affected the speed of the cna calculation. In my case, neither of these has worked. Once I type

net <- cna(cij)

The calculation takes forever, and I mean a very long time. I went out for lunch and came back to find that it was still running. I left it going, and by the end of the afternoon it still had not finished. Any suggestions on improving performance? Thank you in advance.

Comments (8)

Lars Skjærven

Hi Fabrício, sorry for the late reply.

300 residues shouldn't be a problem. First, make sure you are running the dccm function on a C-alpha-only trajectory. Second, you might want to filter on a contact map as well. Here is an example:

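# select C-alpha atoms from the topology
prmtop <- read.prmtop("complex_noWAT.prmtop")
ca.inds <- atom.select(prmtop, "calpha")

# read only the C-alpha coordinates (frames 1 to 100, every 2nd frame)
trj <- read.ncdf("prod_nowat.nc", at.sel=ca.inds,
                 first=1, last=100, stride=2)

# contact map: pairs within 10 A (dcut) in at least 75% of frames (pcut)
cm <- cmap.xyz(trj, 
               dcut=10, scut=0, pcut=0.75, mask.lower=FALSE,
               ncore=4, gc.first=TRUE)

# correlations, then the network restricted to contacting pairs
cij <- dccm(trj)
net <- cna(cij, cutoff.cij=0.4, cm=cm)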

You should also consider running multiple parallel simulations (e.g. 5) for correlation network analysis, i.e. build one correlation matrix (cij) per simulation and use the function filter.dccm() to generate a consensus cij matrix as input to cna(). That would be something like this:

# cij for each sim (trjs is taken to be a list of trajectories, one per simulation)
cijs <- lapply(trjs, function(x) {
    cij <- dccm(x)
    cat(".")
    return(cij)
})

# consensus cij matrix from multiple simulations 
cij <- filter.dccm(cijs, cutoff.cij = 0.4, cmap = cm)

# correlation network analysis - no cutoff here
net <- cna(cij, cutoff.cij=0)

Hope it helps! L

2017-03-29T07:06:23+00:00

Barry Grant
You should probably dig into (i.e. investigate further) your cij matrix, as it sounds like you have lots of strong couplings leading to many, many edges.
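One way to investigate the matrix is to count how many residue pairs would survive a given cutoff before calling cna() at all. The following is a minimal base-R sketch, not part of bio3d: `count_edges` is a hypothetical helper name, and it assumes that an edge is kept when the absolute correlation is at or above the cutoff. The toy 3x3 matrix only stands in for a real cij from dccm().

```r
# Hypothetical helper (not a bio3d function): count would-be edges per cutoff.
# Assumption: a pair becomes an edge when |cij| >= cutoff.
count_edges <- function(cij, cutoffs = c(0.3, 0.4, 0.5, 0.6, 0.7)) {
  stopifnot(is.matrix(cij), nrow(cij) == ncol(cij))
  # Upper triangle only: each residue pair once, diagonal excluded.
  vals <- abs(cij[upper.tri(cij)])
  n.edges <- vapply(cutoffs, function(ct) sum(vals >= ct), integer(1))
  data.frame(cutoff  = cutoffs,
             edges   = n.edges,
             density = n.edges / length(vals))
}

# Toy symmetric matrix standing in for a real dccm() result.
toy <- matrix(c( 1.0,  0.8,  0.2,
                 0.8,  1.0, -0.5,
                 0.2, -0.5,  1.0), nrow = 3, byrow = TRUE)
res <- count_edges(toy, cutoffs = c(0.4, 0.6))
print(res)
```

If the density stays high even at larger cutoffs, the network is close to fully connected and the contact-map filter suggested above is likely to matter more than the cutoff itself.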
- 2017-03-30T17:14:43+00:00
Fabrício Bracht reporter
Lars, your suggestions worked perfectly, but I've noticed one other thing. I converted the dcd trajectory to netcdf and did the analysis with the ncdf one. Things went faster. Is this expected?
- 2017-03-31T17:10:27+00:00
Fabrício Bracht reporter
Sorry, I take that back. The dccm calculation is a bit faster, but the cna still takes a long time. The only solution is to filter by CA indices, as Lars said.

Barry: In case there are too many edges, is there a workaround? I mean, some way I can filter the analysis other than filtering by CA indices.

Thanks
- 2017-03-31T17:22:32+00:00
Shahryar Alavi
Hello everybody,

I have the same problem as mentioned by Fabrício. My MD system has 740 amino acids and the trajectory has 7500 frames. I tried to do CNA for the backbone atoms.

Now, I want to solve my problem by using the solution suggested by Lars. However, I want to ask whether it's possible to add parallel computing to the cna.dccm() function (i.e. adding the ncore argument to cna.dccm()). I made an unsuccessful attempt to improve cna.dccm() performance using CPU parallelisation (library(multidplyr)).

It would be appreciated if you could do so.
- 2018-04-22T08:37:03+00:00
Xinqiu Yao
Hi,

The main time-consuming part in cna() is the community detection, which calls a function from the igraph R package. We will keep tracking the package and see if they provide options for parallel or even GPU computing. Thanks for the suggestion.
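To get a feel for how much the choice of community detection algorithm matters, the timing can be checked directly in igraph on a small random graph. This is only a sketch under assumptions: it presumes igraph is installed, it uses a synthetic graph rather than a real correlation network, and it assumes that the default method in cna() is Girvan-Newman edge betweenness. Whether your installed bio3d version lets you switch methods (for example through a cluster.method argument) should be checked in ?cna.

```r
library(igraph)

# Synthetic stand-in for a residue network: 60 nodes, edge probability 0.2.
set.seed(1)
g <- sample_gnp(60, 0.2)

# Edge betweenness (Girvan-Newman) repeatedly recomputes betweenness
# after removing edges, so its cost grows quickly with the edge count.
t.btwn <- system.time(c.btwn <- cluster_edge_betweenness(g))

# Random-walk based detection, typically much cheaper on dense graphs.
t.walk <- system.time(c.walk <- cluster_walktrap(g))

cat("edges:", ecount(g), "\n")
cat("edge betweenness elapsed (s):", t.btwn[["elapsed"]], "\n")
cat("walktrap elapsed (s):", t.walk[["elapsed"]], "\n")
```

Repeating the comparison with more nodes or a higher edge probability shows how quickly the gap widens, which is consistent with the advice above to reduce the number of edges before calling cna().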
- 2018-04-22T17:30:29+00:00
Xinqiu Yao
- changed component to ToDo
- changed version to v2.4/3.0 [future]
- marked as enhancement
Keep tracking igraph to see if parallel or GPU computing becomes available for community detection (to accelerate cna()).
- 2018-04-22T17:32:28+00:00
Mohd Athar
I am also stuck on the same problem: the cna job takes forever to complete. After one week, a run with 2000 CA atoms and 100 frames still hasn't finished.
- 2024-01-03T10:42:21+00:00

Assignee: –

Type: enhancement

Priority: major

Status: new

Component: ToDo

Version: v2.4/3.0 [future]

Votes: 1

Watchers: 4