Q&A: Question about bio3d CNA

Q: I am using CNA to build the network for my protein and found this new function very powerful and helpful. The problem for me is that for a 200-aa protein, the calculation seem taking long time (hours, maybe) to finish for a 1000-frame trajectory. I did the same using CARMA/networkSetup, and it took no time to finish for the same task. Would you please let me know 1) Is the speed I got normal, or there is way to speed up? 2) can the results from networkSetup be plotted using bio3d, i, e, are the result files compatible? thanks a lot!

================================

A: For a 200-aa protein with 1000-frame trajectory, it shouldn't take so long time. Could you tell me what exactly you were calculating (the bio3d function you used)? The bio3d version and OS type you worked on are also helpful.

For some functions, e.g. cmap() and dccm(), we have an option 'ncore' to turn on/off multicore calculations. If your machine has multiple CPU cores, set 'ncore' to a value larger than 1 would gain a significant speed up. Check the help page of the function to confirm it supports the opton, e.g. help(cmap). Also note that the the multicore feature is not supported in Windows.

Currently, bio3d doesn't support CARMA/networkSetup outputs. If the output is simply a matrix, you can use 'read.table()' function to read it easily. Sorry I am not familiar with CARMA or networkSetup so can't help so much.

Just let me know the information I need above, and then I may try to reproduce your problem and hopefully figure out what's wrong.

================================

Q: Thank you for your quick reply. Sorry I was referring to the net <- cna(cij) calculation on a Ubuntu OS using bio3d 2.1. I followed the procedure in your tutorial. All CA atoms were designated as nodes. For the dccm calculation, it is pretty fast. I tried the ncore option to see the speedup, net <- cna(cij, ncore=8) however it seems only one core is used and the speed is the same. The job has been running for over 3 hours. thanks,

================================

A: 'cna()' doesn't support ncore, but it normally takes only seconds. One possible reason here is that the default 'cij cutoff', 0.4, might be too low for your case. After calculated cij with 'dccm()', check it with the funciton 'view.dccm()' (You need pymol installed properly):

view.dccm(cij, pdb=your_pdb, step=0.1, launch=TRUE)

Pymol window will pop up with your protein shown at the center and all node correlations represented with straight lines. From the right panel uncheck all correlations under the cutoff 0.4. Are there still many lines connecting distal nodes? If it is true, you may need to increase the cutoff value.

The reason is that 'cna()' automatically calculates 'betweenness communities'; Too many network edges will dramatically slow down the calculation. Also, too many long-range edges could destroy 'modularity' of the network and should be avoid. To my own experience, a good proportion of edges (i.e. number of edges divided by total number of possible node pairs) is around 4%.

If you don't mind, send me your cij matrix (an RData file would be fine), and then I can also check it for you.

================================

Q: Indeed it runs fast if cij.cutoff is specified, even when it is set to 0.4.

I have more questions about the cna calculation when trying to use a contact map filter.

Here what atoms are used as the nodes and edges, only the CA atoms? If yes, are any non-CA atoms used for dccm and cna calculation? How to specifically define certain atoms as nodes?

inds <- atom.select(pdb, resno = c(24:27, 85:90), elety = "CA")
trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,
               fixed.inds = inds$xyz, mobile.inds = inds$xyz)

cij <- dccm(trj)
net <- cna(cij)

Here since all heavy atoms are used, does it means in the above calculations, it is necessary to use a trajectory containing all heavy atoms, but not hydrogen atoms?

cm <- cmap(trj, dcut = 4.5, scut = 0, pcut = 0.75, mask.lower = FALSE)
net.cut <- cna(cij, cm = cm)

BTW, in the manual, it says ncore option is available for cna calculation.

Thank you very much for your time and help. Have a great holiday!

================================

A: Happy holiday, too!

The pdb and trajectory files ("hivp.pdb" and "hivp.dcd") used in the tutorial contain ONLY 'CA' atoms! That's why in the examples we simply call 'dccm(trj)' and 'cmap(trj, ...)', without specifying which atoms to use (They are always CA atoms).

For your own system, you must be careful to use the same codes from the tutorial because your trajectory and pdb files will likely contain all atoms including hydrogen. In this case, there are two ways to use cmap filter:

The first is simply calculating atomic distance for CA atoms. Note that in this way, it is better use a higher distance cutoff 'dcut'.

inds <- atom.select(pdb, elety="CA")
cm <- cmap(trj[, inds$xyz], dcut=10, scut=0, pcut=0.75, mask.lower=FALSE)

Second, use all heavy atoms for distance calculation but still return a contact map for CA atoms. The trick is to use the 'grpby' option provided with 'cmap()' function. In this way, contacting nodes are defined as those having any heavy atom within 'dcut' distance.

noh.inds <- atom.select(pdb, "noh")
cm <- cmap(trj[, noh.inds$xyz], grpby=pdb$atom[noh.inds$atom,
"resno"], dcut=4.5, scut=0, pcut=0.75, mask.lower=FALSE)

Note that 'grpby' is also supported in 'dccm()', and so can include side-chain correlations without changing nodes of the network:

cij <- dccm(trj[, noh.inds$xyz], grpby=pdb$atom[noh.inds$atom, "resno"])

'cna()' does support 'ncore' but it only functions when you have multiple input 'cij' matrices, i.e. when you want to build multiple networks at the same time.

Let me know if you have more questions!

Comments (3)