Network generation from molecular dynamics data

Issue #240 resolved
Ashfaq Ahmad created an issue

Hi all, I have a .dcd file (95 MB, 1100 frames) of almost 300 residue protein. Before to generate a network, I trim my dcd and pdb file to C-alpha. The following command net <- cna(cij, cutoff.cij=0.35) took more than 6 hours. Although I tried to select resno c(1:100)

On the other hand, if I use a pdb structure of almost 100 residue for a .dcd file of 300 residues then it behaves very well and takes less than a minute.

Is it the right way to superimpose the 300 residue dcd trajectories on a 100 residue pdb structure? The longer time taken by bio3d (more than 6 hours) is normal?

Regards,

Comments (11)

  1. Lars Skjærven

    Hi Ashfaq, I think Xinqiu can provide a better answer here, but you can consider reducing the number of frames, e.g. by the command: traj2 <- traj[seq(1, 1100, 5), ]. How does that work ?

  2. Xinqiu Yao

    Thanks for answering, Lars!

    Hi Ashfaq,

    The 'cna()' function usually takes less than 1 min for systems of similar size to yours. The 6 hours you experienced is very weird and must be something wrong. One possible reason is that the cij matrix still had too many none-zero values after applying the filter cutoff.cij=0.35. This will result in a highly connected network, which substantially slows down the calculation for communities (by default, cna() constructs the network and meanwhile calculates communities with the Girvan-Newman's algorithm).

    I strongly recommend you check your cij with the function view.dccm(), e.g. view.dccm(cij, pdb=pdb, launch=TRUE). (NOTE: you need PyMol installed). By turning on/off different cij values in the pop-upped pymol window, you will find a more suitable cutoff to build the network. For example, a cutoff that remains most local couplings and removes most non-local couplings connecting distal sites.

    Also, you can filter your cij with a contact map as did in the paper by Sethi et al (PNAS 2009, 106:6620-25). Use the 'cmap()' function to calculate the contact map from the trajectory and apply it to 'cna()'. For example

    # Use the all-heavy-atom trajectory and pdb to calculate the cmap
    cm <- cmap(dcd, grpby=pdb$atom[, 'resno'], dcut=4.5, scut=3, pcut=0.75, mask.lower=FALSE)
    
    net <- cna(cij, cutoff.cij=0.35, cm=cm)
    

    I don't think it is a good idea to superimpose 300 residue dcd to an arbitrary 100 residue pdb, as correlation results are largely dependent on how you fit your trajectory. Use all CA atoms or the 'core' (see help(core.find)) positions to do fitting.

    Let me know if you still get problems!

  3. Ashfaq Ahmad reporter

    Thanks Yao, Yes i have too many long distance contacts shown in blue lines (pymol).

    net <- cna(cij, cutoff.cij=0.98) Warning message: In cna.dccm(cij, cutoff.cij = 0.98) : The number of communities is larger than the number of unique 'colors' provided as input. Colors will be recycled net <- cna(cij, cutoff.cij=0.99) Warning messages: 1: In cna.dccm(cij, cutoff.cij = 0.99) : The $communities structure does not allow a second clustering (i.e. the collapsed community.cij matrix contains only 0). 'community.network' object will be set to NA 2: In cna.dccm(cij, cutoff.cij = 0.99) : The number of communities is larger than the number of unique 'colors' provided as input. Colors will be recycled net <- cna(cij, cutoff.cij=0.8)

    Here 0.99 cutoff says communities are too many but 0.8 says fine. Although when I apply view.dccm(cij, pdb=pdb, launch=TRUE) after 0.8 so I see still many blue color (long distance communities are present. Any suggestion? and also please update me what could be the final cutoff value because once i tried 1.2 and that crashed R. So i just assumed that may be i can not select cutoff beyond 0.99. Regards

  4. Xinqiu Yao

    Hi,

    You can't specify cutoff beyond 1 (or close to it like 0.99 I suspect). Remember the cij are correlations (with values in a range [-1, 1]), which means if you use cutoff.cij=1 or beyond, all edges between nodes will be excluded, causing numerical problems as you experienced.

    To my own experience, a cutoff between 0.3 and 0.8 should be acceptable. If you see many strong "anti-correlated" (blue lines in Pymol) long-distant pairs, it usually indicates your system is very flexible, such as multi-domain proteins and protein complexes. If this is true in your case, I recommend try to use residue Calpha-Calpha distance or residue contact map to filter your cij. See my reply above for an example.

    If you are studying a single-domain globular protein, the strong correlations you have seen are peculiar. In this case, I recommend go back to check the fitting procedure and make sure everything has been done properly. For example, you can write down your fitted trajectory with the function 'write.ncdf()' and inspect the movie with VMD.

  5. Ashfaq Ahmad reporter

    Yes you are right my protein contains three domains, and i am trying to study the signal relay for a conformational changes. We have crystal structures from our lab of the same protein in two distinct conformations (active and inactive). I found this correlation stuffs so trying to get something from here that may help me to study the pattern of conformational changes. I will try the above stuffs with much consideration. Thanks mate

  6. Ashfaq Ahmad reporter

    Sorry again, just one suggestion is needed, After several filtration i am receiving two groups of clusters. Sadly both these two groups have no edge in between, no matter I use size = 0, 1 or other value in net.pruned command both these two groups are separate. In the smaller group one cluster is bigger than 20 nodes, on the other side few clusters are smaller than 20. If I am removing the smaller group as a result the cluster smaller than 20 nodes in bigger group are also affecting that destroy group 1 too. Any suggestion how to deal such kind of problem. One way i know is to keep many many small clusters but in that case the system becomes really crowded and kind of meaningless. Regards

  7. Xinqiu Yao

    Hey,

    I am not very clear about the problem you've encountered. If you want to see edges between all clusters, increase the distance cutoff when you calculate the contact map. For example use dcut=6.0 instead of dcut=4.5. Also, use scut=1 to include edges between neighboring residues.

    If you don't mind, send me a copy of your network and the pdb saved in an R object file. For example, obj = list(net=net, pdb=pdb); save(obj, file='obj.RData'). Then send the 'obj.RData' file to xinqyao@umich.edu. With these, I can probably help to check if the partition looks reasonable or not.

  8. Log in to comment