cna_for_large_complex

Issue #539 resolved
karolk created an issue

Hi,

I am doing cna for a large multimeric complex. The problem is that it takes forever, already more than 4 days.

The ncore seems to not work. Only a single core is used.

Is it a normal behavior of the cna()? I am trying to figure out whether my installation is OK.

thanks

Comments (23)

  1. Xinqiu Yao

    cna() does not work with multicores. It is slow for large systems, but 4 days sounds strange. What parameters do you use, such as cij.cutoff, whether using contact map, etc? Have you checked edges with pymol()? If too many edges, the calculation will be super slow because of community detection.

  2. karolk reporter

    Hey Xinqiu,

    Here is my code:

    library(bio3d)
    load("image.RData")
    cij_for_cna <- dccm(modes)
    net <- cna(cij_for_cna,cutoff.cij_for_cna = 0.35)
    

    Right now I am running another calculations with previously filtered correlation matrix. This should speed up the calculations, but I am not sure about reliability of my results, because I am going to filter out with cutoff.cij=0.5, and then network analysis (cna()) with cutoff.cij=0.35. So, I am afraid of removing of to much detail. My matrix is 4684x4684.

    Regarding edges in pymol. Generally pymol does not work for me. The protein is to big. I plan to use VMD, but first I have to reach the stage of accomplished cna.

    Do you have any ideas?

  3. Xinqiu Yao

    If the matrix is already filtered with 0.5 cutoff, you do not need any further cutoff in cna(). And current calculation should be faster. Let me know.

    You do not need to call cna() before checking edges. But if pymol doesn’t work on your machine, it is a problem. I am not sure there is a vmd version for this purpose but will have a look shortly.

  4. Xinqiu Yao

    I found there is no vmd related function for checking correlations yet. I have a in-house code to show cylinders in vmd that might do the job for you (see the attached). You need an input to define residue pairs between which you want to have an edge, and also the value of the edge to define the radius (e.g. correlation strength) and color (positive or negative correlation) of each edge. For example,

    # Support you have a correlation matrix called `cij`, 
    # and a pdb object matching the cij.
    source('vmd.cylinder.R')
    resno <- pdb$atom$resno[pdb$calpha]
    inds <- which(lower.tri(cij), arr.ind=TRUE)
    edges <- cbind(resno[inds[, 1]], resno[inds[, 2]], cij[lower.tri(cij)])
    vmd.cylinder(edges, pdb, file='test', cutoff =0.5)
    

    Then, open a file called 'test.vmd' in VMD. Blue and red cylinders show positive and negative correlations, respectively, and cylinder radius is proportional to correlation strength.

    You might want to start with a high cutoff first, because otherwise it will take a long time to generate the file and to load in VMD as well.

    One disadvantage of the script is that it cannot adjust cutoff directly from VMD, like what pymol.dccm() does in pymol, which hampers your experiment on different cutoff values a little bit.

    Hope it will help.

  5. karolk reporter

    Hi,

    I managed to make pymol running on my machine and the things look much better now. I could play with different cij cutoffs and see how my systems looks like. Nevertheless, thank you very much for VMD script. In a free time I will also try it.

    I would have a question, regarding the filtering of dccm for cna. I checked your paper "Dynamic coupling and allosteric networks in the alpha subunit of heterotrimeric G proteins" and I could see that you first filtered cmap with dcut=10, and then you filtered correlations in your system, using this cmap and cutoff.cij >=0.6. Correct? If yes, one question: How exactly did you do it?

    • cmap(pdb,dcut=10) <-- how about values of scut and pcut ?

    • filtered <- filter.dccm(cij, cutoff.cij=0.6, cmap=cm)

    Correct?

    Thank you

  6. Xinqiu Yao

    The method we used for the paper was actually based on a simulation "ensemble", i.e. for each network we had 5 simulation replicates and network edges were determined by checking the robustness of strong correlations across replicates, as described in the 'Methods' part of the paper. Specifically,

    cm <- cmap(xyz, dcut=10, scut=1, pcut=0.75)
    cij <- filter.dccm(cijs, cutoff.cij=0.6, cmap=cm)
    

    where xyz is the coordiates of C-alpha atoms from simulation trajectories and cijs is a 3D array containing 5 correlation matrices calculated from the 5 simulation replicates (The 3rd dimension indicates the replicate).

    Hope it helps.

  7. karolk reporter

    Very helpful. Thank you.

    So, the cna was simply:

    net <- cna(cijs, cutoff.cij = 0.0)
    

    Correct?

  8. Xinqiu Yao

    Yes, but remember input should be a matrix rather than an array, i.e. “cij” instead of “cijs”

  9. karolk reporter

    Hi,

    This is a continuation, so I don't want to open a new ticket.

    I filtered my cij and then I run cna(), using this cij and cmap.

    My network:

    COMMUNITY NODES#: 269 EDGES#: 50

    max(net$communities$modularity) = 0.8812209

    tree$num.of.comms[ max.mod.ind ] = [1] 269

    all( net$communities$membership == tree$tree[max.mod.ind,] ) = [1] TRUE

    As you could see my network is characterised by extremely high modularity. Furthermore, not all communities are connected, but this is acceptable. I know my system, so I know what I should expect. I looked at my tree (community.tree()) and I could see a clear pick in modularity, but not sure about the value. I think its between 8-9. I don't know how to export my plot(tree) to text file. I tried with write.table().

    I looked at the system in vmd.cna(). When viewing in VMD I coarse grained my network by

    radius = table(net$communities$membership)/35
    

    and it looks good. Now, I want to get the same network, using plot.cna(). I mean, number, location and size of the communities, will be the same as in VMD. Is it possible?

    Before plot.cna() I tried:

    #!
    
    memb.k.X <- tree$tree[ tree$num.of.comms == X, ]
    new.net <- network.amendment(net, memb.k.X)
    

    , where X were 8 or 9. Nevertheless, I am getting: Error in network.amendment(net, memb.k.X) : Input membership and x$community$membership must be of the same length. I get it, but how to coarse grain "net" to get 8 or 9 communities? I think that is the problem?

    Also, I would have a general question. Could you advise me, which algorithm would be best or worth to try in order to get a network with smaller number of communities? As I said, the cutoff for cij is OK, so I don't think that you can get more using default algorithm. I tried combinations of different dcut and scut and the changes are marginal. The only thing, which matters here is cutoff for cij. I have also tried prune(), but did not work well for me. I went through all threads about cna on this forum, but I could not find an answer to my problems.

    thanks

  10. Xinqiu Yao

    You should get exactly the same number and sizes of communities from vmd.cna() and plot.cna(). However, the location might be different because while you can freely rotate the molecule in VMD, the layout in plot.cna() is fixed by default. Try plot.cna(..., interactive=TRUE) and then you can manually adjust the location in the 2D plot to match that is shown in VMD.

    It is weird that you got an error when changing the number of communities using network.amendment(). Have you modified net? Could you print the contents of memb.k.x and net$community$membership and paste them here for checking?

    There are other ways to detect communities in cna() and definitely you should try to see if results get better. See help(cna) for more detail. But to simply reduce the number of communities, I would recommend use network.amendment() after resolving the problem.

    Let me know whenever you get any updates.

  11. karolk reporter

    Hi,

    Thank you for your reply.

    When I invoke "interactive=TRUE" I get this error message. It works normally without this flag: Error in if (is.character(col) && any(substr(col, 1, 1) == "#" & nchar(col) == : missing value where TRUE/FALSE needed

    I tried on two different computers and operating system.

    Currently working on your other comments.

    thanks

  12. Xinqiu Yao

    Can you provide more info about your environment? (For example, copy&paste the output of sessionInfo()). You need latest bio3d and igraph, and also need an R supporting tcltk. A quick way to check is printing sessionInfo() after library(bio3d) and library(igraph).

  13. karolk reporter

    Sessioninfo():

    **R version 3.4.1 (2017-06-30) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: OS X El Capitan 10.11.3

    Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

    locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages: [1] stats graphics grDevices utils datasets methods base

    other attached packages: [1] igraph_1.1.2 bio3d_2.3-3.9000

    loaded via a namespace (and not attached): [1] compiler_3.4.1 magrittr_1.5 parallel_3.4.1 tools_3.4.1
    [5] Rcpp_0.12.13 grid_3.4.1 pkgconfig_2.0.1 tcltk_3.4.1

    **

    When I try:

    plot.cna(net, layout=cent.full, full=TRUE, vertex.label=NA, vertex.size=5, interactive = TRUE)
    

    Error in if (is.character(col) && any(substr(col, 1, 1) == "#" & nchar(col) == : missing value where TRUE/FALSE needed

    thank you

  14. Xinqiu Yao

    Your R environment seems fine. Not sure if it is OS problems as I don't have a Mac to test. Can you find a Linux or Windows machine and try it again? I have tested both and they worked well.

    Alternatively, you can send me an example and see if I can reproduce your errors. xinqiu.yao@gmail.com. Let me know.

  15. karolk reporter

    Hi,

    Regarding the plot.cna(), restating the computer did the job. The flag interactive works. Also, the colouring, the number of communities and their size, are the same as in VMD. Nice:)

    General question. Once you mentioned (one of the previous posts) that "To my own experience, a good proportion of edges (i.e. number of edges divided by total number of possible node pairs) is around 4%." By number of edges you meant: all edges of the network divided by number of all network nodes?

    When using the option "interactive" I can't rotate my network using mouse. I mean, I can manually adjust location of communities, but not necessary select whole network and rotate as a one object. The only way of moving whole network is to use GUI options, e.g., move by 1, 5, 90 degrees. Is it normal behaviour? I want to make sure that "interactive" works correctly.

    thank you

  16. Xinqiu Yao

    Glad you've solved the plot.cna() problem.

    Regarding 4%, I meant number of edges divided by number of node pairs not just nodes. For example, for a 5-node system, the number of all possible node pairs should be 10. Note that this percentage is only a rough estimation and I cannot guarantee its applicability to many different systems. Be sure check network edges using e.g. pymol.dccm before running into deeper network analysis.

    Yes, it is normal. So far, I haven't seen an option to rotate the entire network by an arbitrary degree using mouse.

  17. karolk reporter

    OK. I get it.

    You can close the ticket. In case of further problems, I will reopen again.

    THANK YOU

  18. Log in to comment