Help with correlation network analysis

Issue #328 resolved
Rakesh Ramachandran created an issue

What is the best way to calculate correlation dccm or lmi especially for correlation network analysis ?

Also can you let me know the R commands on how to calculate consensus correlation matrix as in your Kinesin paper.

Moreover when calculating suboptimal paths how to select source and sink as in your kinesin paper.

Is it possible to use lmi for calculating nma correlation ?

Comments (19)

  1. Barry Grant

    These are still open questions that are current areas of research. Many prefer MI based approaches over the more tractional DCCM. In practice both have disadvantages and some advantages. We plan to publish a manuscript on this soon. For now Bio3D gives you the tools to test this out for your system of interest and your particular research question.

  2. Rakesh Ramachandran reporter

    Thanks for the reply. I have few more doubts based on your Kinesin paper.

    1) Since you use heavy atoms to define networks as well as to calculate paths, is the RMSF you report also for the heavy atoms. Was the trajectory aligned to a reference using heavy atoms ?

    Moreover I used the plot.fluct function but was unable to figure out how to disable the shaded area under the plotted lines. I also see you use consensus secondary structure on the plot, how do I get this.

    2) Does the selection of source and sink depend on the residues of interest mostly based on functional or mutational data.

    3) Finally is this the right way of generating consensus correlation matrix. Does this depend on the number of frames you use for calculation ?

    chunks <- lapply(files, read.dcd) cijs <- lapply(chunks, lmi, ncore=12) traj <- read.dcd("closed_cat_fit.dcd") # Concatenated trajectory cm <- cmap.xyz(traj, dcut=8, scut=1, pcut=0.75, ncore=12)

    calculate consensus cij

    cij <- filter.dccm(cijs, cutoff.cij=0.6, cmap=cm)

  3. Xinqiu Yao

    Hi,

    I can answer part of your questions.

    1) I think the fitting was based on "core" C-alpha atoms and RMSF calculation was for all C-alpha atoms (Is it true, Barry?). Use plot.fluct(type='l', ...) to remove the "shaded area".

    3) Do your trajectories contain only C-alpha atoms or all heavy atoms? If it is the latter, you still need a pdb to "group" the atomic correlations and cmap into "residue wise" matrices. For example,

    grps <- pdb$atom[, 'resno']
    cijs <- lapply(chunks, lmi,  grpby=grps, ncore=12)
    

    Similar processing for cmap calculation. Note that the pdb must contain exactly the same atoms as contained in the trajectories.

    Otherwise, all your codes seem okay to me. I don't think the number of frames would change the results so much as long as you have "sufficient" (e.g. 2000) frames in a statistical point of view. You can try by yourself skipping some frames, e.g. traj[seq(1, nrow(traj), by=2), ] and see how the results change.

  4. Rakesh Ramachandran reporter

    Hi,

    1) For LMI calculation using heavy atoms was the trajectory aligned to reference using heavy atoms or only calpha atoms ?

    2) For the cnapath analysis the index of residue numbers does not match the actual source and sink residue numbers. I have to either subtract or add a certain value to get to the actual residue number of source or sink especially when visualizing the path in VMD. Is there something to be added in the cnapath function ? How to also give multiple source and sink nodes in the cnapath function ?

    3) How can I add labels for the secondary structure segments in RMSF plot as in your Kinesin paper ? Is there a way to shade regions in the RMSF plot based on zscore cutoff instead of significance ?

    Rakesh

  5. Barry Grant

    Q. 1) I believe it was Calpha atoms - you should check if there is a difference for your system from this choice. Q. 2) It is an "index", and as indexes usually do it goes from 1 to N - you should be able to work out your residue numbers easily or use standard R commands to assign residue numbers as names etc.... Q. 3) Use the axis(side=3, at = ssecenters, labels = sselabes) where you calculate your element centers (ssecenters) and assign whatever labels (sselabes) you want. Please do consider reading standard R documentation on such things.

  6. Rakesh Ramachandran reporter

    Thanks for the reply. I am unable to add any customized axis or suppress the default axis labels when I plot. I tried using these parameters xaxt="n", ann=FALSE, axes=FALSE still the values of axis function are not being taken while plotting.

    Can the ssecenters values be obtained automatically from the plot function or I have to manually by trial and error get them from the plotted figure.

    Moreover how can I give multiple source and sink nodes in the cnapath function ?

  7. Xinqiu Yao

    Hi,

    For your above question 2, you can map index to residue number simply by print(pa, pdb=pdb), where pdb is the PDB object matching your path analysis.

    To give multiple source/sink, use vector/matrix for the arguments from and to in the 'cnapath()' function. For example,

    # Calculate paths for all the pairs between nodes 1-3 and nodes 100-101
    pa <- cnapath(net, from=c(1,2,3), to=c(100, 101))
    
    # Calculate paths for specified node pairs: 1/100 and 2/101
    f <- matrix(c(1,2,100,101), nrow=2)
    f
    
         [,1] [,2]
    [1,]    1  100
    [2,]    2  101
    
    pa <- cnapath(net, from=f)
    

    All these are documented in the help page of corresponding function and so please read and read again before posting questions.

  8. Barry Grant

    Indeed, please do carefully read the documentation - of course please let us know if it is not clear and feel free to suggest documentation improvements.

  9. Rakesh Ramachandran reporter

    Thanks for the reply. It was not clear to me from the documentation. For "from" argument it says Integer, node id for the source and for "to" argument it says Integer, node id for the sink. It would be better if you make it as a list or vector where one can supply multiple values. Also if possible please provide an example for the same under "Examples" in the documentation.

    Could you please answer my other question on customized axis ?

  10. Xinqiu Yao

    What function did you use for the plot? I tried plot.fluct(x, ..., axes=FALSE) and it turned off all axes.

  11. Rakesh Ramachandran reporter

    Thanks for the reply. I tried using the following code, all the axes are turned off but the customized axis provided by axis function is not plotted.

    plot.fluct(x=rr, col=c(rep('red',1), rep('blue', 1)), sse = dssp(pdb), resno = pdb, type="l", lwd=0.1, xaxt="n", ann=FALSE, axes=FALSE)
    axis(side = 1, at = c(1769, 1800, 1830, 1860, 1890, 1920, 1950, 1980, 1990), labels = c(1769, 1800, 1830, 1860, 1890, 1920, 1950, 1980, 1990))
    
  12. Xinqiu Yao

    Are those numbers in axis() residue numbers (i.e. from PDB directly) or residue indices (the sequential number of residues starting from 1, 2, ...)? Note that the internal coordinating system of the plot assumes the latter. Therefore, if you are using "residue numbers", they are possibly out of the range and won't show on the plot.

  13. Rakesh Ramachandran reporter

    Thanks I found the error it was the mismatch between residue numbers and internal coordinating system.

    Is there a way I can get the ssecenters values or locations for adding secondary structure labels in the plot ?

  14. Xinqiu Yao

    Sure. First calculate sse with sse <- dssp(sse, resno=FALSE) (resno=FALSE will give residue index rather than residue numbers). Then, for each SSE element e.g. helix 1, you can get the middle location by sse.x <- (sse$helix$start[1] + sse$helix$end[1])*0.5. Similarly calculate the middles for other elements.

  15. Log in to comment