queries regarding cij and cna calculations

Issue #339 resolved
Kajwal Kumar Patra created an issue

Dear sir, I have some queries regarding tha cij and cna calculations,

  1. i want my cij plots(residue cross correlation) according to the residue numbers of my pdb file so that i can interpret things clearly. here the problem is that though the plot shows the Residue No. in x and Y axis of the plot but actually it presents the selected atoms. like for example say my pdb file contains 4 chains and in each chain HSE LYS ARG resids with resno 110, 111 , 112 respectively which includes 30 atoms for each chain and total 120 atoms. . now when i type command as: inds <- atom.select(pdb, resno=c(110:113) elety =CA), it selects the CA atoms of all 4 chains and in the plot also i am getting this atom numbers. but i want the plot with proper resno like 110 111 112 .

        *the same problem with CNA calculations(want resno insteaad of atom ids)
    
  2. after geting the cna plot I.e. net <- cna(cij), xy <- plot.cna()net i got the object xy and i saved the data ox xy by typing: write.table(xy, file="xy.txt") now the 1st raw of the table is as follows: "V1" "V2" 1 4.123 2.274 so what do the last 2 columns refer ?

  3. "summary(net)" command prints the information about community ids, size and members of communities and all.

(i) here also the members of the communities are not in residue numbers but i want them in resno instead of atom ids

(ii)how can i write the data of summery(net), specially the community IDs along with the corresponding members to a file.

looking forward your response,

kajwal

Comments (16)

  1. Xinqiu Yao

    Hi,

    I couldn't understand your question #1. Could you show some example plot and re-tell us your problems based on the plot?

    Also, present an example for your question #2 (I tried but couldn't reproduce your problem). Note that the first row/first column are just row/column names. You can skip them by write.table(xy, file='xy.txt', row.names=FALSE, col.names=FALSE)

    You can map network node ids to residue numbers easily. For example

    #suppose your pdb object that matches the network is 'pdb'
    resno <- pdb$atom[pdb$calpha, 'resno']
    cm1 <- summary(net)$members[[1]] # node ids for community #1
    cm1res <- resno[cm1] # resnos for community #1
    

    To save an object from R, use the save() function. For example, save(net, file='net.RData') will write a binary file containing the 'net' object. Next time when you want to use the object, just type load('net.RData').

  2. Kajwal Kumar Patra reporter

    Dear sir,
    thanks for the response,
    for question 1. i have attached my pdb file and rplot for cij. As the pdb file contains 4 chains with resid# starting from 113 to 599 for each chain, i want the axis label of my plot to start from 113, 114...599, 113, 114 ...599, 113, 114,..599, 113, 114...599 for chains A B C D respectively.

    1. for cna calculations, how can i extract data for edge width which joins two communities or can i write the data for edge width to a txt files so that i can use them in other plotting interfaces .
  3. Xinqiu Yao

    Currently, the plot.dccm() dose not support changing x- and y-axis tick labels. We have an experimental new "plot.dccm()" under the new_funs/ directory of the repository (See here), which supports changing axes labels and probably can solve your problem. What you need is to source two scripts under new_funs, i.e. plot.dccm.R and add.sse.R to replace the function in the package. For example,

    source('plot.dccm.R')
    source('add.sse.R')
    plot.dccm(cij, sse=pdb, resno=pdb)
    

    To extract the information of community edges, use the matrix of residue-wise network edges (net$cij) combined with the definition of community membership (summary(net)$members). For example, to get the minimum edge width between community #1 and #2, use

    m1 <- summary(net)$members[[1]]
    m2 <- summary(net)$members[[2]]
    min(net$cij[m1, m2])
    

    Note that the net$cij actually represents the -log() form of the original cross-correlation values. If you are looking at correlations, use the cij that was used to build the net instead of net$cij.

    You can write edge width to hard drive by e.g. write.table(net$cij, file='cij.txt', row.names=FALSE, col.names=FALSE).

    Let me know if you still have problems.

  4. Kajwal Kumar Patra reporter

    For Question 1. As per your advice i coppied plot.dccm.R, and add.sse.R from new_functions/ repository to my working directory and then followed the commands: source('plot.dccm.R') source('add.sse.R') plot.dccm(cij, sse=pdb, resno=pdb)

    but still i am getting the desired plot as i explained earlier, instead i am getting a warning message: Warning message: In plot.dccm(cij, sse = pdb, resno = pdb) : Length of input 'resno' does not equal the length of input 'x'; Ignoring 'resno'

    i am attaching the new plot as Rplot3.png.

    sorry for bothering you,

    kajwal

  5. Xinqiu Yao

    Hi,

    Can you tell me the commands you used to generate the plot? Also, make sure the pdb you used matches the cij matrix. For example, the following two commands should give the same results:

    nrow(cij)

    sum(pdb$calpha)

    If possible, also attach your cij file here or send it to xinqyao@umich.edu. Then I can try to reproduce your problem more easily..

  6. Kajwal Kumar Patra reporter

    Hi sir, i have attached the cij file and the commands are as follows:

    pdb <- read.pdb("nw_T592D_neutral.pdb")
    dcd <- read.dcd("NW_1000_eq8.dcd")
    inds <- atom.select(pdb, elety = "CA")
    trj <- fit.xyz(fixed= pdb$xyz, mobile = dcd, fixed.inds = inds$xyz, mobile.inds = inds$xyz )
    cij <- dccm(trj[, inds$xyz])
    source('plot.dccm.R')
    > source('add.sse.R')
    > plot.dccm(cij, sse=pdb, resno=pdb)
    

    *note that the pdb file has 4 chains and each chain has residno starting from 113 to 599 . and my aim is to plot the graph which contains all four chains with corresponding residno.

  7. Xinqiu Yao

    Hi,

    I tried with your provided pdb and cij files but couldn't reproduce the same problem you encountered. The commands I used are

    library(bio3d)
    source('plot.dccm.R')
    source('add.sse.R')
    
    pdb <- read.pdb("nw_T592D_neutral.pdb")
    cij <- read.table('cij.txt')
    cij <- data.matrix(cij) # make cij a numeric matrix
    plot.dccm(cij, resno=pdb)
    

    and the effect is as the attached figure. plot.png

    Note that 'sse=pdb' is removed because your pdb does not contain any SSE information. Use

    sse <- dssp(pdb)
    plot.dccm(cij, sse=sse, resno=pdb)
    

    if you still want SSE annotations. (You may also try 'contour=FALSE' in plot.dccm() to remove contour lines to make the plot clearer).

    I am not sure what exactly happened in your case. It might be due to the version of the lattice package installed in your R, or even the version of R you installed itself. (Btw, I tested it with R 3.2.1 and lattice 0.20-33). If doable, update both and try again. If you continue to encounter the problem, my suggestion is: generate the plot using regular 'plot.dccm()' (i.e. the one in the package not under the 'new_funs') and manually modify the axis labels with e.g. Illustrator.

    Hope it is helpful and good luck!

  8. Kajwal Kumar Patra reporter

    Thank you sir, for your continuous effort. i am really thankful to you. it's good to see that the axis labels are in resno. But i have some points regarding the plot to mention:

    1. the axis label starts from 113(as expected) and unfortunately ends at around 164 . But we have selected all the CA atoms(ie. resno 113 to 599 for all 4 chains(A,B,C,D)). so the axis label shown in the plot represents which chain?
    2. why it stops at 164 ? it should have reached to 599
    3. If it shows only for a single chain then what about the other chains?

    I am expecting the axis label like: 113,114....599, 113,114....599, 113, 114...599, 113,114....599

    regards, kajwal

  9. Xinqiu Yao

    Hi,

    The numbers shown are one for each chain. Unfortunately where to show the 'ticks' is determined automatically and is hard to change without modifying the code.

    It won't show labels like 113, 114...599; otherwise it will be messed up with too many overlapped labels along each axis.

    Btw, the axis labels here are actually not very informative because of the large size of the matrix. Why not just label chain IDs and see how intra- and inter-chain correlations are distributed?

  10. Log in to comment