Ionic pairs within protein

Issue #444 resolved
Fabrício Bracht created an issue

Hi, I am new to the Bio3d package, but I would like, first of all, to congratulate you guys on it. I've followed some of the vignettes but, there are some things I still don't know how to do. I would like to generate a contact map, from a single pdb and after that from a trajectory, between ionic residues, meaning, I would like to select all arginine residues and see which ones interact with aspartate and glutamate residues. From the cmap command I was able to generate a contact map with this: arg.inds <- atom.select(pdb, resid='ARG', 'sidechain') asp.inds <- atom.select(pdb, resid='ASP', 'sidechain') cmap <- cmap(pdb$xyz,inds=arg.inds, grpby=asp.inds$resno)

But I am not sure this is exactly what I want. First, I would like the plot to be per residue labelled, and this one seems to show contacts between atoms. Is there a way to do that?

I am running R version 3.2.3 on ubuntu mate 16.04.

Thanks in advance Fabrício

Comments (13)

  1. Xinqiu Yao

    Hi,

    The command you have tried won't give you the answer you want. The inds argument in cmap.pdb() is to define a subset of atoms used to calculate cmap. The grpby is to group atoms for the calculation, e.g. to obtain a residue-wise cmap based on all-atom coordinates (a result you probably want). Please read the help document of the function for more details help(cmap.pdb).

    I would recommend you calculate the full contact map first. Then, you can create any sub-matrix you want from the full cmap based on the residues you would like to focus. For example,

    cm <- cmap(pdb)
    spdb <- trim(pdb, 'calpha')
    arg.inds <- atom.select(spdb, resid='ARG')
    asp.inds <- atom.select(spdb, resid='ASP')
    
    # following gives you the sub-matrix between all 'ARG' and 'ASP' for your further inspection.
    cm[arg.inds$atom, asp.inds$atom]
    
  2. Fabrício Bracht reporter

    Hi. Thank you very much for the reply and the suggestion. Let me just check if I understood it correctly. The contact map from the first command calculates the entire contact map for all atoms in the pdb. After that, I select only the alpha carbons to use as index for the asp and arg residue numbers, so that the filtered contact map has only the contacts for the asp and arg residues. Is that correct? I've read the help from cmap and it states that the cmap.pdb is a wrapper for cmap.xyz which selects non water and calculates the contact matrix grouped by residue number. But, when I do

    cm <- cmap.pdb(pdb,grpby=pdb$resno)
    

    , the calculation takes forever to finish. If I type only

    cm <- cmap.pdb(pdb)
    

    I am able to create the matrix, but it is not indexed by residue number. But I guess there is no problem with that, since I can get the information I want with the commands you gave. The only thing I still need is to retrieve the residue pair information from the matrix. I've tried to get the matrix index where the values are 1 with;

    cm2 <- cm[asp.inds$atom,arg.inds$atom]
    for(i in 1:length(cm2[,1])){
        for(j in 1:length(cm2[1,])
            if(cm2[i,j]==1){
                print(i)
    }}}
    

    But that gives an error: Error in if(cm2[i,j] == 1){ missing value where TRUE/FALSE needed. Perhaps this is more due to a lack of R syntax than anything else.

    Thanks in advance Fabrício

  3. Xinqiu Yao

    By default, the function cmap() returns the contact map of all residues not all atoms. For example, if you have a pdb with 100 residues, the function returns a 100x100 matrix. However, for each residue contact, the calculation is based on all atom coordinates. That is, internally the function calculates distance of all atom pairs between two residues. Then, if the minimal atomic distance is less than a threshold, the residues are considered in contact (set 1 in the returned matrix); otherwise they are not in contact (set 0).

    In the first command above, you just missed referring to 'atom' in the pdb, i.e. grpby=pdb$atom$resno. But you don't have to do the grouping here because by default the function will take care of it very well.

    To get matrix index where values are 1, you can use which(cm==1) (which gives you a 1-D column-wise indices of the matrix) or which(cm==1, arr.ind=TRUE) (which gives you a 2-D array containing the row- and column-index of contacting residues). Typehelp(which) for more details. Btw, if you work on a sub-matrix, e.g. the cm2 above, the returned indices with the which() command will be based on this sub-matrix. Be careful when mapping sub-matrix indices to residues in a whole-chain pdb.

    Let me know if you have any further question.

    The error showed up because the contact map contains 'NA' values. It is common for a contact map to have NAs e.g. between neighboring residues, to show that these residue pairs are not considered in the calculation. The 'if' statement fails if comparing a 'NA' and a numeric. Thereby, the best way is to use the which() function as mentioned above, which automatically skip NAs.

  4. Fabrício Bracht reporter

    Hi. Thanks for the reply. I noticed that the cmap function gave me contact for some residues that were a bit too far (for my purpose, at least), and also, since what I really want is a contact map that will later give me an idea about the ion bridges between charged residues, I used the command

    cm <- cmap(pdb$xyz,inds='sidechain',grpby=pdb$atom$resno,dcut=3,scut=2)
    

    Then, with a

    which(cm==1,arr.ind=TRUE)
    

    , I checked the residues visually using vmd, and everything seems to be in order, but the list is long and I didn't check them all. Is that command correct? If yes, then the problem is that, as you said, by using the sub-matrix I lose residue number information from the whole-chain pdb. Actually, maybe the problem is that I don't know how to retrieve the residue number from the original matrix (if that is even possible). So, in order to try and solve the problem, I thought about making a for loop that compared the resulting pairs from

    which(cm==1,arr.ind=TRUE)
    

    with the residue numbers from

    spdb <- trim(pdb, 'calpha')
    arg.inds <- atom.select(spdb, resid='ARG')
    asp.inds <- atom.select(spdb, resid='ASP')
    

    but couldn't quite figure out how to do it. In python, I would use something like for i,j in zip(which(cm==1,arr.ind=TRUE)[,1],which(cm==1,arr.ind=TRUE)[,2]: ........etc etc (except that this is not python and which does not exist in python, but there would be alternatives....). My point is, I need to check which pairs given by (which(cm......) are asp-arg pairs. Any ideas on how to do that? Thanks

  5. Xinqiu Yao

    I didn't see any problem from your commands. You can check the results by plot.cmap() function, which maps contact on a 2D plane. You could also check out the function pymol.dccm(), which was originally designed to map 'cross-correlation' onto a 3D structure but you can use it to check contact also (considering contact as a strong positive correlation).

    For the second question, I think one easy way might be to name the matrix with residue numbers first. For example,

    spdb <- trim(pdb, 'calpha')
    arg.inds <- atom.select(spdb, resid='ARG')
    asp.inds <- atom.select(spdb, resid='ASP')
    
    rownames(cm) <- spdb$atom$resno
    colnames(cm) <- spdb$atom$resno
    
    sub.cm <- cm[arg.inds$atom, asp.inds$atom]
    # you can easily see "salt bridges" by printing the sub-matrix if the number of charged residues is not large
    sub.cm
    
    # Or show residue numbers of salt bridges directly
    cm.inds <- which(sub.cm==1, arr.ind=TRUE)
    cbind(rownames(sub.cm)[cm.inds[, 1]], colnames(sub.cm)[cm.inds[, 2]])
    
    # NOTE: you might have to repeat above commands except for using 'sub.cm <- cm[asp.inds$atom, arg.inds$atom]'
    # for the other half of salt-bridge pairs. If your matrix is symmetric (e.g. using cmap(..., mask.lower=FALSE)), 
    # however, your results are complete and no need to repeat.
    
  6. Fabrício Bracht reporter

    Hi. Thanks a lot for the advice, it has worked perfectly. In the end, I've added a

    arg_asp <- unique(cbind(rownames(sub.cm)[cm.inds[, 1]], colnames(sub.cm)[cm.inds[, 2]]))
    

    To remove redundant pairs. The matrix is not symmetric, so I did the same thing using

    sub.cm <- cm[asp.inds$atom, arg.inds$atom]
    

    I have one doubt though. The arg.inds and asp.inds lists contains the listing for the atom numbers for all alpha carbons from arginine and aspartate residues, right? So, they are numbered bu atom number and not by residue number. On the other hand, cm has residue number indexes. How can sub.cm be built by using an atom numbered subgroup on a residue numbered matrix? Thanks

  7. Xinqiu Yao

    Not sure if the unique() is used properly (Please check results carefully). Also, why are there redundant pairs? I thought they should be unique already but maybe I was wrong...

    'arg.inds' etc. are made from the spdb, which contains only c-alpha atoms. Therefore, both indices are residue based (Don't be confused by the name arg.inds$atom, where $atom is just a naming convention to distinguish from the coordinate indices $xyz).

  8. Fabrício Bracht reporter

    There are redundant pairs because the protein is a homotetramer. So, since the protein is a homotetramer, and the four monomers have the same residue numbering, some pairs appear more than once. I didn't separate the monomers for analysis because there are important ionic bonds between monomers that I wish to analyse. Thank you for the explanation on arg.inds. One thing I've noticed though is that if I use the cmap.pdb(pdb,'sidechain') command, rownames(cm) <- spdb$atom$resno and colnames(cm) <- spdb$atom$resno do not work. The error states that 'dimnames' [2] length is not equal to the size of the array. If I remove 'sidechain', it works fine. I would like to restrict the contact map analysis, or at least the final sub matrix, to the contact between the sidechains of the charged residues. I've noticed, after some visual analysis, that the cmap analysis and the final sub-matrix give pairs of residues that are not actually within interacting distance between the side chains, but have some proximity between one side chain and a backbone atom. What I really want is ionic bonds between charged groups. I've tried doing:

    a_protein <- atom.select(pdb,'sidechain')
    spdb <- trim(pdb,inds=a_protein)
    cm <- cmap.pdb(spdb)
    rownames(cm) <- spdb$atom$resno
    

    But the outcome is the same. I also tried loads of different methods. I've tried trimming the pdb to 'cbeta' atoms, tried excluding glycine residues (since they don't have cbeta)....etc. Is there a way to get the contact map and filter it only to the contacts between sidechains and then obtain the same sub-matrix and then index it to residue number like you proposed before? Thanks

  9. Xinqiu Yao

    Most likely the error is because of glycines, which are not in the cmap matrix but still present in 'spdb', causing the mismatch. I think what you've tried is the right way to solve the problem, except for a minor modification:

    a_protein <- atom.select(pdb, 'sidechain')
    cm <- cmap.pdb(pdb, inds=a_protein)
    spdb <- trim(pdb, elety='CB') 
    dim(cm)
    spdb
    # the number of atoms in 'spdb' should match the dimension of 'cm'.
    
    rownames(cm) <- spdb$atom$resno
    

    Let me know if it works.

    Btw, since calculating 'salt bridge' is a common task in structural analysis, I consider contributing a function to do it directly in bio3d. Will keep you posted here.

  10. Fabrício Bracht reporter

    Thanks a lot. It worked. This is how the final data looks like

    > asp_arg
          [,1]  [,2] 
     [1,] "34"  "73" 
     [2,] "155" "158"
     [3,] "100" "187"
     [4,] "23"  "291"
     [5,] "272" "320"
     [6,] "322" "386"
     [7,] "27"  "60" 
     [8,] "23"  "139"
     [9,] "189" "249"
    .....
    
    > arg_asp
          [,1]  [,2] 
     [1,] "60"  "95" 
     [2,] "139" "23" 
     [3,] "60"  "27" 
     [4,] "249" "189"
    ....
    

    Now, as you can see, there are some pairs that are unique to each set, but there are some pairs that are the same, but in reverse order. For example aspartate 189 interacts with arginine 249 and they show up in both lists. I would like to be able to count the number of unique ion pairs. So after some research, I've tried making asp_arg and arg_asp into data frames, and then comparing the values pairwise. But, I'm not good with R (I'm learning, but it takes a while), and I don't know how to compare two sets of different lengths. What I did is the following:

    df.asp_arg <- data.frame(asp_arg)
    df.arg_asp <- data.frame(arg_asp)
    which(outer(df.arg_asp$X2,df.asp_arg$X1,'==') & outer(df.arg_asp$X1,df.arg_asp$X2,'=='),arr.ind=TRUE)
    

    This doesn't work, and I think the problem is that the two sets have different sizes. Also, I am not sure this is the best way to do this. Any suggestions?

  11. Fabrício Bracht reporter

    Hi. Found how to do it. First I reverse the column order of asp_arg

    asp_arg[ , c(1,2)] <- asp_arg[ , c(2,1)]
    

    Then build the data frames, concatenate and remove duplicates

    df.asp.arg <- as.data.frame(asp_arg)
    df.arg.asp <- as.data.frame(arg_asp)
    ion.pairs <- unique(rbind(df.asp.arg,df.arg.asp))
    

    Now all there is left is to do the same for asp-lys, glu-arg and glu-lys.

  12. Log in to comment