How do I correlate distance matrix calculation with actual residue number?

Issue #228 resolved
Byron DeLaBarre created an issue

First off thanks for creating this R package - I see tremendous value in bringing more protein tools into the R environment & appreciate greatly what you have done.

Now my question:

I am selecting residues within a protein structure based on a property that I calculate with another algorithm. The algorithm has noise in it, so I would like to be able to filter out residues that aren't near other residues that have been flagged by my algorithm. In other words, I am looking for connected groups of residues.

I was hoping your distance matrix function could help me with this, but I don't see how to correlate the matrix with actual residues.

Ideally I would like to use it like this to have it serve as a noise filter for my algorithm:

  1. calculate value for residues with my algorithm
  2. select those residues which have the values I am looking for
  3. run distance matrix on those selected values
  4. identify only residues which are within ~3-5 A of other residues having the value I am looking for 4a/ Alternatively (and equally): eliminate residues with the values I want that don't have nearby residues with the values I want. In other words - get rid of the 'singleton' residues.

Thanks for help on the distance matrix or any alternative method you might suggest with Bio3D.

Comments (4)

  1. Xinqiu Yao

    Hi,

    I am not very clear about your question. If it is just mapping distance matrix rows and columns to residue number, it is very easy. Try following commands:

    #!
    pdb <- read.pdb('your_pdb_file')
    
    # extract residue numbers
    resno <- pdb$atom[pdb$calpha, 'resno']
    
    # calculate distance matrix
    mat <- dm(pdb, inds=atom.select(pdb, 'protein'), mask.lower=FALSE)
    
    # assign row/col names with resno
    rownames(mat) <- resno
    colnames(mat) <- resno
    
    # Suppose you want a subset matrix of the distance between residues 70:80  
    resno2 <- as.character(c(70:80))
    mat2 <- mat[resno2, resno2]
    

    Let me know if it solves your problem.

  2. Barry Grant

    Thanks for the encouraging comments Byron. Here I will just add a bit more explanation to Xinqiu's reply and also point out the perhaps simpler cmap() contact map function, which you might like to consider.

    Typically the dimensions of the distance matrix (or contact map, see example below) will be NxN (where N is the number of atoms you used in the calculation).

    It is often more useful for interpretation purposes to output a reduced nxn (residue by residue) matrix. That is, even though we run the calculation on all N atoms we will reduce this to n residues by taking the minimum distance value of the atoms in a given residue. This is the default for dm.pdb() and cmap.pdb() functions as used in Xinqiu's example above. By doing this you can just use the row and column indices to access whatever residue you are after as they will be the same lenght and order as the number of non-water residues in your input structure. Note that you can obtain a similar nxn output from dm.xyz() etc. by using the 'grpby' option. This has the advantage of allowing you to define the atom groups.

    The simplest way to return an nxn matrix of course is to run the calculation on just C-alpha atoms (1 atom per protein residue), which has the advantage of being much quicker and will perhaps be sufficient for your needs (see examples below). Note that cmap.pdb() includes ligands as residues. hope this helps!

    library(bio3d)
    
    ## Read an example online pdb
    pdb <- read.pdb("4q21")
    
    ##-- Let's use an all-or-nothing (1 or 0) contact map
    ##-   E.g. 1: All atom contact map 
    ##    (i.e. any atom of a residue within dcut distance) 
    cm.all <- cmap(pdb, dcut=5, scut=0)
    
    ## Will be much quicker to use just C-alpha atom positions 
    ca.inds <- atom.select(pdb, "calpha")
    cm.ca <- cmap(pdb, inds=ca.inds, dcut=5, scut=0)
    
    ## here we also exclude sequential (bonded) residues with 'scut' option
    cm.ca.exclude <- cmap(pdb, inds=ca.inds, dcut=5, scut=3)
    
    ## Quick plot to see the effect of these options
    plot.cmap(cm.all, col="blue")
    plot.cmap(cm.ca, add=TRUE, col="yellow")
    plot.cmap(cm.ca.exclude, add=TRUE)
    
    ## Dimensions of these matrices are all equal to the number of non-water input structure residues. This is 
    dim(cm.all) 
    
    inds <- atom.select(pdb, "notwater")
    length( unique( pdb$atom$resno[inds$atom] ) )
    
    ## Can use to label rows and cols for easy access (as in Xinqiu's example) 
    res <- unique( pdb$atom$resno[inds$atom] )
    

    If you want to calculate the distances for only a small subset of positions then use atom selections as with calpha above, e.g. residues 10, 17, 58 81 96 100

    ## Read an example online pdb
    pdb <- read.pdb("4q21")
    
    ## residues you want to examine (e.g. returned from your other method)
    res <- c(10,17,58,81,96)
    res_inds <- atom.select(pdb, resno=res, elety="CA")
    res_cm   <- cmap(pdb, inds=res_inds, dcut=5, scut=0)
    
    # which of these residues are in contact
    res[ which(res_cm==1, arr.ind=T) ]
    
    # check by printing out contact map with labels
    colnames(res_cm) <- res
    rownames(res_cm) <- res
    res_cm
    
  3. Log in to comment