How do I correlate distance matrix calculation with actual residue number?

Barry Grant

Thanks for the encouraging comments Byron. Here I will just add a bit more explanation to Xinqiu's reply and also point out the perhaps simpler cmap() contact map function, which you might like to consider.

Typically the dimensions of the distance matrix (or contact map, see example below) will be NxN (where N is the number of atoms you used in the calculation).

It is often more useful for interpretation purposes to output a reduced nxn (residue by residue) matrix. That is, even though we run the calculation on all N atoms we will reduce this to n residues by taking the minimum distance value of the atoms in a given residue. This is the default for dm.pdb() and cmap.pdb() functions as used in Xinqiu's example above. By doing this you can just use the row and column indices to access whatever residue you are after as they will be the same lenght and order as the number of non-water residues in your input structure. Note that you can obtain a similar nxn output from dm.xyz() etc. by using the 'grpby' option. This has the advantage of allowing you to define the atom groups.

The simplest way to return an nxn matrix of course is to run the calculation on just C-alpha atoms (1 atom per protein residue), which has the advantage of being much quicker and will perhaps be sufficient for your needs (see examples below). Note that cmap.pdb() includes ligands as residues. hope this helps!

library(bio3d)

## Read an example online pdb
pdb <- read.pdb("4q21")

##-- Let's use an all-or-nothing (1 or 0) contact map
##-   E.g. 1: All atom contact map 
##    (i.e. any atom of a residue within dcut distance) 
cm.all <- cmap(pdb, dcut=5, scut=0)

## Will be much quicker to use just C-alpha atom positions 
ca.inds <- atom.select(pdb, "calpha")
cm.ca <- cmap(pdb, inds=ca.inds, dcut=5, scut=0)

## here we also exclude sequential (bonded) residues with 'scut' option
cm.ca.exclude <- cmap(pdb, inds=ca.inds, dcut=5, scut=3)

## Quick plot to see the effect of these options
plot.cmap(cm.all, col="blue")
plot.cmap(cm.ca, add=TRUE, col="yellow")
plot.cmap(cm.ca.exclude, add=TRUE)

## Dimensions of these matrices are all equal to the number of non-water input structure residues. This is 
dim(cm.all) 

inds <- atom.select(pdb, "notwater")
length( unique( pdb$atom$resno[inds$atom] ) )

## Can use to label rows and cols for easy access (as in Xinqiu's example) 
res <- unique( pdb$atom$resno[inds$atom] )

If you want to calculate the distances for only a small subset of positions then use atom selections as with calpha above, e.g. residues 10, 17, 58 81 96 100

## Read an example online pdb
pdb <- read.pdb("4q21")

## residues you want to examine (e.g. returned from your other method)
res <- c(10,17,58,81,96)
res_inds <- atom.select(pdb, resno=res, elety="CA")
res_cm   <- cmap(pdb, inds=res_inds, dcut=5, scut=0)

# which of these residues are in contact
res[ which(res_cm==1, arr.ind=T) ]

# check by printing out contact map with labels
colnames(res_cm) <- res
rownames(res_cm) <- res
res_cm

2015-05-11T16:56:35+00:00

Comments (4)

Xinqiu Yao

Hi,

I am not very clear about your question. If it is just mapping distance matrix rows and columns to residue number, it is very easy. Try following commands:

#!
pdb <- read.pdb('your_pdb_file')

# extract residue numbers
resno <- pdb$atom[pdb$calpha, 'resno']

# calculate distance matrix
mat <- dm(pdb, inds=atom.select(pdb, 'protein'), mask.lower=FALSE)

# assign row/col names with resno
rownames(mat) <- resno
colnames(mat) <- resno

# Suppose you want a subset matrix of the distance between residues 70:80  
resno2 <- as.character(c(70:80))
mat2 <- mat[resno2, resno2]

Let me know if it solves your problem.

2015-05-11T14:03:45+00:00

Barry Grant
Thanks for the encouraging comments Byron. Here I will just add a bit more explanation to Xinqiu's reply and also point out the perhaps simpler cmap() contact map function, which you might like to consider.

Typically the dimensions of the distance matrix (or contact map, see example below) will be NxN (where N is the number of atoms you used in the calculation).

It is often more useful for interpretation purposes to output a reduced nxn (residue by residue) matrix. That is, even though we run the calculation on all N atoms we will reduce this to n residues by taking the minimum distance value of the atoms in a given residue. This is the default for dm.pdb() and cmap.pdb() functions as used in Xinqiu's example above. By doing this you can just use the row and column indices to access whatever residue you are after as they will be the same lenght and order as the number of non-water residues in your input structure. Note that you can obtain a similar nxn output from dm.xyz() etc. by using the 'grpby' option. This has the advantage of allowing you to define the atom groups.

The simplest way to return an nxn matrix of course is to run the calculation on just C-alpha atoms (1 atom per protein residue), which has the advantage of being much quicker and will perhaps be sufficient for your needs (see examples below). Note that cmap.pdb() includes ligands as residues. hope this helps!
```
library(bio3d)

## Read an example online pdb
pdb <- read.pdb("4q21")

##-- Let's use an all-or-nothing (1 or 0) contact map
##-   E.g. 1: All atom contact map 
##    (i.e. any atom of a residue within dcut distance) 
cm.all <- cmap(pdb, dcut=5, scut=0)

## Will be much quicker to use just C-alpha atom positions 
ca.inds <- atom.select(pdb, "calpha")
cm.ca <- cmap(pdb, inds=ca.inds, dcut=5, scut=0)

## here we also exclude sequential (bonded) residues with 'scut' option
cm.ca.exclude <- cmap(pdb, inds=ca.inds, dcut=5, scut=3)

## Quick plot to see the effect of these options
plot.cmap(cm.all, col="blue")
plot.cmap(cm.ca, add=TRUE, col="yellow")
plot.cmap(cm.ca.exclude, add=TRUE)

## Dimensions of these matrices are all equal to the number of non-water input structure residues. This is 
dim(cm.all) 

inds <- atom.select(pdb, "notwater")
length( unique( pdb$atom$resno[inds$atom] ) )

## Can use to label rows and cols for easy access (as in Xinqiu's example) 
res <- unique( pdb$atom$resno[inds$atom] )
```
If you want to calculate the distances for only a small subset of positions then use atom selections as with calpha above, e.g. residues 10, 17, 58 81 96 100
```
## Read an example online pdb
pdb <- read.pdb("4q21")

## residues you want to examine (e.g. returned from your other method)
res <- c(10,17,58,81,96)
res_inds <- atom.select(pdb, resno=res, elety="CA")
res_cm   <- cmap(pdb, inds=res_inds, dcut=5, scut=0)

# which of these residues are in contact
res[ which(res_cm==1, arr.ind=T) ]

# check by printing out contact map with labels
colnames(res_cm) <- res
rownames(res_cm) <- res
res_cm
```
- 2015-05-11T16:56:35+00:00
Barry Grant
- marked as minor
- 2015-05-13T01:13:29+00:00
Barry Grant
- changed status to resolved
- 2015-06-17T14:54:45+00:00
Log in to comment