comparative structural analysis --computing distance in ensemble of distinct structures

Issue #181 resolved
Former user created an issue

I am following standard structural analysis section of the PCA tutorial on Bio3d website: http://thegrantlab.org/bio3d/tutorials/principal-component-analysis

My question is that besides RMSD and RMSF is it possible to compute distance or angle for equivalent positions for all structures in the ensemble even though sequence is different?

Specifically, for example what would be an easy way to compute distance between residue 178 and residue 112 of 1TAG and for all other 103 structures in the ensemble at the equivalent position? I am hoping to get a distance vector of length 103 as an output.

Comments (4)

  1. Lars Skjærven

    Jepps. You'll need to dig into the xyz component of your pdbs object. (pdbs is just a multiple sequence alignment together with aligned coordinate data and associated attributes). Thus to fetch the coordinates of individual residues you can access the pdbs$xyz matrix like this:

    # structure 1-5, residue/column 50
    pdbs$xyz[1:5, atom2xyz(50)]
    
    # structure 1-5, residue/column 51
    pdbs$xyz[1:5, atom2xyz(51)]
    

    You can then use dist.xyz(). e.g.:

    dist.xyz(pdbs$xyz[1:5, atom2xyz(50)], pdbs$xyz[1:5, atom2xyz(50)])
    
              [,1]      [,2]       [,3]       [,4]       [,5]
    [1,] 0.0000000 0.2084326 0.17615631 0.14990663 0.17016530
    [2,] 0.2084326 0.0000000 0.23850228 0.25581043 0.25021297
    [3,] 0.1761563 0.2385023 0.00000000 0.08628878 0.04510184
    [4,] 0.1499066 0.2558104 0.08628878 0.00000000 0.04512605
    [5,] 0.1701653 0.2502130 0.04510184 0.04512605 0.00000000
    

    which will give you the pair-wise distances between residues 50 for the chosen structures (here structure 1-5).

    Basically, by understanding how the pdbs$xyz matrix is composed you should be able to do the things you aim for here.

  2. Barry Grant

    Sorry I am a bit late to this thread. Lars's response is excellent. Below I add just a little more detail on how to find the particular residues mentioned in the question (via first looking-up their residue numbers in the original PDB files, which are also stored in the pdbs named object returned from pdbaln() and related functions). I have also added an alternative way that exposes writing your own simple functions to do these sorts of things more easily.

    library(bio3d)
    
    ##- Load the example data 
    data(transducin)
    attach(transducin)
    
    ##-- Find the alignment positions of RESNO 112 and 178
    i.112 <- which(pdbs$resno["1TAG_A",] == 112)
    i.178 <- which(pdbs$resno["1TAG_A",] == 178)
    
    ##- Convert these Calpha 'atom' indices to 'xyz' indices for coordinate access
    i.112.xyz <- atom2xyz(i.112)
    i.178.xyz <- atom2xyz(i.178)
    
    ##- Use 'xyz' indices to access coresponding coordinates
    coords.112 <- pdbs$xyz[,i.112.xyz]
    coords.178 <- pdbs$xyz[,i.178.xyz]
    
    ##- Calculate the requested distances 
    ##  (note the effect of the 'all.pairs=FALSE' option)
    answer <- dist.xyz(coords.112, coords.178, all.pairs=FALSE)
    
    > answer
     [1] 19.52610 19.72753 19.60729 20.76520 20.89739 21.12701 18.53349 20.43393
     [9] 20.99625 19.75095 20.05707 20.29599 20.12525 20.24655 18.82836 19.71828
    [17] 19.80987 20.55976 20.96552 20.20745 20.32955 21.61822 22.89586 18.44744
    [25] 20.81348 20.74089 21.81194 22.22241 21.50130 21.95177 22.15232 21.53123
    [33] 20.00782 21.53460 18.28714 19.26866 20.41942 21.26434 21.90934 21.53309
    [41] 21.70067 21.96786 21.52007 19.76650 21.11249 23.15418 23.73335 22.30739
    [49] 21.78643 23.29267 22.70449 22.91519 21.53917
    

    And a shorter alternative, but perhaps initially harder to read, approach that I include here to demonstrate a general way to calculate values across these types of pdbs objects.

    xyz <- pdbs$xyz[, atom2xyz(which(pdbs$resno["1TAG_A",] %in% c(112,178))) ]
    
    ##- A simple distance function
    mydist <-function(xyz) { dist(matrix(xyz, ncol = 3, byrow = TRUE)) }
    
    ##- Apply the function to each structure (row)
    answer2 <- apply(xyz,1, mydist)
    
    > answer2
      1TND_A   1TND_B   1TND_C   1TAD_A   1TAD_B   1TAD_C   1TAG_A   3V00_C
    19.52610 19.72753 19.60729 20.76520 20.89739 21.12701 18.53349 20.43393
      3V00_B   3V00_A   1FQJ_A   1FQJ_D   1FQK_A   1FQK_C   1GOT_A   2XNS_A
    20.99625 19.75095 20.05707 20.29599 20.12525 20.24655 18.82836 19.71828
      2XNS_B   1KJY_A   1KJY_C   2OM2_A   2OM2_C   4G5Q_A   4G5Q_D   1GP2_A
    19.80987 20.55976 20.96552 20.20745 20.32955 21.61822 22.89586 18.44744
      1AGR_A   1AGR_D   1CIP_A   1GFI_A   1GIA_A   2GTP_B   2ZJY_A   3ONW_A
    20.81348 20.74089 21.81194 22.22241 21.50130 21.95177 22.15232 21.53123
      3ONW_B   1BH2_A   1GG2_A   1GIT_A   3QI2_A   3QI2_B   1SVK_A   1SVS_A
    20.00782 21.53460 18.28714 19.26866 20.41942 21.26434 21.90934 21.53309
      3FFA_A   1GIL_A   1AS0_A   1AS2_A   3QE0_B   2ODE_A   2ODE_C   2V4Z_A
    21.70067 21.96786 21.52007 19.76650 21.11249 23.15418 23.73335 22.30739
      4G5R_A   2IHB_A   4G5O_A   4G5O_D   3C7K_C
    21.78643 23.29267 22.70449 22.91519 21.53917
    
  3. Log in to comment