help with using distance matrix calculation.

Issue #883 new
David Covell created an issue

I am looking for R tools for manipulating PDB structures. In my specific case, I have collected all PDB structures from the kinome tree that include a ligand, and I would like to build density distributions of protein-to-ligand atomic distances. The question is whether proteins from the different kinome branches (TK, TKL, CMGC, AGC, STE, CK1, CAMK) show preferred protein-to-ligand atomic distances.

Reading the bio3d vignettes suggests that somewhere in the extensive toolset there is a way to do this (primarily with dm{bio3d}). However, I have not yet found the magic decoder. I will keep digging, but your help would be appreciated. As an example, 5hie is co-crystallized with Dabrafenib. Obtaining all inter-atomic distances would (might) be sufficient for developing an answer to my question. However, get.pdb does not identify the HETATM subset for Dabrafenib.

Help would be appreciated.

Thanks

Comments (8)

  1. Xinqiu Yao

    Hi,

    You need to use read.pdb() not get.pdb(). For example,

    library(bio3d)
    pdb <- read.pdb("5hie")
    pdb
    
    ## The output says it has 1023 protein residues and 4 ligands
    
    dmat <- dm(pdb, inds=atom.select(pdb, "water", inverse=TRUE)) # exclude water
    dim(dmat)
    # [1] 1027 1027
    # i.e., the result includes both the protein residues and the ligands
    

    Hope it may help.

  2. David Covell reporter

    Xinqiu Yao,

    Thank you for your response. So far I am not quite sure how to get distances for all atoms (labelled as in the parent structure). The dm command seems to not accept the ‘all.atom=T’ qualifier. Note that my goal is to have heavy atom distances for the ligand (P06) to the target (5hie). Please advise me on how this should be done.

    pdb

    Call: read.pdb(file = "./5hie_B.pdb")

    Total Models#: 1 Total Atoms#: 2123, XYZs#: 6369 Chains#: 1 (values: B)

     Protein Atoms#: 2074  (residues/Calpha atoms#: 259)
     Nucleic acid Atoms#: 0  (residues/phosphate atoms#: 0)
    
     Non-protein/nucleic Atoms#: 49  (residues: 15)
     Non-protein/nucleic resid values: [ HOH (14), P06 (1) ]
    

    Protein sequence: DWEIPDGQITVGQRIGSGSFGTVYKGKWHGDVAVKMLTPQQLQAFKNEVGVLRKTRHVNI LLFMGYSTKPQLAIVTQWCEGSSLYHHLHIIETKFEMIKLIDIARQTAQGMDYLHAKSII HRDLKSNNIFLHEDLTVKIGDFGLATVKSRSGSILWMAPEVIRMQDKNPYSFQSDVYAFG IVLYELMTGQLPYSNINNRDQIIFMVGRGYLSPDLSKVRSNCPKA...<cut>...RSLP

    • attr: atom, xyz, helix, sheet, calpha, Call

    dim(dmat)
    [1] 260 260

    Here the distances appear to be for only the C-alpha atoms of the protein, and the ligand (P06) is represented as a single atom. Since there are 35 atoms for P06 and 2074 atoms for 5hie, I am expecting a larger matrix. Obviously, since I am interested only in ligand-to-target distances, keeping only distances within a cutoff will reduce the dimension of dmat. However, with a reduced dimension, keeping track of the atom numbering will be important.

    Thank you in advance for your help,

    David
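A minimal sketch of the ligand-to-target matrix asked for here, in base R only. The two data frames are toy stand-ins for rows of pdb$atom; the coordinates and residue numbers are made up for illustration, and cross_dist is a hypothetical helper, not a bio3d function.

```r
# Toy stand-ins for rows of pdb$atom (coordinates and residue numbers are made up).
ligand <- data.frame(resid = "P06", resno = 801, elety = c("C1", "N2"),
                     x = c(0, 3), y = c(0, 0), z = c(0, 0))
protein <- data.frame(resid = c("LYS", "LYS", "GLU"), resno = c(483, 483, 501),
                      elety = c("NZ", "CE", "OE1"),
                      x = c(0, 3, 0), y = c(4, 4, 0), z = c(0, 0, 12))

# Hypothetical helper: rectangular distance matrix with one row per atom
# of `a` and one column per atom of `b`, labelled by residue and atom name.
cross_dist <- function(a, b) {
  d <- sqrt(outer(a$x, b$x, "-")^2 +
            outer(a$y, b$y, "-")^2 +
            outer(a$z, b$z, "-")^2)
  dimnames(d) <- list(paste(a$resid, a$resno, a$elety, sep = ":"),
                      paste(b$resid, b$resno, b$elety, sep = ":"))
  d
}

d <- cross_dist(ligand, protein)
print(d)
```

With a real structure, the two data frames could be taken from pdb$atom, for example pdb$atom[atom.select(pdb, resid="P06")$atom, ] for the ligand, so the labels stay tied to the parent structure.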

  3. David Covell reporter

    Xinqiu Yao,

    Again, thanks. Specific details are sometimes hard to find. From reading your help page on atom.select, it is not clear which elements need to be specified in order to create subsets with the same structure as the parent.

    pdb_nowater <- atom.select(pdb,"water",inverse=T)
    pdb_ligand <- atom.select(pdb,resid='P06')
    pdb_target <- atom.select(pdb,"protein")

    These generate subsets that lack the pdb values for pdb$atom$type, pdb$atom$eleno, pdb$atom$elety, pdb$atom$resid, pdb$atom$x, pdb$atom$y and pdb$atom$z.

    Having this information in pdb_ligand and pdb_target would eliminate the need to chase the pdb_ligand$atom numbers back to the original pdb. Note that the following syntax did not work:

    pdb_ligand <- atom.select(pdb,resid='P06',type=pdb$atom$type,eleno=pdb$atom$eleno)

    S3 method for class 'pdb'

    atom.select(pdb, string = NULL, type = NULL, eleno = NULL, elety = NULL, resid = NULL, chain = NULL, resno = NULL, insert = NULL, segid = NULL, operator = "AND", inverse = FALSE, value = FALSE, verbose=FALSE, ...)

  4. Xinqiu Yao

    By default, atom.select() just creates a list of indices that let you look up the corresponding atoms in a PDB object. To return a PDB object itself, add value=TRUE. For example, atom.select(pdb, "protein", value=TRUE) returns a PDB object rather than indices.

    In R, you can always type ?function to get more information, e.g. ?atom.select. Read the entire document and you will find an explanation of each argument. There is also an online version that may help you to explore (http://thegrantlab.org/bio3d/reference/atom.select.html).

    Also, the tutorial will be very helpful (http://thegrantlab.org/bio3d/articles/online/pdb_vignette/Bio3D_pdb.html/), if you haven’t gone through it.
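The difference between the two return types can be sketched as follows. This assumes the example file 1hel.pdb that ships with bio3d is installed; it is used only as a stand-in structure and is not one of the kinase entries discussed in this thread.

```r
library(bio3d)

# Example structure bundled with bio3d, used here only for illustration
pdb <- read.pdb(system.file("examples/1hel.pdb", package = "bio3d"))

sel <- atom.select(pdb, "protein")                # indices only
sub <- atom.select(pdb, "protein", value = TRUE)  # trimmed pdb object

class(sel)
class(sub)

# The indices in sel$atom are row numbers of pdb$atom, so labels and
# coordinates can be looked up without building a new object
head(pdb$atom[sel$atom, c("type", "eleno", "elety", "resid", "resno", "x", "y", "z")])
```

Either route keeps the atom numbering of the parent structure: the index form through pdb$atom[sel$atom, ], the value=TRUE form through sub$atom.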

  5. David Covell reporter

    Xinqiu,

    Thanks for your helpful comments. I am returning to the issue of calculating the distribution of nearest-neighbor distances from pdb ligand atoms to target atoms. Respectfully, I have read some of the examples available as vignettes, and find that sorting out the details about atom labels and numbering will be critical. In that regard, and using the examples of 5hie and 6dgm, here is what I find:

    pdb <- read.pdb("5hie")

    gives 1023 calpha atoms with 2 ligands (HOH and P06).

    dmat <- dm(pdb, inds=atom.select(pdb, "water", inverse=TRUE))

    gives a 1027x1027 distance matrix, where the P06 distances are listed at indices 1024-1027. Since I would like to see all heavy atoms, you suggested grp=FALSE; however, it is not clear where this argument goes.

    dmat <- dm(pdb,inds=atom.select(pdb,"water",inverse=TRUE,grp=FALSE))

    yields an error. Once the "all atom" dmat is done, can labels be assigned according to the xyz coordinates? This would ensure that the atom-atom distances are as needed. Again, I am only interested in the nearest-neighbor ligand (P06) to target distances, and their atom labels. In that regard, maybe there is a way to prune the initial pdb to have only distances < 5 Å?

    If you can point me to ways of solving these issues, that would be most helpful. Looking into other issues, I would ask for the same answers for 6dgm, which has 2 chains and 4 ligands.

    pdb <- read.pdb("6dgm")

    From this I see that there are 2 chains and 4 ligands. Selecting chain A and ligand F3Z with

    pdb_A <- atom.select(pdb,chain="A",ligand="F3Z")

    gives me what I need; however,

    dmat <- dm(pdb_A)
    Error in UseMethod("dm") : no applicable method for 'dm' applied to an object of class "select"

    Again, thanks for your help. I will continue to look at the vignettes, but your input may make the calculation easier.

    Regards,

    David Covell
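One possible way to prune by cutoff while keeping the atom labels, sketched in base R. The matrix below is a toy ligand-by-protein distance matrix with made-up values and labels; with real data it would hold distances computed from the coordinates in pdb$atom.

```r
# Toy ligand-by-protein distance matrix in angstroms (values and labels made up).
d <- matrix(c(3.1, 7.9, 4.6,
              6.2, 2.8, 9.0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("P06:C1", "P06:N2"),
                            c("LYS:NZ", "GLU:OE1", "PHE:CZ")))

cutoff <- 5

# All ligand/protein atom pairs closer than the cutoff, with their labels
hit <- which(d < cutoff, arr.ind = TRUE)
contacts <- data.frame(ligand_atom  = rownames(d)[hit[, 1]],
                       protein_atom = colnames(d)[hit[, 2]],
                       distance     = d[hit],
                       row.names    = NULL)

# Nearest protein atom for every ligand atom
nearest <- data.frame(ligand_atom  = rownames(d),
                      protein_atom = colnames(d)[apply(d, 1, which.min)],
                      distance     = unname(apply(d, 1, min)),
                      row.names    = NULL)

print(contacts)
print(nearest)
```

Because the row and column names travel with the matrix, the pruned table never needs to be mapped back to the original atom numbering.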

  7. Xinqiu Yao

    Hi,

    For the first question, grp=FALSE should be passed to the dm() function, not to the inner call of atom.select(). That is, you should use:

    dmat <- dm(pdb,inds=atom.select(pdb,"water",inverse=TRUE),grp=FALSE)

    For the second question, your pdb_A is not a pdb object but just a list of atom indices for the selection. To return a pdb object, use

    pdb_A <- atom.select(pdb,chain="A",resid="F3Z", value=TRUE)

    (N.B. There is no “ligand=” option. Use “resid=” to select specific residue names. Also, are you sure the ligand is called “F3Z” in the pdb?)
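Putting the pieces of this thread together, the ligand-to-protein heavy-atom distances might be collected along these lines. This is a sketch under assumptions, not a bio3d recipe: ligand_contacts is a hypothetical helper, the distances are computed in base R rather than with dm(), and the "noh" selection string and combine.select() are assumed to behave as described in the bio3d help pages.

```r
library(bio3d)

# Hypothetical helper, not part of bio3d: heavy-atom contacts between one
# ligand residue name and the protein, keeping the original atom labels.
ligand_contacts <- function(pdb, lig_resid, cutoff = 5) {
  # "noh" excludes hydrogens; selections hold row indices into pdb$atom
  lig_sel  <- atom.select(pdb, "noh", resid = lig_resid)
  prot_sel <- combine.select(atom.select(pdb, "protein"),
                             atom.select(pdb, "noh"),
                             operator = "AND", verbose = FALSE)
  lig  <- pdb$atom[lig_sel$atom, ]
  prot <- pdb$atom[prot_sel$atom, ]

  # Ligand-by-protein distance matrix from the x, y, z columns
  d <- sqrt(outer(lig$x, prot$x, "-")^2 +
            outer(lig$y, prot$y, "-")^2 +
            outer(lig$z, prot$z, "-")^2)

  # Keep only pairs under the cutoff, labelled as in the parent structure
  hit <- which(d < cutoff, arr.ind = TRUE)
  data.frame(lig_eleno  = lig$eleno[hit[, 1]],
             lig_elety  = lig$elety[hit[, 1]],
             prot_eleno = prot$eleno[hit[, 2]],
             prot_elety = prot$elety[hit[, 2]],
             prot_resid = prot$resid[hit[, 2]],
             prot_resno = prot$resno[hit[, 2]],
             distance   = d[hit])
}

# Example call (needs network access to download the entry):
# pdb <- read.pdb("5hie")
# contacts <- ligand_contacts(pdb, "P06")
# plot(density(contacts$distance))
```

For entries with several chains, the selections inside the helper would probably need a chain= argument as well, so that each ligand copy is compared only with its own chain.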
