selecting representative side chain atoms using atom.select

Issue #188 resolved
nikita chopra created an issue

Hi,

I am keen to apply community network analysis on the side-chain atoms of my protein and would like to select the representative elements (Ala:CB, Arg:CZ, Asp:CD..) in the structure for the calculations. Could you please guide me as to how to select distinct atoms using the atom.select function ?

Thank you for your help. Nikita

Comments (24)

  1. Xinqiu Yao

    Hi,

    I guess you need to select for each amino acid one by one and finally combine them. For example,

    sel1 <- atom.select(pdb, resid="ALA", elety="CB")
    sel2 <- atom.select(pdb, resid="ARG", elety="CZ")
    sel3 <- atom.select(pdb, resid="ASP", elety="CD")
    ...
    
    sel <- combine.select(sel1, sel2, sel3, ..., operator="+")
    

    Making a vector to map each amino acid to representative atom name and do individual select in a loop can be a good idea. It shouldn't be very difficult.

    Let me know if you have more questions.

  2. nikita chopra reporter

    Hi,

    Thank you very much for your help in this regard. I have another question regarding community analysis of Molecular Dynamics trajectory. I am applying the contact map function cmap with distance cutoff of 10 A. I have selected all the CA atoms and my R script is as follows:

    pdb<-read.pdb() dcd<-read.dcd() inds<-atom.select(pdb, elety="CA") trj<-fit.xyz(..) cm <- cmap(trj, dcut = 10, scut = 0, pcut = 0.75, mask.lower = FALSE)

    However, at this step the analysis is terminated in R. Could you please guide me if I am applying the cmap function appropriately ?

    Thank you again, Nikita

  3. Xinqiu Yao

    Hi,

    It is not clear what was going wrong with just the commands you've provided. Could you tell us a bit more about the output after running these commands? What do you mean "terminated in R"? A copy&paste of the error message would be very helpful.

    Also, a bit more questions about your data and commands: Does your trajectory contain only CA atoms? How did you call fit.xyz() to get the object trj (the command you provided is vague)?

    Thanks!

  4. nikita chopra reporter

    Hi,

    Thank you for your reply. R hangs when I pass the cmap function and there is no error message. The trajectory contains only CA atoms. The commands are as follows:

    pdb<-read.pdb("--.pdb") dcd<-read.dcd("--.dcd") inds<-atom.select(pdb, "calpha) trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz) cm <- cmap(trj, dcut = 10, scut = 0, pcut = 0.75, mask.lower = FALSE)

    At this step, the program hangs. I am not clear if it is a computation error or if I am applying the function wrong.

    Thank you for your time. Nikita

  5. Xinqiu Yao

    Hi,

    It could be just taking a longer computation time than you expected. Contact map calculation is indeed time-consuming especially for "large" system and "long" simulation.

    Could you check if the function is still running or not? (You can get the information from, e.g. the command 'top' if you are using linux, or some system monitor if you are using other OS).

    Also could you check the "size" of your trj object (e.g. with the command dim(trj)) and paste the output here? Then I may guess if it will take very long time...

    Another thing you may try is setting ncore=N in calling the function cmap(), where N is the number of CPU cores to do the calculation. Note that you need to be running on a multi-core computer to gain the speedup.

    Let me know what you get!

  6. nikita chopra reporter

    Hi,

    Thank you very much for your reply. It is working and I am able to analyze the contact map filter results. It was just taking longer computation time.

    Thank you again. Sincerely, Nikita

  7. nikita chopra reporter

    Hi,

    I apologize for multiple queries. I am selecting the side-chain representative atoms as you suggested: sel1<-atom.select(pdb, resid="ALA", elety="CB") sel2<-atom.select(pdb, resid="VAL", elety="CB") .. inds<-combine.sel(sel19,sel20, op="+") Union of sel1 and sel2 * Selected a total of: 268 atoms trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz) cij <- dccm(trj) net <- cna(cij, cutoff.cij=0.5)

    However, everything is selected instead of the specific atoms. For eg: x$members[1] c(21:22, 544, 574, 807:808, 1324, 1326:1348, 1350:1388, 1411:1418, 1525:1567, 1642:1643, 1647:1649, 1651, 2496:2497)

    Therefore, I get communities made of all atoms instead of the 268 atoms selected. I am new to R and would be very grateful for your help in this regard.

    Thank you, Nikita

  8. Xinqiu Yao

    Hi,

    That is fine for multiple queries.

    I assume you have a full (all-heavy-atom) pdb and dcd files. If it is true, in the command:

    trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz)
    

    the returned trj will contain all heavy atoms, too! You need to "trim" it before calling dccm() and cna():

    trj2 <- trj[, inds$xyz]
    cij <- dccm(trj2)
    net <- cna(cij, cutoff.cij=0.5)
    

    Btw, I may also recommend you install the newest version of bio3d released just recently (v2.2), in case any unexpected bugs exist in old versions:

    install.packages('bio3d')
    
  9. Lars Skjærven

    Hi Nikita, Note the "Code" button in which you should use in order to format your code. This will make your code much easier to read for us, and understand what you're trying to do. If I understand you correctly, the code below summarizes yours and Xinqiu's code:

    # imitate your pdb and dcd object with a multimode PDB
    pdb <- read.pdb("2RSC", multi=TRUE)
    sele <- atom.select(pdb, "noh")
    pdb <- trim.pdb(pdb, sele)
    
    dcd <- pdb$xyz
    pdb$xyz <- pdb$xyz[1,, drop=FALSE]
    
    # select representative atoms
    sel <- list()
    sel[[1]] <- atom.select(pdb, elety="CA")$atom
    sel[[2]] <- atom.select(pdb, resid="ALA", elety="CB")$atom
    sel[[3]] <- atom.select(pdb, resid="VAL", elety="CB")$atom
    sel[[4]] <- atom.select(pdb, resid="ARG", elety="CZ")$atom
    
    # union of selection
    inds <- sort(Reduce(union, sel))
    
    # fit trajectory to calpha coords (output is all atoms)
    ca.inds <- atom.select(pdb, "calpha")
    trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd, fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
    
    # trim trajectory to selected atoms
    trj2 <- trj[, atom2xyz(inds)]
    
    cij <- dccm.xyz(trj2)
    net <- cna(cij, cutoff.cij=0.5)
    

    PS - with regards to Bio3D version 2.2: I still get v2.1-3 when fetching from CRAN. i.e. although it's accepted they probably use some time to update to the new version there.

  10. Xinqiu Yao

    Thanks for tidying up!

    Regards bio3d on CRAN, it takes some time to synchronize on mirror sites. I usually put following lines in the ~/.Rprofile to get most updates quickly:

    local({r <- getOption("repos")
           r["CRAN"] <- "http://cran.rstudio.com"
           options(repos = r)})
    
  11. nikita chopra reporter

    Hi,

    Thank you very much for your help. I am able to select the side-chain specific atoms now with your suggestion for community analysis.

    Sincerely, Nikita

  12. nikita chopra reporter

    Hi,

    I apologize for multiple queries. I am new to R and greatly appreciate your help. I followed the code as suggested by you to select sidechain specific atoms in my trajectory to run the cna function. My input is a heavy atom file. I am still facing the problem wherein after trimming, all the other atoms are still selected. Please find a detailed code below:

    pdb<-read.pdb("Linker_rep1_noh_Stride10.pdb", multi=TRUE)
    sele<-atom.select(pdb, "noh")
    Build selection from input string
     *  Selected 2203 non-hydrogen atoms *
    > pdb<-trim.pdb(pdb,sele)
    > dcd<-pdb$xyz
    > pdb$xyz<-pdb$xyz[1,,drop=FALSE]
    > sel<-list()
    > sel[[1]] <- atom.select(pdb, resid="ALA",elety="CB")$atom
     Build selection from input components
     *  Selected a total of: 11 intersecting atoms  *
    > sel[[2]] <- atom.select(pdb, resid="VAL",elety="CB")$atom
    
     Build selection from input components
     *  Selected a total of: 18 intersecting atoms  *
    > sel[[3]] <- atom.select(pdb, resid="LEU",elety="CG")$atom
    
     Build selection from input components
     *  Selected a total of: 26 intersecting atoms  *
    > sel[[4]] <- atom.select(pdb, resid="ILE",elety="CB")$atom
    
     Build selection from input components
     *  Selected a total of: 13 intersecting atoms  *
    > sel[[5]] <- atom.select(pdb, resid="SER",elety="OG")$atom
    
     Build selection from input components
     *  Selected a total of: 21 intersecting atoms  *
    > sel[[6]] <- atom.select(pdb, resid="THR",elety="CB")$atom
    
     Build selection from input components
     *  Selected a total of: 10 intersecting atoms  *
    > sel[[7]] <- atom.select(pdb, resid="ASP",elety="CG")$atom
    
     Build selection from input components
     *  Selected a total of: 14 intersecting atoms  *
    > sel[[8]] <- atom.select(pdb, resid="GLU",elety="CD")$atom
    
     Build selection from input components
     *  Selected a total of: 25 intersecting atoms  *
    > sel[[9]] <- atom.select(pdb, resid="ASN",elety="CG")$atom
    
     Build selection from input components
     *  Selected a total of: 7 intersecting atoms  *
    > sel[[10]] <- atom.select(pdb, resid="GLN",elety="CD")$atom
    
     Build selection from input components
     *  Selected a total of: 10 intersecting atoms  *
    > sel[[11]] <- atom.select(pdb, resid="LYS",elety="CE")$atom
    
     Build selection from input components
     *  Selected a total of: 19 intersecting atoms  *
    > sel[[12]] <- atom.select(pdb, resid="ARG",elety="CZ")$atom
    
     Build selection from input components
     *  Selected a total of: 13 intersecting atoms  *
    > sel[[13]] <- atom.select(pdb, resid="CYS",elety="SG")$atom
    
     Build selection from input components
     *  Selected a total of: 6 intersecting atoms  *
    > sel[[14]] <- atom.select(pdb, resid="MET",elety="SD")$atom
    
     Build selection from input components
     *  Selected a total of: 13 intersecting atoms  *
    > sel[[15]] <- atom.select(pdb, resid="GLY",elety="CA")$atom
    
     Build selection from input components
     *  Selected a total of: 15 intersecting atoms  *
    > sel[[16]] <- atom.select(pdb, resid="PRO",elety="CG")$atom
    
     Build selection from input components
     *  Selected a total of: 8 intersecting atoms  *
    > sel[[17]] <- atom.select(pdb, resid="HSD",elety="NE2")$atom
    
     Build selection from input components
     *  Selected a total of: 6 intersecting atoms  *
    > sel[[18]] <- atom.select(pdb, resid="PHE",elety="CZ")$atom
    
     Build selection from input components
     *  Selected a total of: 12 intersecting atoms  *
    > sel[[19]] <- atom.select(pdb, resid="TYR",elety="CZ")$atom
    
     Build selection from input components
     *  Selected a total of: 15 intersecting atoms  *
    > sel[[20]] <- atom.select(pdb, resid="TRP",elety="CD2")$atom
    
     Build selection from input components
     *  Selected a total of: 6 intersecting atoms  *
    > inds <- sort(Reduce(union, sel))
    > print(inds)
      [1]    7   14   20   30   41   48   57   66   74   81   89   96  107
     [14]  115  125  133  141  147  152  158  165  176  182  187  194  204
     [27]  214  221  229  240  253  259  266  277  286  293  300  305  316
     [40]  324  330  341  349  355  361  368  375  382  390  399  410  417
     [53]  427  436  443  451  457  466  474  481  489  497  506  514  524
     [66]  531  538  547  555  566  573  578  586  591  601  609  620  629
     [79]  634  646  653  661  669  678  689  699  705  711  717  723  729
     [92]  737  745  756  765  776  785  794  804  815  825  836  845  852
    [105]  861  870  878  886  895  904  911  919  926  933  941  948  955
    [118]  962  970  981  990  999 1007 1015 1023 1034 1042 1053 1063 1071
    [131] 1079 1086 1091 1100 1108 1116 1122 1129 1137 1145 1154 1160 1165
    [144] 1172 1182 1188 1196 1202 1213 1219 1225 1233 1242 1253 1261 1269
    [157] 1277 1285 1294 1301 1317 1325 1331 1336 1342 1348 1356 1366 1375
    [170] 1380 1391 1403 1413 1420 1427 1434 1441 1449 1458 1468 1477 1485
    [183] 1495 1503 1509 1517 1524 1530 1537 1550 1559 1568 1574 1579 1587
    [196] 1596 1607 1618 1625 1637 1646 1652 1658 1666 1674 1682 1691 1701
    [209] 1712 1723 1730 1738 1746 1753 1760 1767 1774 1785 1791 1799 1806
    [222] 1812 1818 1829 1837 1848 1860 1869 1878 1885 1892 1898 1905 1915
    [235] 1921 1932 1940 1947 1957 1967 1976 1982 1992 2005 2013 2023 2029
    [248] 2035 2044 2055 2064 2069 2080 2090 2096 2105 2113 2121 2127 2134
    [261] 2143 2151 2158 2167 2174 2183 2192 2203
    > ca.inds <- atom.select(pdb, "calpha")
    
     Build selection from input string
     *  Selected a total of: 268 intersecting atoms  *
    > trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd, fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
    > trj2 <- trj[, atom2xyz(inds)]
    > cij <- dccm.xyz(trj2)
    > net <- cna(cij, cutoff.cij=0.5)
    Warning message:
    In cna.dccm(cij, cutoff.cij = 0.5) :
      The number of communities is larger than the number of unique 
                  'colors' provided as input. Colors will be recycled
    > x<-summary(net)
     id size
      1   18
      2   29
      3    1
      4    1
      5   23
      6    1
      7   26
      8    4
      9    1
     10   10
     11    1
     12    1
     13    1
     14    1
     15    1
     16    1
     17   22
     18    1
     19    1
     20    2
     21   12
     22    1
     23    1
     24    1
     25    2
     26    1
     27    1
     28    1
     29    1
     30    1
     31    1
     32    1
     33    9
     34    1
     35   21
     36    1
     37    1
     38    1
     39    1
     40    1
     41    1
     42    1
     43    1
     44    1
     45    1
     46    1
     47    1
     48    1
     49    1
     50    1
     51    1
     52    1
     53    2
     54    1
     55    1
     56    1
     57    1
     58    1
     59    1
     60    1
     61    1
     62    1
     63    1
     64    1
     65    1
     66    1
     67    1
     68    1
     69    1
     70    2
     71    1
     72    1
     73    5
     74    1
     75    1
     76    1
     77    1
     78    1
     79    2
     80    8
     81    1
     82    1
     83    1
     84    1
     85    1
     86    1
     87    1
     88    1
    > x$members[1]
    $`1`
      1   2  94  97  99 100 102 129 159 160 161 162 163 164 165 166 167 
      1   2  94  97  99 100 102 129 159 160 161 162 163 164 165 166 167 
    226 
    226 
    

    I would be very grateful for your suggestions. Thank you, Nikita

  13. Xinqiu Yao

    Nodes are renumbered in the network and so the id '1', '2', '3', ... are corresponding to the 'first', 'second', 'third', ... representative atoms. Simply type 'net' and return: it should print basic information about the network, and the 'NETWORK NODES#:' should be 268 in this case.

    Note that if you want to look at your network via e.g. view.dccm(), the lines representing correlations will still go through 'CA' atoms, but this is just a shift in visualization, i.e. your data still represent correlations of specified atoms.

    Let me know if you still have any question!

  14. Lars Skjærven

    Hi Nikita, Why do you say that all atoms are selected? What is the output of dim(trj) and dim(trj2) ? Can you also paste the output of print(pdb) before and after the trim.pdb() call?

    How many frames do you have in your dcd or pdb file? e.g. what is the output of dim(dcd) ?

    Also, would it not be an idea to include the CA atoms in the analysis? Why do you want only the chosen side chain atoms? To combine it with the CA atoms the line with Reduce to:

    inds <- Reduce(union, sel)
    ca.inds <- atom.select(pdb, "calpha")
    inds <- sort(union(inds, ca.inds$atom))
    

    I'm not completely sure where this is going though..

  15. nikita chopra reporter

    Hi, Thank you for your reply. I have done community analysis on CA separately and would like to do community analysis on the side-chains individually to determine the residues whose backbone and side-chains lie in different communities.

    dim(trj) 
    > dim(trj)
    [1]  570 6609
    
    > dim(trj2)
    [1]  570 1518
    
    > net
    
    Call:
      cna.dccm(cij = cij, cutoff.cij = 0.5)
    
    Structure: 
     - NETWORK NODES#:   506    EDGES#: 4055 
     - COMMUNITY NODES#: 34     EDGES#: 31 
    
     + attr: network, communities, community.network,
            community.cij, cij, call
    

    Thank you very much. Sincerely, Nikita

  16. Xinqiu Yao

    Hey,

    That is so weird... I saw from above that the length of inds is indeed 268 (Or, did you cut the output of the command 'print(inds)' above?), which means trj[, atom2xyz(inds)] should give you a matrix with 3x268=804 columns. But you actually got 1518 columns?!

    If you don't mind, could you send me your file "Linker_rep1_noh_Stride10.pdb" (keep only a couple frames please) and then I could try to reproduce your problems (xinqyao@umich.edu).

  17. nikita chopra reporter

    Hi,

    Thank you and I apologize in the delay in replying back. I actually figured out the error I was doing during the data analysis. I am including ca atoms in the side-chain. Also, the numbering of the community members is according to the indices.

    Thank you again for your time. Nikita

  18. Log in to comment