selecting representative side chain atoms using atom.select
Hi,
I am keen to apply community network analysis on the side-chain atoms of my protein and would like to select the representative elements (Ala:CB, Arg:CZ, Asp:CD..) in the structure for the calculations. Could you please guide me as to how to select distinct atoms using the atom.select function ?
Thank you for your help. Nikita
Comments (24)
-
-
-
reporter Hi,
Thank you very much for your help in this regard. I have another question regarding community analysis of Molecular Dynamics trajectory. I am applying the contact map function cmap with distance cutoff of 10 A. I have selected all the CA atoms and my R script is as follows:
pdb<-read.pdb() dcd<-read.dcd() inds<-atom.select(pdb, elety="CA") trj<-fit.xyz(..) cm <- cmap(trj, dcut = 10, scut = 0, pcut = 0.75, mask.lower = FALSE)
However, at this step the analysis is terminated in R. Could you please guide me if I am applying the cmap function appropriately ?
Thank you again, Nikita
-
Hi,
It is not clear what was going wrong with just the commands you've provided. Could you tell us a bit more about the output after running these commands? What do you mean "terminated in R"? A copy&paste of the error message would be very helpful.
Also, a bit more questions about your data and commands: Does your trajectory contain only CA atoms? How did you call fit.xyz() to get the object trj (the command you provided is vague)?
Thanks!
-
reporter Hi,
Thank you for your reply. R hangs when I pass the cmap function and there is no error message. The trajectory contains only CA atoms. The commands are as follows:
pdb<-read.pdb("--.pdb") dcd<-read.dcd("--.dcd") inds<-atom.select(pdb, "calpha) trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz) cm <- cmap(trj, dcut = 10, scut = 0, pcut = 0.75, mask.lower = FALSE)
At this step, the program hangs. I am not clear if it is a computation error or if I am applying the function wrong.
Thank you for your time. Nikita
-
Hi,
It could be just taking a longer computation time than you expected. Contact map calculation is indeed time-consuming especially for "large" system and "long" simulation.
Could you check if the function is still running or not? (You can get the information from, e.g. the command 'top' if you are using linux, or some system monitor if you are using other OS).
Also could you check the "size" of your trj object (e.g. with the command
dim(trj)
) and paste the output here? Then I may guess if it will take very long time...Another thing you may try is setting ncore=N in calling the function cmap(), where N is the number of CPU cores to do the calculation. Note that you need to be running on a multi-core computer to gain the speedup.
Let me know what you get!
-
reporter Hi,
Thank you very much for your reply. It is working and I am able to analyze the contact map filter results. It was just taking longer computation time.
Thank you again. Sincerely, Nikita
-
- changed status to resolved
-
reporter Hi,
I apologize for multiple queries. I am selecting the side-chain representative atoms as you suggested: sel1<-atom.select(pdb, resid="ALA", elety="CB") sel2<-atom.select(pdb, resid="VAL", elety="CB") .. inds<-combine.sel(sel19,sel20, op="+") Union of sel1 and sel2 * Selected a total of: 268 atoms trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz) cij <- dccm(trj) net <- cna(cij, cutoff.cij=0.5)
However, everything is selected instead of the specific atoms. For eg: x$members[1] c(21:22, 544, 574, 807:808, 1324, 1326:1348, 1350:1388, 1411:1418, 1525:1567, 1642:1643, 1647:1649, 1651, 2496:2497)
Therefore, I get communities made of all atoms instead of the 268 atoms selected. I am new to R and would be very grateful for your help in this regard.
Thank you, Nikita
-
Hi,
That is fine for multiple queries.
I assume you have a full (all-heavy-atom) pdb and dcd files. If it is true, in the command:
trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd,fixed.inds = inds$xyz, mobile.inds = inds$xyz)
the returned trj will contain all heavy atoms, too! You need to "trim" it before calling dccm() and cna():
trj2 <- trj[, inds$xyz] cij <- dccm(trj2) net <- cna(cij, cutoff.cij=0.5)
Btw, I may also recommend you install the newest version of bio3d released just recently (v2.2), in case any unexpected bugs exist in old versions:
install.packages('bio3d')
-
- changed status to open
-
Hi Nikita, Note the "Code" button in which you should use in order to format your code. This will make your code much easier to read for us, and understand what you're trying to do. If I understand you correctly, the code below summarizes yours and Xinqiu's code:
# imitate your pdb and dcd object with a multimode PDB pdb <- read.pdb("2RSC", multi=TRUE) sele <- atom.select(pdb, "noh") pdb <- trim.pdb(pdb, sele) dcd <- pdb$xyz pdb$xyz <- pdb$xyz[1,, drop=FALSE] # select representative atoms sel <- list() sel[[1]] <- atom.select(pdb, elety="CA")$atom sel[[2]] <- atom.select(pdb, resid="ALA", elety="CB")$atom sel[[3]] <- atom.select(pdb, resid="VAL", elety="CB")$atom sel[[4]] <- atom.select(pdb, resid="ARG", elety="CZ")$atom # union of selection inds <- sort(Reduce(union, sel)) # fit trajectory to calpha coords (output is all atoms) ca.inds <- atom.select(pdb, "calpha") trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd, fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz) # trim trajectory to selected atoms trj2 <- trj[, atom2xyz(inds)] cij <- dccm.xyz(trj2) net <- cna(cij, cutoff.cij=0.5)
PS - with regards to Bio3D version 2.2: I still get v2.1-3 when fetching from CRAN. i.e. although it's accepted they probably use some time to update to the new version there.
-
Thanks for tidying up!
Regards bio3d on CRAN, it takes some time to synchronize on mirror sites. I usually put following lines in the
~/.Rprofile
to get most updates quickly:local({r <- getOption("repos") r["CRAN"] <- "http://cran.rstudio.com" options(repos = r)})
-
Good idea Xinqiu. I also note that version 2.2-1 is now available from CRAN.
-
- changed status to resolved
-
reporter Hi,
Thank you very much for your help. I am able to select the side-chain specific atoms now with your suggestion for community analysis.
Sincerely, Nikita
-
reporter Hi,
I apologize for multiple queries. I am new to R and greatly appreciate your help. I followed the code as suggested by you to select sidechain specific atoms in my trajectory to run the cna function. My input is a heavy atom file. I am still facing the problem wherein after trimming, all the other atoms are still selected. Please find a detailed code below:
pdb<-read.pdb("Linker_rep1_noh_Stride10.pdb", multi=TRUE) sele<-atom.select(pdb, "noh") Build selection from input string * Selected 2203 non-hydrogen atoms * > pdb<-trim.pdb(pdb,sele) > dcd<-pdb$xyz > pdb$xyz<-pdb$xyz[1,,drop=FALSE] > sel<-list() > sel[[1]] <- atom.select(pdb, resid="ALA",elety="CB")$atom Build selection from input components * Selected a total of: 11 intersecting atoms * > sel[[2]] <- atom.select(pdb, resid="VAL",elety="CB")$atom Build selection from input components * Selected a total of: 18 intersecting atoms * > sel[[3]] <- atom.select(pdb, resid="LEU",elety="CG")$atom Build selection from input components * Selected a total of: 26 intersecting atoms * > sel[[4]] <- atom.select(pdb, resid="ILE",elety="CB")$atom Build selection from input components * Selected a total of: 13 intersecting atoms * > sel[[5]] <- atom.select(pdb, resid="SER",elety="OG")$atom Build selection from input components * Selected a total of: 21 intersecting atoms * > sel[[6]] <- atom.select(pdb, resid="THR",elety="CB")$atom Build selection from input components * Selected a total of: 10 intersecting atoms * > sel[[7]] <- atom.select(pdb, resid="ASP",elety="CG")$atom Build selection from input components * Selected a total of: 14 intersecting atoms * > sel[[8]] <- atom.select(pdb, resid="GLU",elety="CD")$atom Build selection from input components * Selected a total of: 25 intersecting atoms * > sel[[9]] <- atom.select(pdb, resid="ASN",elety="CG")$atom Build selection from input components * Selected a total of: 7 intersecting atoms * > sel[[10]] <- atom.select(pdb, resid="GLN",elety="CD")$atom Build selection from input components * Selected a total of: 10 intersecting atoms * > sel[[11]] <- atom.select(pdb, resid="LYS",elety="CE")$atom Build selection from input components * Selected a total of: 19 intersecting atoms * > sel[[12]] <- atom.select(pdb, resid="ARG",elety="CZ")$atom Build selection from input components * Selected a total of: 13 intersecting atoms * > sel[[13]] <- atom.select(pdb, resid="CYS",elety="SG")$atom Build selection from input components * Selected a total of: 6 intersecting atoms * > sel[[14]] <- atom.select(pdb, resid="MET",elety="SD")$atom Build selection from input components * Selected a total of: 13 intersecting atoms * > sel[[15]] <- atom.select(pdb, resid="GLY",elety="CA")$atom Build selection from input components * Selected a total of: 15 intersecting atoms * > sel[[16]] <- atom.select(pdb, resid="PRO",elety="CG")$atom Build selection from input components * Selected a total of: 8 intersecting atoms * > sel[[17]] <- atom.select(pdb, resid="HSD",elety="NE2")$atom Build selection from input components * Selected a total of: 6 intersecting atoms * > sel[[18]] <- atom.select(pdb, resid="PHE",elety="CZ")$atom Build selection from input components * Selected a total of: 12 intersecting atoms * > sel[[19]] <- atom.select(pdb, resid="TYR",elety="CZ")$atom Build selection from input components * Selected a total of: 15 intersecting atoms * > sel[[20]] <- atom.select(pdb, resid="TRP",elety="CD2")$atom Build selection from input components * Selected a total of: 6 intersecting atoms * > inds <- sort(Reduce(union, sel)) > print(inds) [1] 7 14 20 30 41 48 57 66 74 81 89 96 107 [14] 115 125 133 141 147 152 158 165 176 182 187 194 204 [27] 214 221 229 240 253 259 266 277 286 293 300 305 316 [40] 324 330 341 349 355 361 368 375 382 390 399 410 417 [53] 427 436 443 451 457 466 474 481 489 497 506 514 524 [66] 531 538 547 555 566 573 578 586 591 601 609 620 629 [79] 634 646 653 661 669 678 689 699 705 711 717 723 729 [92] 737 745 756 765 776 785 794 804 815 825 836 845 852 [105] 861 870 878 886 895 904 911 919 926 933 941 948 955 [118] 962 970 981 990 999 1007 1015 1023 1034 1042 1053 1063 1071 [131] 1079 1086 1091 1100 1108 1116 1122 1129 1137 1145 1154 1160 1165 [144] 1172 1182 1188 1196 1202 1213 1219 1225 1233 1242 1253 1261 1269 [157] 1277 1285 1294 1301 1317 1325 1331 1336 1342 1348 1356 1366 1375 [170] 1380 1391 1403 1413 1420 1427 1434 1441 1449 1458 1468 1477 1485 [183] 1495 1503 1509 1517 1524 1530 1537 1550 1559 1568 1574 1579 1587 [196] 1596 1607 1618 1625 1637 1646 1652 1658 1666 1674 1682 1691 1701 [209] 1712 1723 1730 1738 1746 1753 1760 1767 1774 1785 1791 1799 1806 [222] 1812 1818 1829 1837 1848 1860 1869 1878 1885 1892 1898 1905 1915 [235] 1921 1932 1940 1947 1957 1967 1976 1982 1992 2005 2013 2023 2029 [248] 2035 2044 2055 2064 2069 2080 2090 2096 2105 2113 2121 2127 2134 [261] 2143 2151 2158 2167 2174 2183 2192 2203 > ca.inds <- atom.select(pdb, "calpha") Build selection from input string * Selected a total of: 268 intersecting atoms * > trj <- fit.xyz(fixed = pdb$xyz, mobile = dcd, fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz) > trj2 <- trj[, atom2xyz(inds)] > cij <- dccm.xyz(trj2) > net <- cna(cij, cutoff.cij=0.5) Warning message: In cna.dccm(cij, cutoff.cij = 0.5) : The number of communities is larger than the number of unique 'colors' provided as input. Colors will be recycled > x<-summary(net) id size 1 18 2 29 3 1 4 1 5 23 6 1 7 26 8 4 9 1 10 10 11 1 12 1 13 1 14 1 15 1 16 1 17 22 18 1 19 1 20 2 21 12 22 1 23 1 24 1 25 2 26 1 27 1 28 1 29 1 30 1 31 1 32 1 33 9 34 1 35 21 36 1 37 1 38 1 39 1 40 1 41 1 42 1 43 1 44 1 45 1 46 1 47 1 48 1 49 1 50 1 51 1 52 1 53 2 54 1 55 1 56 1 57 1 58 1 59 1 60 1 61 1 62 1 63 1 64 1 65 1 66 1 67 1 68 1 69 1 70 2 71 1 72 1 73 5 74 1 75 1 76 1 77 1 78 1 79 2 80 8 81 1 82 1 83 1 84 1 85 1 86 1 87 1 88 1 > x$members[1] $`1` 1 2 94 97 99 100 102 129 159 160 161 162 163 164 165 166 167 1 2 94 97 99 100 102 129 159 160 161 162 163 164 165 166 167 226 226
I would be very grateful for your suggestions. Thank you, Nikita
-
- changed status to open
-
Nodes are renumbered in the network and so the id '1', '2', '3', ... are corresponding to the 'first', 'second', 'third', ... representative atoms. Simply type 'net' and return: it should print basic information about the network, and the 'NETWORK NODES#:' should be 268 in this case.
Note that if you want to look at your network via e.g. view.dccm(), the lines representing correlations will still go through 'CA' atoms, but this is just a shift in visualization, i.e. your data still represent correlations of specified atoms.
Let me know if you still have any question!
-
Hi Nikita, Why do you say that all atoms are selected? What is the output of
dim(trj)
anddim(trj2)
? Can you also paste the output ofprint(pdb)
before and after thetrim.pdb()
call?How many frames do you have in your dcd or pdb file? e.g. what is the output of
dim(dcd)
?Also, would it not be an idea to include the CA atoms in the analysis? Why do you want only the chosen side chain atoms? To combine it with the CA atoms the line with
Reduce
to:inds <- Reduce(union, sel) ca.inds <- atom.select(pdb, "calpha") inds <- sort(union(inds, ca.inds$atom))
I'm not completely sure where this is going though..
-
reporter Hi, Thank you for your reply. I have done community analysis on CA separately and would like to do community analysis on the side-chains individually to determine the residues whose backbone and side-chains lie in different communities.
dim(trj) > dim(trj) [1] 570 6609 > dim(trj2) [1] 570 1518 > net Call: cna.dccm(cij = cij, cutoff.cij = 0.5) Structure: - NETWORK NODES#: 506 EDGES#: 4055 - COMMUNITY NODES#: 34 EDGES#: 31 + attr: network, communities, community.network, community.cij, cij, call
Thank you very much. Sincerely, Nikita
-
Hey,
That is so weird... I saw from above that the length of inds is indeed 268 (Or, did you cut the output of the command 'print(inds)' above?), which means trj[, atom2xyz(inds)] should give you a matrix with 3x268=804 columns. But you actually got 1518 columns?!
If you don't mind, could you send me your file "Linker_rep1_noh_Stride10.pdb" (keep only a couple frames please) and then I could try to reproduce your problems (xinqyao@umich.edu).
-
reporter Hi,
Thank you and I apologize in the delay in replying back. I actually figured out the error I was doing during the data analysis. I am including ca atoms in the side-chain. Also, the numbering of the community members is according to the indices.
Thank you again for your time. Nikita
-
- changed status to resolved
- Log in to comment
Hi,
I guess you need to select for each amino acid one by one and finally combine them. For example,
Making a vector to map each amino acid to representative atom name and do individual select in a loop can be a good idea. It shouldn't be very difficult.
Let me know if you have more questions.