- edited description
How to extract PDB file
Hi guys, My question is is from a tutorial "Beginning with trajectory analysis" and i am dealing with MD trajectory .dcd file. Once I plot all my conformers on PC planes, the next step is to cluster them in to 2 groups using
hc <- hclust(dist(pc$z[,1:2]))
grps <- cutree(hc, k=2)
Here I got two groups with different colors, now I want to extract a PDB file from group 1 and group 2 individually. Is that possible? Sorry if my question is not that good.
Comments (11)
-
-
Your
grps
variable is a vector of the same length as the trajectory containing 1's and 2' assigning each frame to either group 1 or 2. All indices for frames in group 1 can be found withinds <- which(grps == 1)
. Use this to get the coordinates from your trajectory withxyz[inds[1], ]
. You can then write these to a pdb object:inds <- which(grps == 1) crd <- xyz[inds[1], ] write.pdb(xyz=crd, file="test.pdb")
Also note that
write.pdb
takes plenty of input arguments, e.g. to assign residue names and numbers. See?write.pdb
for more information. Hope it helps ! -
-
reporter Thanks @Lars, I tried it and got a big pdb file, probably containing the Ca-atoms for all the conformers from the grp 1. I opened it in VMD but looks wired with a scattered atoms. Yup i read write.pdb that require too many argument to be accurate. I was looking to obtain a single pdb file to represent the whole cluster.
-
You will be writing C-alpha only atoms here perhaps so try using the "Tube" representation in VMD for example.
If you want a singe output structure that in some way represents a given cluster you could determine the closest to the mean conformation for that cluster. The functions you might consider using for this include mean() via the apply() function, rmsd(), which.min() and of course write.pdb().
Let us know if you would like help with this - but I do encourage you to try it yourself also ;-)
-
reporter @bjgrant Sir, I have tried the above options but couldn't implement them well. I am working with bio3d since last one year, during this time i was only dependent on straight forward commands mentioned in tutorials. Although they really worked out well for our two separate data enough good to be submitted in to Nature and Plos quality journals. Being a pure biological structure candidate at times I feel almost helpless even to understand the mathematical languages and their applications for such programs. I will try my best to dig these functions you suggested.
-
Hi Asfaq, Learning scripting / programming might be a bit time consuming, but once you get the hold of it the opportunities in Bio3D certainly gets larger. There are plenty of online resources to learn a bit more about R programming. Try e.g. http://tryr.codeschool.com/ or the following book http://www.nostarch.com/artofr.htm
If my example above gave you multiple frames you did something wrong. Did you remember to access only the first index (
inds[1]
) ? Also, does your trajectory contain only CA atoms, or do you have all atoms in the xyz matrix? if so, useatom.select
andtrim.xyz
to work only with the CA atoms.Here is a complete running example on clustering to obtain representative structures:
# read all atom trajectory xyz <- read.dcd("mydcdfile") # read pdb file containing the same number (and ordering) as the dcd file pdb <- read.pdb("mypdbfile") ca.inds <- atom.select(pdb, "calpha") # trim PDB and XYZ to CA atoms only pdb <- trim(pdb, ca.inds) xyz <- trim(xyz, col.inds=ca.inds) # clustering on RMSD rd <- rmsd(xyz, fit=TRUE) hc <- hclust(as.dist(rd)) grps <- cutree(hc, k=3) # find representatives unq <- unique(grps) all.inds <- seq(1, length(grps)) rep.inds <- rep(NA, length(unq)) for(i in 1:length(unq)) { grp <- unq[i] inds <- which(grps==grp) if(length(inds) > 1) { tmp <- rd[inds, inds] m <- rowMeans(tmp) rep.ind <- which.min(m) rep.ind <- all.inds[inds][rep.ind] } else { rep.ind <- inds[1] } rep.inds[i] <- rep.ind } # indices of cluser representatives rep.inds write.pdb(pdb, xyz=xyz[rep.inds[1], ], file="rep_clust1.pdb") write.pdb(pdb, xyz=xyz[rep.inds[2], ], file="rep_clust2.pdb") write.pdb(pdb, xyz=xyz[rep.inds[3], ], file="rep_clust3.pdb")
-
reporter @Lars. I am trying to trim it but my bio3d package can not recognize this function. You can see below for details;
#!> pdb <- trim(pdb, ca.inds) Error: could not find function "trim" > xyz <- trim(xyz, col.inds=ca.inds) Error: could not find function "trim" > trim.pdb function (pdb, inds = NULL, sse = TRUE) { if (class(pdb) != "pdb") stop("input 'pdb' must be a list object as returned from 'read.pdb'") if (is.null(inds)) stop("no selection indices provided") if (!is.list(inds)) stop("selection indices must be provided i.e. from 'atom.select'") if (is.null(inds$atom) || is.null(inds$xyz)) stop("selection indices must be provided i.e. from 'atom.select'") atom <- pdb$atom[inds$atom, ] xyz <- pdb$xyz[inds$xyz] calpha <- as.logical(atom[, "elety"] == "CA") if (!is.null(pdb$xyz.models)) xyz.models <- pdb$xyz.models[, inds$xyz] else xyz.models <- NULL sse.unbound <- function(sse) { ex <- NULL ch <- NULL ty <- NULL for (i in 1:length(sse$start)) { tmp <- sse$start[i]:sse$end[i] ex <- c(ex, tmp) ch <- c(ch, rep(sse$chain[i], length(tmp))) ty <- c(ty, rep(sse[[4]][i], length(tmp))) } ub <- cbind(ex, ch, ty) ub <- cbind(ub, apply(ub, 1, function(x) paste(x[1:2], collapse = "_"))) colnames(ub) <- c("nums", "chains", "type", "strid") return(ub) } bounds.inds <- function(nums) { if (length(nums) == 1) { return(1) } bounds.inds <- 1 nums.start <- nums[1] diff.i <- 1 for (i in 2:length(nums)) { if ((nums[i] - diff.i) != nums.start) { bounds.inds <- c(bounds.inds, i) nums.start <- nums[i] diff.i <- 1 } else { diff.i <- diff.i + 1 } } return(bounds.inds) } helix <- NULL sheet <- NULL if (sse) { trimmed <- atom[calpha, c("resno", "chain")] if (!is.matrix(trimmed)) trimmed <- t(as.matrix(trimmed)) strid <- apply(trimmed, 1, function(x) paste(x, collapse = "_")) trimmed <- cbind(trimmed, strid) colnames(trimmed) <- c("nums", "chains", "strid") if (length(pdb$helix$start) > 0) { tmp <- sse.unbound(pdb$helix) ht <- tmp[tmp[, "strid"] %in% trimmed[, "strid"], 1:3] new.helix <- matrix(ht, ncol = 3, byrow = FALSE) colnames(new.helix) <- c("nums", "chains", "type") if (nrow(new.helix) > 0) { nums <- as.numeric(new.helix[, "nums"]) bnds <- bounds(nums, dup.inds = FALSE, pre.sort = FALSE) tmp.inds <- bounds.inds(nums) chain <- new.helix[tmp.inds, "chains"] type <- new.helix[tmp.inds, "type"] bnds <- cbind(bnds, chain, type) helix$start = as.numeric(bnds[, "start"]) helix$end = as.numeric(bnds[, "end"]) helix$chain = as.vector(bnds[, "chain"]) helix$type = as.vector(bnds[, "type"]) } } if (length(pdb$sheet$start) > 0) { tmp <- sse.unbound(pdb$sheet) ht <- tmp[tmp[, "strid"] %in% trimmed[, "strid"], 1:3] new.sheet <- matrix(ht, ncol = 3, byrow = FALSE) colnames(new.sheet) <- c("nums", "chains", "type") if (nrow(new.sheet) > 0) { nums <- as.numeric(new.sheet[, "nums"]) bnds <- bounds(nums, dup.inds = FALSE, pre.sort = FALSE) tmp.inds <- bounds.inds(nums) chain <- new.sheet[tmp.inds, "chains"] type <- new.sheet[tmp.inds, "type"] bnds <- cbind(bnds, chain, type) sheet$start = as.numeric(bnds[, "start"]) sheet$end = as.numeric(bnds[, "end"]) sheet$chain = as.vector(bnds[, "chain"]) sheet$sense = as.vector(bnds[, "type"]) } } } output <- list(atom = atom, het = pdb$het, helix = helix, sheet = sheet, seqres = pdb$seqres, xyz = xyz, xyz.models = xyz.models, calpha = calpha) class(output) <- "pdb" return(output) } <environment: namespace:bio3d> > help(trim.pdb) starting httpd help server ... done > pdb <- trim.pdb(pdb, ca.inds) > xyz <- trim(xyz, col.inds=ca.inds) Error: could not find function "trim" > xyz <- trim.xyz(xyz, col.inds=ca.inds) Error: could not find function "trim.xyz" > xyz <- trim.pdb1(xyz, col.inds=ca.inds) Error: could not find function "trim.pdb1" > xyz <- trim.pdb(xyz, col.inds=ca.inds) Error in trim.pdb(xyz, col.inds = ca.inds) : unused argument (col.inds = ca.inds) > trim(pdb$xyz, col.inds=sele$xyz) Error: could not find function "trim"
-
Update your Bio3D version to 2.2. Looks like you have 2.1 or 2.0.
-
reporter Got it. Thanks for your kind support and time. For sure I will look to those links you provided me to learn some basics of programming. Thank you all
-
- changed status to resolved
- Log in to comment