How to extract PDB file

Issue #223 resolved
Ashfaq Ahmad created an issue

Hi guys, My question is is from a tutorial "Beginning with trajectory analysis" and i am dealing with MD trajectory .dcd file. Once I plot all my conformers on PC planes, the next step is to cluster them in to 2 groups using

hc <- hclust(dist(pc$z[,1:2]))
grps <- cutree(hc, k=2)

Here I got two groups with different colors, now I want to extract a PDB file from group 1 and group 2 individually. Is that possible? Sorry if my question is not that good.

Comments (11)

  1. Lars Skjærven

    Your grps variable is a vector of the same length as the trajectory containing 1's and 2' assigning each frame to either group 1 or 2. All indices for frames in group 1 can be found with inds <- which(grps == 1). Use this to get the coordinates from your trajectory with xyz[inds[1], ]. You can then write these to a pdb object:

    inds <- which(grps == 1)
    crd <- xyz[inds[1], ]
    write.pdb(xyz=crd, file="test.pdb")
    

    Also note that write.pdb takes plenty of input arguments, e.g. to assign residue names and numbers. See ?write.pdb for more information. Hope it helps !

  2. Ashfaq Ahmad reporter

    Thanks @Lars, I tried it and got a big pdb file, probably containing the Ca-atoms for all the conformers from the grp 1. I opened it in VMD but looks wired with a scattered atoms. Yup i read write.pdb that require too many argument to be accurate. I was looking to obtain a single pdb file to represent the whole cluster.

  3. Barry Grant

    You will be writing C-alpha only atoms here perhaps so try using the "Tube" representation in VMD for example.

    If you want a singe output structure that in some way represents a given cluster you could determine the closest to the mean conformation for that cluster. The functions you might consider using for this include mean() via the apply() function, rmsd(), which.min() and of course write.pdb().

    Let us know if you would like help with this - but I do encourage you to try it yourself also ;-)

  4. Ashfaq Ahmad reporter

    @bjgrant Sir, I have tried the above options but couldn't implement them well. I am working with bio3d since last one year, during this time i was only dependent on straight forward commands mentioned in tutorials. Although they really worked out well for our two separate data enough good to be submitted in to Nature and Plos quality journals. Being a pure biological structure candidate at times I feel almost helpless even to understand the mathematical languages and their applications for such programs. I will try my best to dig these functions you suggested.

  5. Lars Skjærven

    Hi Asfaq, Learning scripting / programming might be a bit time consuming, but once you get the hold of it the opportunities in Bio3D certainly gets larger. There are plenty of online resources to learn a bit more about R programming. Try e.g. http://tryr.codeschool.com/ or the following book http://www.nostarch.com/artofr.htm

    If my example above gave you multiple frames you did something wrong. Did you remember to access only the first index (inds[1]) ? Also, does your trajectory contain only CA atoms, or do you have all atoms in the xyz matrix? if so, use atom.select and trim.xyz to work only with the CA atoms.

    Here is a complete running example on clustering to obtain representative structures:

    # read all atom trajectory
    xyz <- read.dcd("mydcdfile")
    
    # read pdb file containing the same number (and ordering) as the dcd file
    pdb <- read.pdb("mypdbfile")
    
    ca.inds <- atom.select(pdb, "calpha")
    
    # trim PDB and XYZ to CA atoms only
    pdb <- trim(pdb, ca.inds)
    xyz <- trim(xyz, col.inds=ca.inds)
    
    # clustering on RMSD
    rd <- rmsd(xyz, fit=TRUE)
    hc <- hclust(as.dist(rd))
    grps <- cutree(hc, k=3)
    
    # find representatives
    unq <- unique(grps)
    all.inds <- seq(1, length(grps))
    rep.inds <- rep(NA, length(unq))
    
    for(i in 1:length(unq)) {
      grp <- unq[i]
      inds <- which(grps==grp)
    
      if(length(inds) > 1) {
        tmp <- rd[inds, inds]
    
        m <- rowMeans(tmp)
        rep.ind <- which.min(m)
        rep.ind <- all.inds[inds][rep.ind]
      }
     else {
        rep.ind <- inds[1]
     }
    
      rep.inds[i] <- rep.ind
    }
    
    # indices of cluser representatives
    rep.inds
    
    write.pdb(pdb, xyz=xyz[rep.inds[1], ], file="rep_clust1.pdb")
    write.pdb(pdb, xyz=xyz[rep.inds[2], ], file="rep_clust2.pdb")
    write.pdb(pdb, xyz=xyz[rep.inds[3], ], file="rep_clust3.pdb")
    
  6. Ashfaq Ahmad reporter

    @Lars. I am trying to trim it but my bio3d package can not recognize this function. You can see below for details;

    #!> pdb <- trim(pdb, ca.inds)
    Error: could not find function "trim"
    > xyz <- trim(xyz, col.inds=ca.inds)
    Error: could not find function "trim"
    > trim.pdb
    function (pdb, inds = NULL, sse = TRUE) 
    {
        if (class(pdb) != "pdb") 
            stop("input 'pdb' must be a list object as returned from 'read.pdb'")
        if (is.null(inds)) 
            stop("no selection indices provided")
        if (!is.list(inds)) 
            stop("selection indices must be provided i.e. from 'atom.select'")
        if (is.null(inds$atom) || is.null(inds$xyz)) 
            stop("selection indices must be provided i.e. from 'atom.select'")
        atom <- pdb$atom[inds$atom, ]
        xyz <- pdb$xyz[inds$xyz]
        calpha <- as.logical(atom[, "elety"] == "CA")
        if (!is.null(pdb$xyz.models)) 
            xyz.models <- pdb$xyz.models[, inds$xyz]
        else xyz.models <- NULL
        sse.unbound <- function(sse) {
            ex <- NULL
            ch <- NULL
            ty <- NULL
            for (i in 1:length(sse$start)) {
                tmp <- sse$start[i]:sse$end[i]
                ex <- c(ex, tmp)
                ch <- c(ch, rep(sse$chain[i], length(tmp)))
                ty <- c(ty, rep(sse[[4]][i], length(tmp)))
            }
            ub <- cbind(ex, ch, ty)
            ub <- cbind(ub, apply(ub, 1, function(x) paste(x[1:2], 
                collapse = "_")))
            colnames(ub) <- c("nums", "chains", "type", "strid")
            return(ub)
        }
        bounds.inds <- function(nums) {
            if (length(nums) == 1) {
                return(1)
            }
            bounds.inds <- 1
            nums.start <- nums[1]
            diff.i <- 1
            for (i in 2:length(nums)) {
                if ((nums[i] - diff.i) != nums.start) {
                    bounds.inds <- c(bounds.inds, i)
                    nums.start <- nums[i]
                    diff.i <- 1
                }
                else {
                    diff.i <- diff.i + 1
                }
            }
            return(bounds.inds)
        }
        helix <- NULL
        sheet <- NULL
        if (sse) {
            trimmed <- atom[calpha, c("resno", "chain")]
            if (!is.matrix(trimmed)) 
                trimmed <- t(as.matrix(trimmed))
            strid <- apply(trimmed, 1, function(x) paste(x, collapse = "_"))
            trimmed <- cbind(trimmed, strid)
            colnames(trimmed) <- c("nums", "chains", "strid")
            if (length(pdb$helix$start) > 0) {
                tmp <- sse.unbound(pdb$helix)
                ht <- tmp[tmp[, "strid"] %in% trimmed[, "strid"], 
                    1:3]
                new.helix <- matrix(ht, ncol = 3, byrow = FALSE)
                colnames(new.helix) <- c("nums", "chains", "type")
                if (nrow(new.helix) > 0) {
                    nums <- as.numeric(new.helix[, "nums"])
                    bnds <- bounds(nums, dup.inds = FALSE, pre.sort = FALSE)
                    tmp.inds <- bounds.inds(nums)
                    chain <- new.helix[tmp.inds, "chains"]
                    type <- new.helix[tmp.inds, "type"]
                    bnds <- cbind(bnds, chain, type)
                    helix$start = as.numeric(bnds[, "start"])
                    helix$end = as.numeric(bnds[, "end"])
                    helix$chain = as.vector(bnds[, "chain"])
                    helix$type = as.vector(bnds[, "type"])
                }
            }
            if (length(pdb$sheet$start) > 0) {
                tmp <- sse.unbound(pdb$sheet)
                ht <- tmp[tmp[, "strid"] %in% trimmed[, "strid"], 
                    1:3]
                new.sheet <- matrix(ht, ncol = 3, byrow = FALSE)
                colnames(new.sheet) <- c("nums", "chains", "type")
                if (nrow(new.sheet) > 0) {
                    nums <- as.numeric(new.sheet[, "nums"])
                    bnds <- bounds(nums, dup.inds = FALSE, pre.sort = FALSE)
                    tmp.inds <- bounds.inds(nums)
                    chain <- new.sheet[tmp.inds, "chains"]
                    type <- new.sheet[tmp.inds, "type"]
                    bnds <- cbind(bnds, chain, type)
                    sheet$start = as.numeric(bnds[, "start"])
                    sheet$end = as.numeric(bnds[, "end"])
                    sheet$chain = as.vector(bnds[, "chain"])
                    sheet$sense = as.vector(bnds[, "type"])
                }
            }
        }
        output <- list(atom = atom, het = pdb$het, helix = helix, 
            sheet = sheet, seqres = pdb$seqres, xyz = xyz, xyz.models = xyz.models, 
            calpha = calpha)
        class(output) <- "pdb"
        return(output)
    }
    <environment: namespace:bio3d>
    > help(trim.pdb)
    starting httpd help server ... done
    > pdb <- trim.pdb(pdb, ca.inds)
    > xyz <- trim(xyz, col.inds=ca.inds)
    Error: could not find function "trim"
    > xyz <- trim.xyz(xyz, col.inds=ca.inds)
    Error: could not find function "trim.xyz"
    > xyz <- trim.pdb1(xyz, col.inds=ca.inds)
    Error: could not find function "trim.pdb1"
    > xyz <- trim.pdb(xyz, col.inds=ca.inds)
    Error in trim.pdb(xyz, col.inds = ca.inds) : 
      unused argument (col.inds = ca.inds)
    > trim(pdb$xyz, col.inds=sele$xyz)
    Error: could not find function "trim"
    
  7. Ashfaq Ahmad reporter

    Got it. Thanks for your kind support and time. For sure I will look to those links you provided me to learn some basics of programming. Thank you all

  8. Log in to comment