RMSD of several pdbs from data path

Issue #85 resolved
Former user created an issue

I would like to calculate the RMSD of multiple (~500) pdbs in comparison with each other. All pdbs are saved in a single folder. Is there a way to read.pdb() a whole folder rather than typing in each individual file name?

Also, I am trying to follow the tutorial for this on the website, however am also having trouble with the command code. Once I read.pdb() all of the files, what are/is the command codes in order to conduct the rmsd analysis? The tutorial has rmsd(pdbs, fit=TRUE), however when I use this in the R software, it says "pdbs" are undefined.

Any assistance on both topics would be greatly appreciated.

Emily

Comments (2)

  1. Xinqiu Yao

    Hi Emily,

    The "pdbs" in the tutorial is an object generated by running either read.fasta.pdb() or pdbaln(). It contains both sequence alignment and 3D structural information, and a very convenient way for common tasks involving multiple PDB structures (compared to repeatedly reading PDB files using read.pdb()). It is also extensively used in Bio3D examples, tutorials, and vignettes. Considering you have raw pdb files, using pdbaln() function is the easiest way to get it. Here is an example:

    files <- list.files(path_to_pdb, ".*\\.pdb$", full.names=TRUE)
    pdbs <- pdbaln(files)
    rr <- rmsd(pdbs$xyz, fit=TRUE)
    

    Note that pdbaln() only reads coordinates of CA atoms; If you want to consider side-chains or other main-chain heavy atoms, you may need to check out read.all() function.

  2. Barry Grant

    I am marking this as resolved. As Xinqiu points out the answer to the first question is to use the list.files() function, e.g.

    > list.files(mypath, ".pdb$")
     [1] "1RPI.pdb" "1RQ9.pdb" "1RV7.pdb" "1TW7.pdb" "3OQ7.pdb" "3OQA.pdb"
     [7] "3OQD.pdb" "3OTS.pdb" "3OTY.pdb" "3OU1.pdb" "3OU3.pdb" "3OU4.pdb"
    [13] "3OUA.pdb" "3OUB.pdb" "3OUC.pdb" "3OUD.pdb" "3PJ6.pdb" "3R0W.pdb"
    [19] "3R0Y.pdb" "3SO9.pdb" "3SPK.pdb" "4EYR.pdb" "4FAE.pdb" "4FAF.pdb"
    [25] "4GYE.pdb" "4GZF.pdb"
    

    The 'full.names=TRUE' might be required if the files live in a different directory, e.g.

    > list.files(mypath, ".pdb$", full.names=TRUE)
     [1] "hivp/raw_pdbs//1RPI.pdb" "hivp/raw_pdbs//1RQ9.pdb"
     [3] "hivp/raw_pdbs//1RV7.pdb" "hivp/raw_pdbs//1TW7.pdb"
     [5] "hivp/raw_pdbs//3OQ7.pdb" "hivp/raw_pdbs//3OQA.pdb"
     [7] "hivp/raw_pdbs//3OQD.pdb" "hivp/raw_pdbs//3OTS.pdb"
     [9] "hivp/raw_pdbs//3OTY.pdb" "hivp/raw_pdbs//3OU1.pdb"
    [11] "hivp/raw_pdbs//3OU3.pdb" "hivp/raw_pdbs//3OU4.pdb"
    [13] "hivp/raw_pdbs//3OUA.pdb" "hivp/raw_pdbs//3OUB.pdb"
    [15] "hivp/raw_pdbs//3OUC.pdb" "hivp/raw_pdbs//3OUD.pdb"
    [17] "hivp/raw_pdbs//3PJ6.pdb" "hivp/raw_pdbs//3R0W.pdb"
    [19] "hivp/raw_pdbs//3R0Y.pdb" "hivp/raw_pdbs//3SO9.pdb"
    [21] "hivp/raw_pdbs//3SPK.pdb" "hivp/raw_pdbs//4EYR.pdb"
    [23] "hivp/raw_pdbs//4FAE.pdb" "hivp/raw_pdbs//4FAF.pdb"
    [25] "hivp/raw_pdbs//4GYE.pdb" "hivp/raw_pdbs//4GZF.pdb"
    

    Then if the PDB files are of different composition (i.e. contain a different number of atoms, residues etc. ), the pdbaln() function will find equivalent positions and produce an object suitable for further analysis.

    > myfiles <- list.files(mypath, ".pdb$", full.names=TRUE)
    > pdbs <- pdbaln(files)
    

    If the PDB files have the same number of atoms then you can skip the alignment step of the pdbaln() call and use a simple loop, perhaps something like this:

    myfiles <- list.files(mypath, ".pdb$", full.names=TRUE)
    xyz <- NULL
    for(i in 1:length(myfiles)) {
        xyz <- rbind(xyz, read.pdb(myfiles[i])$xyz)
    }
    pdb <- read.pdb(myfiles[1])
    ca.inds <- atom.select(pdb, "calpha")
    rmsd(xyz[,ca.inds$xyz], fit=TRUE)
    
  3. Log in to comment