How to get per residue rmsf including side chain contributions?

Lars Skjærven

hmm.. you'll have to re-read all PDBs, and go through them to fetch side chain beta factors:

# fetch files
ids <- c("1a70_A", "1czp_A", "1frd_A")
files <- get.pdb(ids, split = TRUE)

# Sequence Alignement
pdbs <- pdbaln(files)

# new matrix to store results
fluct <- pdbs$b * 0

# go through each PDB
for( i in 1:length(pdbs$id) ) {
    # read it
    pdb <- read.pdb(pdbs$id[i])

    # find residues
    ca.inds <- atom.select(pdb, "calpha")

    # residue identifyer
    resid <- paste(pdb$atom$resno[ca.inds$atom],
                   pdb$atom$chain[ca.inds$atom],
                   sep="-")

   # sum over side chains within a residue
    bsum <- unlist(lapply(resid, function(x) sum(pdb$atom$b[ resid == x ])))

    # store data in matrix
    fluct[i, !is.na(fluct[i,])] <- bsum
}

2015-03-10T19:35:13+00:00

Lars Skjærven

edited description

2015-03-10T19:36:56+00:00

Xinqiu Yao

Hey,

You can also check out the read.all() function, although it is still experimental. Remember that for heterogeneous sequences, alignment for side-chain heavy atoms is simply based on atom names, which may or may not represent the chemical "equivalence". With this function, you can calculate e.g. average rmsf of backbone atoms:

# use the example alignment file containing 4 kinesin structures
file <- system.file("examples/kif1a.fa",package="bio3d")
aln  <- read.fasta(file)

# read backbone atoms for each residue
sel <- c("N", "CA", "C", "O")
pdbs <- read.all(aln, sel=sel)

# fit structures based on all backbone atoms
xyz <- fit.xyz(pdbs$all[1, ], pdbs$all)

# calculate rmsf for each atom of non-gapped residue position
r <- rmsf(xyz[, gap.inspect(xyz)$f.inds])

# group atoms based on residue number
resno <- as.numeric(pdbs$all.resno[1, gap.inspect(pdbs$all.resno)$f.inds])

# calculate mean rmsf for each residu
r.mean <- tapply(r, as.factor(resno), mean)

# plot
plot(r.mean, typ='h', ylab="Mean RMSF of backbone atoms (A)")

2015-03-10T20:19:40+00:00

Lars Skjærven

Nice. This is indeed cuter than my suggestion. Perhaps read.all could be renamed to make it more visible? This could perhaps also take a place in one of the vignettes(?).

2015-03-10T21:47:43+00:00

Xinqiu Yao

Thanks! Not sure which name is more suitable; it basically does similar work as read.fasta.pdb and so maybe read.fasta.all?

This function hasn't been updated for a long time (that is why I said 'experimental' considering we have a lot updates for other 'core' functions. But it still can run without problem). We probably should pay more attention to it and illustrating it in vignette is a good idea. Will see what we can do.

2015-03-10T22:16:40+00:00

Barry Grant

The read.all() function dates back to the very early days of the package and updates and simplifications would be welcome to make it's output more consistent with recent updates elsewhere.

Its main use was to enable equivalent 'all-atom' reading and subsequent analysis of heterogenous structure sets - just as asked here.

Of course if you have homogenous structures (e.g. the same atoms across your set, as in a simulation output for example or an NMR ensemble) the answer to this question is straightforward - just calculate the RMSF of all your chosen atoms and take the mean based on residue number.

I would resist renaming until function updates have been finalized.

2015-03-11T03:18:35+00:00

Barry Grant

changed status to resolved

2015-03-21T01:32:48+00:00

Comments (7)