Questions regarding trajectory analysis

Issue #308 resolved

Karan Kapoor created an issue 2015-12-03

I have a few questions regarding trajectory analysis.

I have completed PCA analysis of the trajectory using C-alpha atoms. When I try to output the respective principle components using 'mktrj', it only outputs the C-alpha atoms of all the residues. I have also tried adding info from the pdb object to the mktrj but I am getting an error:

Error in write.pdb(xyz = coor, file = file, ...) : write.pdb: the lengths of all input vectors != 'length(xyz)/3'.

Is there a way I could retrieve the full model of the protein?
I have multiple trajectories of the same protein with slight changes in the residues. When I plot the PC's of the different trajectories using 'plot(pc, col=bwr.colors(nrow(xyz)) )', the x/y-axis values change as well (comparing PC1vsPC2 plot from 1st trajectory to PC1vsPC2 plot from 2nd trajectory, etc). As I would like to compare the same plots from different trajectories, is there a way to make these plots with the same x/y-axis values?
Generating the cross-correlation using 'cij<-dccm(xyz[,ca.inds$xyz])' plots the residues starting from 1. I have mapped the distance matrix rows and columns to residue number using:

rownames(cij) <- resno colnames(cij) <- resno

but I am still getting the plot with the wrong residue numbers. Is there a way to generate the plot with the correct residue numbers?
'view.dccm(cij, pdb, launch=TRUE)' outputs the structures with cross-correlated residues. Can I specify the residue numbers for which to show the correlation in the output structure? For example, cross-correlation between residue 10 and rest of the protein.
I am able to cluster the PCs using 'hclust(dist(pc$z[,1:2]))'. Is there a way to obtain the structures of the cluster centers for comparison? And is there a way to let hclust (or some other method) compute the number of suitable clusters to divide the trajectory in, maybe by providing it with a distance threshold to use?

Thank you for your time.

Comments (12)

Karan Kapoor reporter
- edited description
- 2015-12-03T01:26:29+00:00
Ayesha kanwal
these same questions are also raised from my side too.
- 2015-12-03T16:07:15+00:00
Barry Grant
For your Q1. If you do your PCA on Calpha atoms then the output of mktrj will be confined to Calpha atoms. You can view the resulting PDB trajectories as "Tube" Drawing method in VMD for example. If you want to see more atoms then the simplest approach is to do your PCA on more atoms - for example backbone or all heavy atoms.

The error you report with mktrj() is telling you that the "lengths of all input vectors" are not equal to the number of atoms (i.e. length(xyz)/3). That is because you are likely passing all atom residue number data etc. to mktrj for which you only pass Calpha xyz coordinates to from your PCA object. TO solve this you could create a PDB object (or the individual vectors such as reside etc.) that matches your atom subset subjected to PCA. Here is an example for Calpha atoms:
```
## Make a Calpha (or other selection) PDB object that matches you PCA atom set
> ca.pdb = trim.pdb(pdb, "calpha")
> mktrj(pc, pc=1, pdb=ca.pdb, file="pc1.pdb")
```
- 2015-12-03T16:42:41+00:00
Barry Grant
For your Q2. You can use the xlim and ylim arguments to set the axis limits to be the same for whatever plots you want. For example:
```
par(mfcol=c(1,2), pty="s")
plot(1:3, xlim=c(-11,11), ylim=c(-11,11))
plot(1:10, xlim=c(-11,11), ylim=c(-11,11))
```
- 2015-12-03T16:48:03+00:00

Barry Grant

For your Q3. you can pass a list object with whatever axis annotation you want to the plot.dccm() function. For example below I calculate a dccm from a quick NMA and then play with the axis labels via the scales argument:

pdb = read.pdb("4q21")
modes = nma(pdb)
cij = dccm(modes)

##- normal plot
plot(cij, sse=pdb)

##- Some arbirtry axis labels
plot(cij, sse=pdb, scales=list(at=c(1,50,100,150), labels=c("these","can","be","anything")) )

##- Lets change the axis labels to match PDB resnumbes offset by 111
my.numbs = pdb$atom$resno[pdb$calpha] + 111
my.tick = seq(1, length(my.numbs), by=20) # label every 20th position
my.labels = list(at=my.tick, labels=my.numbs[my.tick])

plot(cij, sse=pdb, scales=my.labels)

2015-12-03T17:18:18+00:00

Barry Grant
For your Q4. One approach here could be to create a temporary copy of your cij object and zero out all non residue 10 associated values. I will note that view.dccm() is re-named to pymol.dccm() and there will be a new view() family of functions in the next major release that will allow you to do these sorts of things more easily and without pymol or VMD.
- 2015-12-03T17:24:58+00:00

Xinqiu Yao

Here is my answer to your Q5. You can easily calculate the "mean" structure for each cluster of structures by following commands:

# suppose your trajectory is `xyz`
#    cluster ID stored in `grps`

# the mean structure for the cluster #5
mxyz.5 <- colMeans(xyz[grps==5, ])

hclust() computes dendrogram only and doesn't give actually clustering. Use cutree() instead (see help(cutree) for details). For example,

hc <- hclust(dist(pc$z[,1:2]))
grps <- cutree(hc, k=8) # give 8 clusters

# or

grps <- cutree(hc, h=50) # automatically cluster based on a distance threshold = 50

2015-12-03T20:20:37+00:00

Karan Kapoor reporter
Thanks a lot for the answers/suggestions. Could you provide more info about the following:

Q2. I am using 'pc.axes' for plotting individual plots with specific xlim/ylim values. But when I use 'pc.axes=1:3', I get an error:

Input 'pc.axes' should be NULL (for overview plots) or numeric and of length two (for individual score plot).

How can I plot PC1 vs PC3? Also, how can I plot single scree plot of variance vs eigen values?

Q5. Running 'colMeans' on the xyz coordinates of the trajectory gives me an array. Is there a way to convert this to a PDB structure?
- 2015-12-04T03:26:56+00:00
Barry Grant
As you state Input 'pc.axes' should be NULL (for overview plots) or numeric and of length two (for individual score plot).

Note that your pc.axes=1:3 is not of length 2. Try it and see:
```
> pc.axes=1:3
> length(pc.axes)
[1] 3
> pc.axes
[1] 1 2 3
```
Perhaps you want pc.axes=c(1,3)....

To get a loadings plot use the function plot.pca.scree(). So for example with you pca result called pc:
```
plot.pca.scree(pc)
```
We should probably add this info more clearly to the documentation page of the plot.pca() functions.
- 2015-12-04T14:37:32+00:00
Barry Grant
If your dimensions are consistent then you should be able to pass your colMeans() output directly to write.pdb() as the xyz input argument along with a matching pdb structure object you already have and whatever file name you want.

Note that if you don't supply the pdb input (or resid, resno etc.) you will get an all ALA Calpha atom file which may or may not be sufficient for your needs.

SIDE-NOTE: Please ask one question per post - as multiple questions are hard to track if we have answered and will be more difficult to find in the future. Also suppling example code that demonstrates a problem is always a good idea see http://stackoverflow.com/help/mcve
- 2015-12-04T14:49:49+00:00
Karan Kapoor reporter
Thanks, I will keep that in mind in future.

Not sure if it has been implemented, but a 'write.dcd' function will also be useful for modifying/ writing DCD (NAMD) files.
- 2015-12-04T22:17:06+00:00
Barry Grant
- changed status to resolved
- 2015-12-06T19:25:52+00:00
Log in to comment

Assignee: –

Type: task

Priority: minor

Status: resolved

Component: Q&A

Version: –

Votes: 0

Watchers: 1