How to produce trajectories that only based on certain frames along a given principal component?
My dcd
has 5454 frames. After running
dcd <- read.dcd(dcdfile)
pdb <- read.pdb(pdbfile)
# Trajectory Frame Superposition
ca.inds <- atom.select(pdb, elety="CA")
xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd,
fixed.inds=ca.inds$xyz,
mobile.inds=ca.inds$xyz)
# Principal component analysis (PCA)
pc <- pca.xyz(xyz[,ca.inds$xyz])
Then the command p1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], file="pc1.pdb")
will output the pc1.pdb
based on all the 5454 frames along the PC1.
Can I produce a frame_1-100.pc1.pdb
that only captures the most dissimilar structures in the first 1-100 frames? Thank you!
Comments (6)
-
-
reporter Hi Xin-Qiu, thanks for the help.
What I mean is, for example, the 5454 frames range from -100 to 100 in the PC1 space. However, the first 100 frames only range from -50 to 0 in the PC1 space. So can I produce a
frame_1-100.pc1.pdb
similar topc1.pdb
but only move within the range -50 to 0 in the PC1 space?Because my 5454 frames come from 18 different variants' frames, I want to see how each of the variants move along the PC1 space?
-
Normally, I wouldn't suggest do it because the PC trajectory does not represent real motions of a simulation trajectory. But technically you should be able to do it:
First, generate a full-range PC trajectory by setting a proper "mag" (see
help(mktrj)
). Note that the actual magnitude is determined by both "mag" and the corresponding eigenvalue; So, to generate e.g. -100 to 100, you might trymag=100/sqrt(pca$L[pc])
.Then, in VMD, pick up frames that correspond to a specific range. The step size between frames is determined by
step
in themktrj()
function. Again, the actual step size is scaled by 'sqrt(pca$L[pc])'. For example, by default, step size is 0.125*sqrt(pca$L[pc]), and you can do the math to find out what frame range corresponds to -50 to 0 along PC1.Hope it helps.
-
reporter Thank you Xin-Qiu!
Can I clarify first,
1) Is the
pca
in yourmag=100/sqrt(pca$L[pc])
the same aspc
in mypc <- pca.xyz(xyz[,ca.inds$xyz])
?2) Should the
[pc]
in yourmag=100/sqrt(pca$L[pc])
be replaced by[1]
if I am referring the pc1?help(mktrj)
shows that by default,mag = 1
. I also tested it, and it resulted the samepc1.pdb
with or withoutmag = 1
usingp1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], pdb=pdb, file="pc1.pdb")
But I still do not understand the
sqrt(pc$L[1])
part.sqrt(pc$L[1])
returns36.11818
, somag=100/sqrt(pc$L[1])
meansmag=2.768689
. As I tested,mktrj.pca
produced 90 MODELs withmag=2.768689
compared to 34 MODELs withmag=1
.So what does the
-50
,0
,50
,100
actually mean on the pc axis in the plot? -
Yes, 'pc' is the PC number you are examining, i.e. '1' for PC 1.
Increasing 'mag' will generate more frames/MODELs because the 'step' (determining step size between frames) is not changed but the amplitude of motion (determined by 'mag') is enlarged. For example, by default
mag=1
,step=0.125
, androck=TRUE
, and so the number of frames is(4*mag)/step+2=34
. (Note thatrock=TRUE
means that you have both back and forth frames plus going through the mean conformation twice). Accordingly, ifmag=2.768689
, the number of frames will be 90, and frame No. 46 to 57 should correspond to (-50, 0) (It is easy to check with VMD).The x-axis ticks are new coordinates of each conformation measured by PC1 (Similar meaning to ticks on the Cartesian 'x', 'y', or 'z' axis).
-
reporter Thank you Xin-Qiu!
Now I understand how to use
mag
,step
androck
together.(4*mag)/step+2=34
is very helpful! To confirm:1) The
2
should be the motions passing through the 0mag
for two times?2) In 34 pdb motions:
1: mag=0; 2-9: mag from 0 to -1; 10-17: mag from -1 to 0; 18: mag = 0; 19-26: mag from 0 to 1; 27-34: mag from 1 to 0.
But I still do not understand the scaling part.
1) So I think
mag=1
is using the original PC coordinates to draw the motions. You can see from my picture that the frame dots cover from -100 to 100 on the PC1 coordinates (more precisely maybe -90 to 110, but let's assume it from -100 to 100). If I change tomag=2
, would the pdb motions become covering from -200 to 200 on the PC1 coordinates?2) In my case,
pc$L[1]
=1.304523e+03
,sqrt(pc$L[1])
=36.11818
. Could you please explain a little more about the theoretical meaning ofpc$L[1]
andsqrt(pc$L[1])
? And also how1.304523e+03
and36.11818
are reflected on the pc1 range from -100 to 100?3) Using
pc$z[14, 1]
for example, I can know the precise pc1 coordinate of the 14th frame. Then I can know the precise pc1 coordinate range of the first 100 frames. Suppose the first 100 frames cover -50 to 0 on the pc1 (as shown in the previous picture), can I usemktrj.pca
to draw asymmetric pdb motions withmag
only from -0.5 to 0? If I usemag=0.5
, the motions will be from -0.5 to 0.5, but I do not want the 0 to 0.5 part. - Log in to comment
mktrj.pca()
interpolates structures along PCs (i.e., eigenvectors of PCA). It has no relationship with original simulation frames except that PCs are calculated based on those simulations.I don't understand your question about "the most dissimilar structures in the first 1-100 frames". It seems you might want to re-do PCA with just first 1-100 frames (right?)