How to produce trajectories that only based on certain frames along a given principal component?

Issue #606 new
Cheng created an issue

My dcd has 5454 frames. After running

dcd <- read.dcd(dcdfile)
pdb <- read.pdb(pdbfile)

# Trajectory Frame Superposition
ca.inds <- atom.select(pdb, elety="CA")
xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd,
fixed.inds=ca.inds$xyz,
mobile.inds=ca.inds$xyz)

# Principal component analysis (PCA)
pc <- pca.xyz(xyz[,ca.inds$xyz])

Then the command p1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], file="pc1.pdb") will output the pc1.pdb based on all the 5454 frames along the PC1.

Can I produce a frame_1-100.pc1.pdb that only captures the most dissimilar structures in the first 1-100 frames? Thank you!

Comments (6)

  1. Xinqiu Yao

    mktrj.pca() interpolates structures along PCs (i.e., eigenvectors of PCA). It has no relationship with original simulation frames except that PCs are calculated based on those simulations.

    I don't understand your question about "the most dissimilar structures in the first 1-100 frames". It seems you might want to re-do PCA with just first 1-100 frames (right?)

  2. Cheng reporter

    Hi Xin-Qiu, thanks for the help.

    What I mean is, for example, the 5454 frames range from -100 to 100 in the PC1 space. However, the first 100 frames only range from -50 to 0 in the PC1 space. So can I produce a frame_1-100.pc1.pdb similar to pc1.pdb but only move within the range -50 to 0 in the PC1 space?

    Because my 5454 frames come from 18 different variants' frames, I want to see how each of the variants move along the PC1 space?

    Picture2.png

  3. Xinqiu Yao

    Normally, I wouldn't suggest do it because the PC trajectory does not represent real motions of a simulation trajectory. But technically you should be able to do it:

    First, generate a full-range PC trajectory by setting a proper "mag" (see help(mktrj)). Note that the actual magnitude is determined by both "mag" and the corresponding eigenvalue; So, to generate e.g. -100 to 100, you might try mag=100/sqrt(pca$L[pc]).

    Then, in VMD, pick up frames that correspond to a specific range. The step size between frames is determined by step in the mktrj() function. Again, the actual step size is scaled by 'sqrt(pca$L[pc])'. For example, by default, step size is 0.125*sqrt(pca$L[pc]), and you can do the math to find out what frame range corresponds to -50 to 0 along PC1.

    Hope it helps.

  4. Cheng reporter

    Thank you Xin-Qiu!

    Can I clarify first,

    1) Is the pca in your mag=100/sqrt(pca$L[pc]) the same as pc in my pc <- pca.xyz(xyz[,ca.inds$xyz])?

    2) Should the [pc] in your mag=100/sqrt(pca$L[pc]) be replaced by [1] if I am referring the pc1?

    help(mktrj) shows that by default, mag = 1. I also tested it, and it resulted the same pc1.pdb with or without mag = 1 using p1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], pdb=pdb, file="pc1.pdb")

    But I still do not understand the sqrt(pc$L[1]) part. sqrt(pc$L[1]) returns 36.11818, so mag=100/sqrt(pc$L[1]) means mag=2.768689. As I tested, mktrj.pca produced 90 MODELs with mag=2.768689 compared to 34 MODELs with mag=1.

    So what does the -50, 0, 50, 100 actually mean on the pc axis in the plot?

  5. Xinqiu Yao

    Yes, 'pc' is the PC number you are examining, i.e. '1' for PC 1.

    Increasing 'mag' will generate more frames/MODELs because the 'step' (determining step size between frames) is not changed but the amplitude of motion (determined by 'mag') is enlarged. For example, by default mag=1, step=0.125, and rock=TRUE, and so the number of frames is (4*mag)/step+2=34. (Note that rock=TRUE means that you have both back and forth frames plus going through the mean conformation twice). Accordingly, if mag=2.768689, the number of frames will be 90, and frame No. 46 to 57 should correspond to (-50, 0) (It is easy to check with VMD).

    The x-axis ticks are new coordinates of each conformation measured by PC1 (Similar meaning to ticks on the Cartesian 'x', 'y', or 'z' axis).

  6. Cheng reporter

    Thank you Xin-Qiu!

    Now I understand how to use mag, step and rock together. (4*mag)/step+2=34 is very helpful! To confirm:

    1) The 2 should be the motions passing through the 0 mag for two times?

    2) In 34 pdb motions:

    1: mag=0; 2-9: mag from 0 to -1; 10-17: mag from -1 to 0; 18: mag = 0; 19-26: mag from 0 to 1; 27-34: mag from 1 to 0.

    But I still do not understand the scaling part.

    1) So I think mag=1 is using the original PC coordinates to draw the motions. You can see from my picture that the frame dots cover from -100 to 100 on the PC1 coordinates (more precisely maybe -90 to 110, but let's assume it from -100 to 100). If I change to mag=2, would the pdb motions become covering from -200 to 200 on the PC1 coordinates?

    2) In my case, pc$L[1] = 1.304523e+03, sqrt(pc$L[1])= 36.11818. Could you please explain a little more about the theoretical meaning of pc$L[1] and sqrt(pc$L[1])? And also how 1.304523e+03 and 36.11818 are reflected on the pc1 range from -100 to 100?

    3) Using pc$z[14, 1] for example, I can know the precise pc1 coordinate of the 14th frame. Then I can know the precise pc1 coordinate range of the first 100 frames. Suppose the first 100 frames cover -50 to 0 on the pc1 (as shown in the previous picture), can I use mktrj.pca to draw asymmetric pdb motions with mag only from -0.5 to 0? If I use mag=0.5, the motions will be from -0.5 to 0.5, but I do not want the 0 to 0.5 part.

  7. Log in to comment