Error in overlaping of PCA modes

Issue #469 resolved
Kajwal Kumar Patra created an issue

Dear sir, I was following your paper "Principal component and normal mode analysis of proteins; a quantitative comparison using the GroEL subunit". after being motivated to do the overlaping analysis, i find some queries:

  1. in fig.2 you have calculated squared overlap of 200ns pca data wrt modes(ie. NMA, i guese) . can you please let me know the mathematical expression for overlap calculation.Is it the simple dot product between two eigen vector matrices?

2.in fig.3 you have calculated squared overlap of PCs of different MD trajectories. i tried to do that but found some errors. please see my code given bellow

pdb8 <- read.pdb("../WT_tetra/PCA/nw_chB_bkbnnw_All.pdb")
dcd8 <- read.dcd("../WT_tetra/PCA/NW_nw_s5_chB_bkbn_stride5.dcd")
inds8 <- atom.select(pdb8, elety = "CA")
trj8 <- fit.xyz(fixed = pdb8$xyz, mobile = dcd8, fixed.inds = inds8$xyz, mobile.inds = inds8$xyz)
pc8 <- pca.xyz(trj8[, inds8$xyz])

pdb1 <- read.pdb("../08_GWT/PCA/nw_nw_GWT_neut_shift.pdb")
inds1 <- atom.select(pdb1, chain = "B", elety = "CA")
dcd1 <- read.dcd("../08_GWT/PCA/NW_nw_GWT_stride5.dcd")
trj1 <- fit.xyz(fixed = pdb1$xyz, mobile = dcd1, fixed.inds = inds1$xyz, mobile.inds = inds1$xyz)
pc1 <- pca.xyz(trj1[, inds1$xyz])
r1 <- rmsip(pc8, pc1)
print(r1)
plot(r1)
pdb1_1 <- trim.pdb(pdb1, chain="B")
print(pdb1_1)
modes_01 <- nma(pdb1_1)
oa_01 <- overlap(modes_01, pc1$U[,1])   # for fig.2 like graphs #

 length(pc8$U)
[1] 2134521
 length(pc1$U)
[1] 2134521 

test <-overlap(pc8$U, pc1$U)     # for fig.3 like graphs #

Error in overlap(pc8$U, pc1$U :
overlap: unequal vector lengths

so how can i draw a graph like fig.3.

thanks

Comments (13)

  1. Lars Skjærven

    Hi, In both Figure 2 and 3 we calculate the squared overlap between mode vectors (e.g. PCs or NMs) and a difference vector describing the conformational difference between the open and closed states. See function difference.vector() for the calculation of the difference vector. Also consult the documentation for the overlap() function where an example is provided.

    For the error: The second argument to overlap() can not be a matrix. It must be a vector as described in help(overlap).

  2. Kajwal Kumar Patra reporter

    Hello sir, thanks for the quick response. In my calculations i have 5 different pdbs and their trajectories. i have calculated the PCs from the trajectories for eacg systems as mentioned in the code. i want to see the similarities(overlaping oh one PC with otherPC) between the PCs . i dont have difference vector like objects.. for fig.2 like graph, i did this: modes_01 <- nma(pdb1_1)
    oa_01 <- overlap(modes_01, pc1$U[,1])
    . Is that fine?

    But for fig.3 :Variation in the principal components obtained from MD simulations (for my case there 5 PCAs or MD trajectories) how can i proceed without difference vectors. or please let me know how exactly you have constructed the fig.3.

    One more thing , sir how can i represent a pdb structure retutned from mktrj.pca() in vector field representation in VMD.

  3. Lars Skjærven

    With 5 PDBs you can certainly create a difference vector if that's what you need/want - and there is a functional conformational transition that you want to study.

    Q1: That is fine if you want to compare the PCs obtained from MD with mode vectors from NMA. That's not exactly what figure 2 is showing though. Our goal with that plot was to evaluate how different methods could capture protein slow (functional) motions. We did that by calculating the difference between open and closed, and compared that particular vector with eigenvectors obtained from all-atom NMA, calpha NMA, and PCA from MD simulations. If you want to compare your MD simulation with the "known" conformational changes as seen from X-ray crystallography, I would either do PCA on your 5 PDBs, and compare the individual PCs with PCs obtained from MD (and potentially also NMA).

    Q2: With your 5 PDBs you can easily calcualte that difference vector as mentioned above.

    Q3: Currently only PyMOL is the option for that (I think).

  4. Kajwal Kumar Patra reporter

    Hi sir, thanks again. My goal here is to compare the 5 PCs obtained from the MD simulation trajectories of 5 PDBs. what i understood from above is that, i can do this

    by (i) calculating a difference vector from xray crystallography structures of 5 PDBs and then overlaping each PCs(obtained from MD trajectoirs) with the difference vector. Say my pdb are (pdb1, pdb2, pdb3, pdb4, pdb5). how can i create a difference vector from 5 PDBs. i saw difference.vector() but it calcualtes for 2 pdbs only. i am getting error while alliigning(struct.aln()) more than 2 pdbs. it would be grateful if u can show me the differnce.vector() calculation for 5 PDBs.

    please spare my demands

  5. Kajwal Kumar Patra reporter

    once again Hi sir,

    i did the rmsip calculation for two PCs

    rmsip(pc_A, pc_B)    #pc_A, pc_B are pca.xyz() of trajectories A and B
    

    it returns (i) a rmsip value and (ii)overlap:a numeric matrix containing pairwise (squared) dot products between the modes it is considering 10 largest eigenvectots of each PCs(eg. pc1 to pc10). So does that overlap means the simple dot product between two eigenvectors(say pc1 of pc_A with pc1 of pc_B) or it means the squre of the dot product of eigenvectors.

    the big question: if i plot the overlap matrix(i got from rmsip()), will it satisfy the necessity of comparing PCs of two md trajectories)

    Also how can we constrct Cosine correlation between PCs of two independent trajectories (say between pc_A and pc_B) ?

    thanks

  6. Xinqiu Yao

    Hi,

    The overlap is square inner product and so it won't have a negative sign.

    The overlap matrix taking at least top 10 PCs from each trajectory should represent a plausible comparison about the essential dynamics of the two trajectories. It is hard to say it satisfies or not. As you said, cosine correlation might be an alternative or supplemental way to compare trajectories. Unfortunately, bio3d does not have function to do such analysis. We may include it in a future release as we marked in this issue

  7. Log in to comment