PCA calculation for long simulations

Issue #458 resolved
Former user created an issue

Dear sir,

Previously I did the cij matrix calculation for a very long trajectory (500 ns). I first divided the trajectory into 10 parts (50 ns each), calculated a cij matrix for each part, and then averaged these 10 matrices to obtain the final cij matrix. Now I want to do a PCA analysis of the same trajectory. My queries are as follows:

  1. What is the best way to do the PCA calculation for such a long trajectory?
  2. Which objects do I need to save in order to plot the PCA results in other graphing software, and how can I do that?

Sorry for taking up your time.

-kp

Comments (5)

  1. Lars Skjærven

    I would consider reading only the CA coordinates of the trajectory. This can be done when reading the trajectory with the function read.ncdf (check out the argument at.sel), or by subsetting afterwards, e.g. with the function trim.xyz. The pca.xyz function should then be able to cope with your trajectory, depending on how large the protein is and how many frames you have. If you are still in trouble, I would subset the frames, e.g. take every 5th or 10th frame. That shouldn't affect the final PCs much. Again, you can do that when reading (stride argument in read.ncdf).

    Why on earth do you need anything other than R for plotting? :)
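
    A minimal sketch of the reading step described above, assuming a PDB file whose atoms match the trajectory. The helper name read_ca_strided, its arguments, and the file names in the example call are hypothetical and not part of bio3d; read.pdb and atom.select are used here only to build the selection passed to at.sel.

    ```r
    # Hypothetical helper (not part of bio3d): read only the C-alpha
    # coordinates of a NetCDF trajectory, keeping every `stride`-th frame.
    read_ca_strided <- function(pdbfile, trjfile, stride = 5) {
      pdb <- bio3d::read.pdb(pdbfile)
      ca.inds <- bio3d::atom.select(pdb, "calpha")
      bio3d::read.ncdf(trjfile, at.sel = ca.inds, stride = stride)
    }

    # Example call with placeholder file names:
    # xyz <- read_ca_strided("system.pdb", "traj.nc", stride = 5)
    # pc  <- bio3d::pca.xyz(xyz)   # assumes the frames are already superposed
    ```

    For a system with 487 CA atoms and 50,000 frames, a stride of 5 would leave about 10,000 frames of 1,461 coordinates each, which is roughly 117 MB as a double-precision matrix.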

  2. Kajwal Kumar Patra

    Dear sir, thanks for the response. I am considering only the CA atoms. My protein system contains 487 CA atoms and the total number of frames is 50,000, so it is quite big in terms of the number of frames. That is why, for the dccm() calculation, I divided the frames into 10 subsets, calculated the cij matrix for each subset, and then took the average as the final consensus cij matrix. Now, for the pca() calculation:

    (i) Should I repeat the same procedure, i.e. run 10 pca() calculations and then build a consensus final data file as I did for the dccm() calculation? (First of all, is that possible for a pca() calculation?)

    (ii) Or is there a better way to do this pca() calculation so that I finally get a single plot covering all my frames?

    Secondly, I don't know much about R or plotting in R. I use xmgrace for plotting, so it would be helpful if I could write the data to a file and plot it there. That is why I need to know which objects (data files) I have to save.

    thanks

  3. Lars Skjærven

    How long is the simulation? I doubt that you need 50k frames for converged PCs. As I suggested above, use every 5th frame. Also, make sure you read only the C-alpha coordinates of the trajectory.

    For question 2, you should explore the pca object returned by the function pca.xyz(), and functions such as write.table() for writing data to disk. For example, to write the projections onto the first 10 PCs to disk: write.table(pc$z[, 1:10], file="projections.dat"). See help(pca.xyz) for more information on the pca object and how to access its other attributes.
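
    The export step above could be wrapped up roughly as follows. This is only a sketch: write_pca_tables, its arguments, and the output file names are hypothetical, and it assumes that pc is the object returned by pca.xyz(), with the projections in pc$z and the eigenvalues in pc$L. Row and column names are dropped so that external plotting tools see bare numeric columns.

    ```r
    # Hypothetical helper (not part of bio3d): write the first `n` PC
    # projections and the eigenvalues of a pca.xyz() result as plain text.
    write_pca_tables <- function(pc, n = 10, prefix = "pca") {
      n <- min(n, ncol(pc$z))
      # One row per frame, one column per PC.
      write.table(pc$z[, seq_len(n), drop = FALSE],
                  file = paste0(prefix, "_projections.dat"),
                  row.names = FALSE, col.names = FALSE)
      # Columns: eigenvalue index, eigenvalue, proportion of total variance.
      write.table(cbind(seq_along(pc$L), pc$L, pc$L / sum(pc$L)),
                  file = paste0(prefix, "_eigenvalues.dat"),
                  row.names = FALSE, col.names = FALSE)
      invisible(NULL)
    }

    # Example call, assuming `pc` came from pca.xyz():
    # write_pca_tables(pc, n = 10, prefix = "run1")
    ```

    Plotting column 1 against column 2 of the projections file gives the usual PC1 vs PC2 scatter, and the eigenvalues file holds the data for a scree plot.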
