Can PCA be used to analyse two ensembles of protein structures?

Issue #583 new

Cheng created an issue 2018-04-04

Dear Bio3D, My understanding of the PCA tutorials (this and this) is that PCA can analyse an ensemble of structures, identify where the fluctuations occur, and use a few principal components to capture the variation. I like that the residue contributions (the "residue-wise loadings") can be visualised.

Now I have two ensembles of structures, both from Gromacs simulations run at different times. One ensemble is the wild type and the other is a mutant. I wonder whether I can compare the two ensembles and analyse which residues differ most between them?

PS: is there a tutorial for analysing mutant ensembles? Which methods are recommended?

Thank you!

Yours sincerely Cheng

Comments (10)

Barry Grant
One common (and good) way to start is to combine your two trajectories and do your first analysis on the combined data. From the resulting output you can then examine (i.e. visualize and quantify) whether the conformations from the two simulations overlap or separate in this PC space (i.e. whether they are similar or quite different in terms of PC space). Other things to explore include projecting one dataset into the PC space of the other and vice-versa. Bio3D also has overlap measures where you can compare the PC results obtained from each dataset separately.
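
A minimal sketch of the combined-data approach, not an official walkthrough. It assumes the HIV-1 protease example files that ship with Bio3D stand in for your own data; splitting that trajectory into two halves labelled "A" and "B" is purely illustrative and takes the place of a real wild-type/mutant pair.

```r
library(bio3d)

# Stand-in data (assumption): Bio3D's bundled HIV-1 protease example.
# Replace with your own reference PDB and combined trajectory.
pdb <- read.pdb(system.file("examples/hivp.pdb", package = "bio3d"))
trj <- read.dcd(system.file("examples/hivp.dcd", package = "bio3d"))

# Superpose every frame onto the reference using C-alpha atoms
ca.inds <- atom.select(pdb, elety = "CA")
xyz <- fit.xyz(fixed = pdb$xyz, mobile = trj,
               fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)

# Hypothetical ensemble labels: first half "A", second half "B"
half <- nrow(xyz) %/% 2
ensemble <- rep(c("A", "B"), times = c(half, nrow(xyz) - half))

# PCA on the combined, superposed C-alpha coordinates
pc <- pca.xyz(xyz[, ca.inds$xyz])

# PC1-PC2 conformer plot coloured by ensemble, to inspect overlap/separation
cols <- ifelse(ensemble == "A", "blue", "red")
plot(pc$z[, 1], pc$z[, 2], col = cols, pch = 16, xlab = "PC1", ylab = "PC2")
legend("topright", legend = c("A", "B"), col = c("blue", "red"), pch = 16)
```

With real data, the two ensembles must contain the same atoms in the same order before they are combined.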

Unfortunately, I don't think we have a tutorial walkthrough for this but we probably should have...
- 2018-04-05T04:45:31+00:00
Cheng reporter
Thank you very much Grant! Can I also suggest adding PCA for secondary structure (ss)? I can clearly see the difference between the two mutants, but it would be great if this difference could be quantified. An example of the ss file is here, generated by Gromacs.
- 2018-04-05T10:59:39+00:00
Alireza Tafazzol
@bjgrant "Other things to explore include projecting one dataset into the PC space of the other and vice-versa. "

Is this feasible in the current version of Bio3D?

A tutorial would be highly appreciated.
- 2018-04-26T03:11:06+00:00
Barry Grant
Absolutely, you can use the project.pca() function; see the example code snippets in that function's documentation.
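
A minimal sketch of such a projection, assuming the HIV-1 protease example files shipped with Bio3D stand in for real data. The two halves of that trajectory (named `xyz.a` and `xyz.b` here, both hypothetical) play the role of the two ensembles; the function's own documentation remains the authoritative reference.

```r
library(bio3d)

# Stand-in data (assumption): Bio3D's bundled HIV-1 protease example
pdb <- read.pdb(system.file("examples/hivp.pdb", package = "bio3d"))
trj <- read.dcd(system.file("examples/hivp.dcd", package = "bio3d"))

# Superpose all frames on the reference and keep C-alpha coordinates only
ca.inds <- atom.select(pdb, elety = "CA")
xyz <- fit.xyz(fixed = pdb$xyz, mobile = trj,
               fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
xyz.ca <- xyz[, ca.inds$xyz]

# Hypothetical split into two "ensembles"
half  <- nrow(xyz.ca) %/% 2
xyz.a <- xyz.ca[1:half, ]
xyz.b <- xyz.ca[(half + 1):nrow(xyz.ca), ]

# PCA of ensemble A only
pc.a <- pca.xyz(xyz.a)

# Project the frames of ensemble B into the PC space of ensemble A
proj.b <- project.pca(xyz.b, pc.a)

# Overlay A's own scores (blue) and B's projected scores (red) on PC1-PC2
plot(pc.a$z[, 1], pc.a$z[, 2], col = "blue", pch = 16,
     xlim = range(c(pc.a$z[, 1], proj.b[, 1])),
     ylim = range(c(pc.a$z[, 2], proj.b[, 2])),
     xlab = "PC1 (from A)", ylab = "PC2 (from A)")
points(proj.b[, 1], proj.b[, 2], col = "red", pch = 16)
```

Swapping the roles of `xyz.a` and `xyz.b` gives the reverse projection.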
- 2018-04-26T05:23:43+00:00
Alireza Tafazzol
@bjgrant Dear Barry,

I have done MD simulations on two structures of a protein (one with ligand and one without). I combined trajectories and did a preliminary PCA analysis on the combined trajectories. The two simulations are clearly separated in the PC1-PC2 space.

I further calculated the PCs of the ligand-free structure, projected the trajectory of the ligand-bound structure onto them, and got the following graph:

What is the best way to draw interesting conclusions from these projected data? Should I look into the projected matrix? Or is there a way to make a meaningful PDB/trajectory that shows these dissimilarities?

What do you suggest to get the most out of this projected matrix?
- 2018-04-26T23:05:40+00:00
Cheng reporter
@atafazzol

Can I ask about your statement that you "combined trajectories and did a preliminary PCA analysis"?

1) Did you prepare your dcd files (ligand-bound and ligand-free) so that only alpha-carbon entries are kept? Also, did you use a pdb with only alpha-carbon entries?

2) Did you then combine the dcd files using catdcd?

3) Did you then run the trajectory frame superposition:
```
# Select C-alpha atoms and superpose all frames onto the reference structure
ca.inds <- atom.select(pdb, elety="CA")
xyz <- fit.xyz(fixed=pdb$xyz, mobile=dcd,
               fixed.inds=ca.inds$xyz,
               mobile.inds=ca.inds$xyz)
```
and PCA:
```
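# PCA on the superposed C-alpha coordinates
pc <- pca.xyz(xyz[,ca.inds$xyz])
# Overview plot, with frames coloured blue to red in frame order
plot(pc, col=bwr.colors(nrow(xyz)))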
```
Is this correct?
- 2018-05-08T11:02:14+00:00
Alireza Tafazzol
1) I tried both: keeping only C-alpha atoms and keeping only backbone atoms. In both cases I only considered the residues common to each chain in the ligand-free and ligand-bound structures.

I used the original PDB file to keep the same numbering/names. You can modify your original PDB file to keep only C-alpha or backbone atoms, or easily trim it in Bio3D with the trim.pdb() function:
```
pdb_orig <- read.pdb("/home/ali/Desktop/New_PCA_2018/5ijbs_N1/TLR4/backbone_PCA/5ijb_A.pdb")
# Keep backbone atoms only (straight quotes are required in R)
pdb <- trim.pdb(pdb_orig, "back")
# or, for example, select atoms by element type
pdb <- trim.pdb(pdb_orig, elety=c("CA","N","C"))
```
2) I combined my trajectories with AMBER's CPPTRAJ, but I am pretty sure you can also use Bio3D.

3) Correct. You can also write out a PDB trajectory for each PC to see the motions (dynamics):
```
p1 <- mktrj.pca(pc, pc=1, b=pc$au[,1], pdb=pdb, file="pc1.pdb")
```
- 2018-05-08T16:59:11+00:00
Cheng reporter
@atafazzol Thank you very much, this is very helpful! For your protein with and without ligand, the answer depends on what you are studying. For example, I assume the residues around the protein-ligand interface are more flexible without the ligand than with it?
- 2018-05-21T21:14:54+00:00
Cheng reporter
@bjgrant You said "Bio3D also has overlap measures where you can compare the PC results obtained from each dataset separately." Could you please show us an example?
- 2018-05-21T21:17:36+00:00
Barry Grant
For example see: http://thegrantlab.org/bio3d/html/rmsip.html

Note the "See Also" links to other similarity measures, including sip(), covsoverlap(), bhattacharyya() and overlap().

I think they are also used in some of the vignettes.
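
A minimal sketch of one such comparison with rmsip(), assuming the HIV-1 protease example files shipped with Bio3D stand in for real data. The two halves of that trajectory (hypothetical names `xyz.a` and `xyz.b`) are analysed separately, and both PCAs use the same atoms after superposition onto the same reference.

```r
library(bio3d)

# Stand-in data (assumption): Bio3D's bundled HIV-1 protease example
pdb <- read.pdb(system.file("examples/hivp.pdb", package = "bio3d"))
trj <- read.dcd(system.file("examples/hivp.dcd", package = "bio3d"))

# Superpose all frames on the same reference and keep C-alpha coordinates
ca.inds <- atom.select(pdb, elety = "CA")
xyz <- fit.xyz(fixed = pdb$xyz, mobile = trj,
               fixed.inds = ca.inds$xyz, mobile.inds = ca.inds$xyz)
xyz.ca <- xyz[, ca.inds$xyz]

# Hypothetical split into two "ensembles"
half  <- nrow(xyz.ca) %/% 2
xyz.a <- xyz.ca[1:half, ]
xyz.b <- xyz.ca[(half + 1):nrow(xyz.ca), ]

# Separate PCA for each ensemble
pc.a <- pca.xyz(xyz.a)
pc.b <- pca.xyz(xyz.b)

# Root mean square inner product over the first 10 PCs of each set;
# values near 1 indicate similar subspaces, values near 0 dissimilar ones
r <- rmsip(pc.a, pc.b, subset = 10)
print(r$rmsip)

# Pairwise squared overlaps between the individual PCs
print(dim(r$overlap))
```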
- 2018-05-21T23:13:19+00:00
Assignee: –

Type: task

Priority: major

Status: new

Component: Q&A

Version: –
