Clustering PC-space using hclust
Hello,
I am trying to cluster the individual conformers in the PC space generated with the pca.xyz() function. However, because the trajectory is large (~100,000 frames), I get a memory error when using hclust (Error: cannot allocate vector of size 37.6 Gb).
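The size in that error message is roughly what the distance matrix alone costs. dist(), which is typically what gets passed to hclust(), stores the lower triangle of the n x n matrix, i.e. n(n - 1)/2 doubles of 8 bytes each. The helper below (the name dist_mem_gib is hypothetical) is a quick back-of-the-envelope check under that assumption. It ignores the extra working memory hclust() itself needs.

```r
# Rough memory needed for the object returned by dist() on n frames.
# dist() keeps only the lower triangle of the n x n distance matrix:
# n * (n - 1) / 2 double-precision values at 8 bytes each.
dist_mem_gib <- function(n_frames) {
  n <- as.numeric(n_frames)        # avoid integer overflow for large n
  n_pairs <- n * (n - 1) / 2
  n_pairs * 8 / 1024^3             # bytes -> GiB (1024^3 bytes)
}

dist_mem_gib(100000)  # ~37.3 GiB when every frame is kept
dist_mem_gib(10000)   # ~0.37 GiB when keeping every 10th frame
```

Because the cost grows with the square of the number of frames, keeping every 10th frame cuts the memory by about a factor of 100, not 10.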
Is there a way to reduce the number of conformers on which the clustering is carried out? For example, by adding something like step = 10 to hclust?
Thanks, Karan
Comments (3)
Lars,
Thanks for the quick reply. I was able to do the analysis using the suggested changes.
I am not very experienced with PCA and am not sure whether I am losing a lot of information by reducing the trajectory size. I have attached the two plots I get before and after reducing the trajectory (keeping every 10th frame). There seem to be similar groupings of conformations along PC1 in both plots, but the transitions between these groups are less smooth in the second plot (white shows interconversion between different groups, correct?).
However, when I cluster this PC space (attached), the separation between the clusters is much clearer with the reduced trajectory.
Do you have any suggestions as to which plots I should use to interpret the data?
Thanks, Karan
This all depends on your purpose, Karan. It looks like both the full and reduced set analyses are giving you similar distributions and clusters in PC space, so you could start further analysis with the more tractable reduced set and then verify any conclusions by referring back to the full set.
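One simple way to refer back to the full set (a technique chosen for illustration here, not one named in the thread) is nearest-centroid assignment: compute the centroid of each cluster found on the reduced set, label every frame of the full set with its nearest centroid in the same PC coordinates, and compare cluster populations. The helper name assign_to_centroids is hypothetical, and the two-blob matrix z is synthetic stand-in data; in practice it would be the projection matrix (e.g. pc$z) restricted to the PCs used for clustering.

```r
# Hypothetical helper: label every frame of the full set with the nearest
# cluster centroid (Euclidean distance in the same PC coordinates) taken
# from a clustering of the reduced set.
assign_to_centroids <- function(z_full, z_sub, groups) {
  # Per-cluster means: one row per cluster label, one column per PC.
  centroids <- rowsum(z_sub, groups) / as.vector(table(groups))
  # Squared distance from every full-set frame to every centroid
  # (frames x clusters matrix).
  d2 <- sapply(seq_len(nrow(centroids)), function(i)
    colSums((t(z_full) - centroids[i, ])^2))
  labels <- as.integer(rownames(centroids))
  labels[max.col(-d2, ties.method = "first")]  # column of the smallest distance
}

# Synthetic stand-in for PC projections: two well-separated groups of frames.
set.seed(42)
z <- rbind(matrix(rnorm(600, mean = -5), ncol = 2),   # frames 1-300
           matrix(rnorm(600, mean = 5), ncol = 2))    # frames 301-600
keep <- seq(1, nrow(z), by = 10)                      # reduced set
groups <- cutree(hclust(dist(z[keep, ])), k = 2)
full_groups <- assign_to_centroids(z, z[keep, ], groups)

# Compare cluster populations in the reduced and full sets.
prop.table(table(groups))
prop.table(table(full_groups))
```

If the population fractions (and any per-cluster properties you compute) agree between the two sets, that is some evidence the subsampling did not distort the picture; large disagreements would suggest looking at the full set more closely.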
Hi Karan,
The easiest approach is probably to reduce the size of your trajectory before calling pca.xyz(). Alternatively, you can filter out structures from the projection of the structures onto the PCs (on which I assume your clustering is based):
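Both options can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the code from the original reply. hclust() has no step argument, so the subsampling is done by indexing rows with seq(..., by = stride) before dist() is called. The helper cluster_pc_subset is hypothetical; z is assumed to be a frames x PCs matrix such as pc$z from Bio3D's pca.xyz(), and a random matrix stands in for it here. The number of PCs and clusters (n_pcs, k) are arbitrary choices for the example.

```r
# Hypothetical helper: hierarchical clustering on a strided subset of frames.
# `z` is a frames x PCs matrix of projections, e.g. pc$z from bio3d's pca.xyz().
cluster_pc_subset <- function(z, stride = 10, n_pcs = 2, k = 3) {
  keep <- seq(1, nrow(z), by = stride)          # every stride-th frame
  zz <- z[keep, seq_len(n_pcs), drop = FALSE]   # first n_pcs PCs only
  hc <- hclust(dist(zz))                        # Euclidean distances, complete linkage (default)
  list(frames = keep, hc = hc, groups = cutree(hc, k = k))
}

# The other option is to subsample before the PCA itself, e.g.
#   pc <- pca.xyz(xyz[seq(1, nrow(xyz), by = 10), ])
# where `xyz` is the (fitted) trajectory coordinate matrix, one frame per row.

# Stand-in data for illustration: 5000 "frames" projected onto 3 PCs.
set.seed(1)
z <- matrix(rnorm(5000 * 3), ncol = 3)
res <- cluster_pc_subset(z, stride = 10, n_pcs = 2, k = 3)
length(res$frames)  # 500 frames clustered instead of 5000
table(res$groups)
```

Keeping res$frames alongside the cluster labels makes it easy to map each label back to the original frame numbers in the trajectory.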