MD trajectories analysis on cross correlation network analysis and PCA clustering memory issues
Hi,
I’ve been trying to run analysis on MD trajectories such as getting the PCA and generating cross correlation mapping from that, while following the tutorials on http://thegrantlab.org/bio3d_v2/tutorials/protein-structure-networks and http://thegrantlab.org/bio3d_v2/tutorials/principal-component-analysis .
I though about loading each replicate separately to generate the dccm filtered file but the tutorial shows for an small ensemble of pdbs. So, I have concatenated the 3 independent simulations into one file of trajectory to load in bio3d and apply the cutoffs.
File dcd size is a total of 430MB running in a computer with 48GB of memory.
I’ve encountered problems regarding running out of memory. The trajectories already contain only the CA atoms for each. Besides bio3d, the loaded libraries are parallel, igraph and bigmemory.
For the PCA clustering it stops here:
hc <- hclust(dist(pc$z[,1:2]))
For the CMAP filtering, here:
cm <- cmap(trj, dcut = 4.5, scut = 0, pcut = 0.75, mask.lower = FALSE,ncore=10)
Is there any other way I can execute that without running out of memory?
Thanks
Comments (3)
-
-
- marked as task
Not a bug
-
- changed status to closed
No reply
- Log in to comment
Hi,
There are several solutions you may consider:
ncore=1
rather than 10, because multicore usually uses a lot more memory. This will slow down the calculation of course but would eventually finish the job.trj_small ← trj[seq(1, nrow(trj), 10), ]
.filter.dccm()
may be helpful (see?filter.dccm()
and the latest Protein Structure Network tutorial for more detail). For CMAP, you can add the optionbinary=FALSE
to the calculation of each replicate, which returns the fraction of frames forming a contact rather than 0/1. Then, take the average of the fraction over all replicates and manually apply a threshold (e.g., 75%) to determine the final binary state.