MD trajectories analysis on cross correlation network analysis and PCA clustering memory issues

Issue #770 closed
Mariana Simoes Ferreira created an issue

Hi,

I’ve been trying to run analysis on MD trajectories such as getting the PCA and generating cross correlation mapping from that, while following the tutorials on http://thegrantlab.org/bio3d_v2/tutorials/protein-structure-networks and http://thegrantlab.org/bio3d_v2/tutorials/principal-component-analysis .

I though about loading each replicate separately to generate the dccm filtered file but the tutorial shows for an small ensemble of pdbs. So, I have concatenated the 3 independent simulations into one file of trajectory to load in bio3d and apply the cutoffs.

File dcd size is a total of 430MB running in a computer with 48GB of memory.

I’ve encountered problems regarding running out of memory. The trajectories already contain only the CA atoms for each. Besides bio3d, the loaded libraries are parallel, igraph and bigmemory.

For the PCA clustering it stops here:

hc <- hclust(dist(pc$z[,1:2]))

For the CMAP filtering, here:

cm <- cmap(trj, dcut = 4.5, scut = 0, pcut = 0.75, mask.lower = FALSE,ncore=10)

Is there any other way I can execute that without running out of memory?

Thanks

Comments (3)

  1. Xinqiu Yao

    Hi,

    There are several solutions you may consider:

    1. Use ncore=1 rather than 10, because multicore usually uses a lot more memory. This will slow down the calculation of course but would eventually finish the job.
    2. Reduce the trajectory size by skipping frames (e.g., take 1 for every 10, then you will have 10 times smaller file size). You can do it in any software that supports basic DCD trajectory editing (e.g. VMD), or directly in R by trj_small ← trj[seq(1, nrow(trj), 10), ].
    3. Calculate DCCM and CMAP for each replicate separately and then combine them. For DCCM, the function filter.dccm() may be helpful (see ?filter.dccm() and the latest Protein Structure Network tutorial for more detail). For CMAP, you can add the option binary=FALSE to the calculation of each replicate, which returns the fraction of frames forming a contact rather than 0/1. Then, take the average of the fraction over all replicates and manually apply a threshold (e.g., 75%) to determine the final binary state.

  2. Log in to comment