Parallelization with multicore/parallel and bigmemory

Issue #45 resolved
Xinqiu Yao created an issue

Multicore parallelization would perform substantially better if we could avoid communicating large data between the "master" and "children" processes. The shared-memory variables provided by the bigmemory package make this possible. I would like to go over all functions that support multicore and see what we can improve. There is one problem: bigmemory limits shared variables to the "matrix" type, whereas many functions use lists extensively. For example, nma.pdbs() returns all.modes if full=TRUE, which is a list of complex objects (the modes returned by calling nma()) and is hard to transform into a matrix. Can we get around this barrier? Any suggestions are appreciated!
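
To make the idea concrete, here is a minimal sketch of the shared-memory pattern, not the actual bio3d implementation. It assumes a Unix-alike system (mclapply() forks, so mc.cores > 1 is not available on Windows) and that the parallel and bigmemory packages are installed. The matrix xyz, its dimensions, and the helper row_dist() are hypothetical and used only for illustration.

```r
library(parallel)
library(bigmemory)

# Hypothetical data: 200 "structures" with 300 coordinates each.
set.seed(1)
xyz <- matrix(rnorm(200 * 300), nrow = 200)

# Copy the data into shared memory once. The descriptor is a small object
# that each worker can use to re-attach the same memory segment.
shared <- as.big.matrix(xyz)
desc <- describe(shared)

# Hypothetical worker: attach the shared matrix and return a single number
# (root-mean-square difference between row i and row 1).
row_dist <- function(i, desc) {
  m <- attach.big.matrix(desc)
  sqrt(mean((m[i, ] - m[1, ])^2))
}

# Only the row index and the descriptor are handed to each worker,
# and only a scalar comes back.
res <- unlist(mclapply(seq_len(nrow(xyz)), row_dist, desc = desc, mc.cores = 2))
```

This works here because both the input and the per-task result are plain numeric data. It does not address the list-valued case (all.modes with full=TRUE), which is the open question above.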

Comments (5)

  1. Xinqiu Yao reporter

    I pushed a branch ("feature_para") for developing the parallelization, to avoid disrupting current development of the serial version. multicore and bigmemory turn out to be an excellent combination: I tried them on nma.pdbs() and got a 6x speedup with 8 cores (full=FALSE). I will go ahead and re-parallelize other functions to see what we can get.

    I am also writing tests with the testthat package, currently pushed under util/tests. Feel free to add your own tests there; I would appreciate it very much.

  2. Barry Grant

    This looks excellent, Xinqiu. I am not sure there is an easy way to cope with the list structures used in many functions, such as the one stored for each call to nma() within nma.pdbs() when full=TRUE.

    However, I am looking forward to seeing how this combination might help speed up fit.xyz() and rmsd(). We already had bigmemory as a "suggests" dependency, and it works on Windows too.

    Thanks for this!

  3. Barry Grant

    Has this proposal been taken care of now, Xinqiu? I remember you sent nice speedup plots for fit.xyz() etc.
    We could add them here and close this issue if appropriate.

  4. Xinqiu Yao reporter

    We have put some HPC tasks for bio3d on the to-do list, so I think this old issue can be closed. For reference, I have also attached the speedup plot made earlier. Thanks for checking!
    test_bio3d_mc.png
