rdf.accumulate return standard deviation

Issue #144 on hold
Mike Henry created an issue

It would be nice to get the standard deviation from the accumulation of the RDF over multiple frames.

Comments (8)

  1. Eric Harper

    @mikemhenry A brief search shows that "Welford's Method" is a suitable on-line way to compute the standard deviation. However, the parallel manner by which we compute the RDF means that each computation step would require an additional pass through the data in order to compute the standard deviation (currently we only combine and average the arrays when the user requests the rdf, since that just involves dividing the number of counts in each histogram bin by the number of frames, number of particles averaged over, etc). We can implement this fairly simply with a flag for the computation mode. How does that sound? Also, let me know if you are interested in working on implementing; it wouldn't take me too long, but I don't have a lot of time I can devote to code development right now.

  2. Mike Henry reporter

    That sounds really good, I would be more than happy to impalement this. I'm in the middle of my dissertation proposal so I won't really have time to work on this until May, but I would love to get my hands dirty in the Freud code!

  3. Eric Harper

    Hey @mikemhenry, just a quick follow up, "Welford's method" should be trivially implementable at the python level with a script, and should be similarly efficient (as I am currently envisioning the code, given the reduction call over all threads) so it's probably worth investigating that route before we add this internally to freud:

    mean_arr = np.zeros(shape=(num_bins))
    std_arr = np.zeros(shape=(num_bins))
    
    for i, f in enumerate(frames):
        # code to calc stuff in freud
        rdf_handle.compute(args)
        rdf_arr = rdf_handle.getRDF()
        old_mean = np.copy(mean_arr)
        mean_arr += ((rdf_arr - mean_arr)/i)
        std_arr += (rdf_arr - mean_arr)*(rdf_arr-old_mean)
    std_arr /= (i-1)
    

    It's also possible we could work out the math to do a per-thread version of this and somehow combine the std deviations, but this would be the quickest way to compute the std dev.

  4. Eric Harper

    @mikemhenry checking in to see if the suggestions I made helped out, or if you wanted to pursue these additions.

  5. Mike Henry reporter

    I'm still interested in this, I will have to check the performance to see if its worth doing in c++. I'll play around with this, thanks!

  6. Log in to comment