glotzer / freud / issues / #144 - rdf.accumulate return standard deviation — Bitbucket

Issue #144 on hold

Mike Henry created an issue 2017-04-09

It would be nice to get the standard deviation from the accumulation of the RDF over multiple frames.

Comments (8)

Eric Harper
@mikemhenry Great idea! That shouldn't be too difficult to implement, I'll take a look
- 2017-04-11T14:16:03+00:00
Eric Harper
@mikemhenry A brief search shows that "Welford's Method" is a suitable on-line way to compute the standard deviation. However, the parallel manner by which we compute the RDF means that each computation step would require an additional pass through the data in order to compute the standard deviation (currently we only combine and average the arrays when the user requests the rdf, since that just involves dividing the number of counts in each histogram bin by the number of frames, number of particles averaged over, etc). We can implement this fairly simply with a flag for the computation mode. How does that sound? Also, let me know if you are interested in working on implementing; it wouldn't take me too long, but I don't have a lot of time I can devote to code development right now.
- 2017-04-11T16:09:04+00:00
Mike Henry reporter
That sounds really good, I would be more than happy to impalement this. I'm in the middle of my dissertation proposal so I won't really have time to work on this until May, but I would love to get my hands dirty in the Freud code!
- 2017-04-11T18:05:05+00:00
Eric Harper
Hey @mikemhenry, just a quick follow up, "Welford's method" should be trivially implementable at the python level with a script, and should be similarly efficient (as I am currently envisioning the code, given the reduction call over all threads) so it's probably worth investigating that route before we add this internally to freud:
```
mean_arr = np.zeros(shape=(num_bins))
std_arr = np.zeros(shape=(num_bins))

for i, f in enumerate(frames):
    # code to calc stuff in freud
    rdf_handle.compute(args)
    rdf_arr = rdf_handle.getRDF()
    old_mean = np.copy(mean_arr)
    mean_arr += ((rdf_arr - mean_arr)/i)
    std_arr += (rdf_arr - mean_arr)*(rdf_arr-old_mean)
std_arr /= (i-1)
```
It's also possible we could work out the math to do a per-thread version of this and somehow combine the std deviations, but this would be the quickest way to compute the std dev.
- 2017-04-13T14:54:27+00:00
Eric Harper
@mikemhenry checking in to see if the suggestions I made helped out, or if you wanted to pursue these additions.
- 2017-05-15T15:10:57+00:00
Mike Henry reporter
I'm still interested in this, I will have to check the performance to see if its worth doing in c++. I'll play around with this, thanks!
- 2017-06-02T15:12:16+00:00
Bradley Dice
- changed component to density
- 2018-03-18T06:02:15+00:00
Bradley Dice
- changed status to on hold
- 2018-03-18T06:03:04+00:00
Log in to comment

Assignee: –

Type: enhancement

Priority: minor

Status: on hold

Component: density

Version: –

Votes: 0

Watchers: 2

Jira: the preferred issue tracker for Bitbucket. Join the team!