CorrelationFunction behavior at 0

Issue #3 resolved
Matthew Spellings created an issue

Currently, CorrelationFunction always sets the correlation function value at the first bin to be 0 (really, the value for the default constructor). This is acceptable if you're computing the self-correlation of one set to itself and don't want the first value, but if you're computing cross-correlation between two different sets, it is not necessarily what you want.

Consider the following code, which computes the correlation of one point at the origin (with associated value 1) to a randomly-generated set of points with associated values of the magnitude of their radial distance from the origin. The correlation function should then be identity.

import numpy as np
import numpy.testing as npt
from freud import trajectory, density

ref = np.array([[0, 0, 0]], dtype=np.float32)
refVals = np.array([1], dtype=np.float32)
rmax = 10.0
dr = 1.0
num_points = 10000
box_size = rmax*3.1
points = np.random.random_sample((num_points,3)).astype(np.float32)*box_size - box_size/2
pointRs = np.sqrt(np.sum(points**2, axis=-1))

cf = density.FloatCF(trajectory.Box(box_size), rmax, dr)
cf.compute(ref, refVals, points, pointRs)

import matplotlib, matplotlib.pyplot as pp
pp.plot(cf.getR(), cf.getRDF())
pp.show(block=True)

Currently, however, master sets the value at the first bin to be 0 unconditionally.

Comments (28)

  1. Matthew Spellings reporter

    The referenced commit computes the correlation function for all bins instead of skipping the first one. This would mean that, when computing the correlation between a set and itself (i.e. cf.compute(points, vals, points, vals)), we get <vals[i]**2> in the first bin.

  2. Matthew Spellings reporter

    A few resolutions that spring to mind:

    • Leave things as they are on master
    • Compare pointers or some other sort of test to determine whether the arrays are the same and self/self correlations should be ignored
    • Take a function argument to determine whether self/self correlations should be ignored
    • Always compute self/self correlations and let people set bin 0 to be 0 if they want it
  3. Eric Harper

    Would simply seeing if delta < 1e-6 or something be sufficient? That would mean that it's two diff points.

    I would also be ok with the last one as that seems like reasonable behavior.

  4. Matthew Spellings reporter

    So far, the predominant use case is for the point arrays to be the same; I think just comparing the pointers at the start might be the most reasonable course of action.

  5. Matthew Spellings reporter

    Well, more properly, encasing the summation while looping over particles within something like if(i != j || points != ref_points).

  6. Joshua Anderson

    We can document that when ref_points != points, ref_points should not contain any points that are in points. I'm not sure how often users would even want to do that anyways. But best to have it noted how that is handled.

  7. Matthew Spellings reporter

    Stylistically, I much prefer

    if(!x && y) foo;
    

    to

    if(x) continue;
    if(y) foo;
    

    but maybe that's just me...

  8. Log in to comment