- edited description
AngularSeparation fails unittest on rare occasions
The relevant CircleCI result is here. The issue has not been reproduced, so it's unclear what exactly causes it. It may be related to the system configuration when this particular test ran, or the order in which tests run; nobody has seen it yet in practice, so it's not a major concern.
Error pasted below:
======================================================================
ERROR: test_compute_neighbors (test_order_AngularSeparation.TestAngularSeparation)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ci/ci/freud/tests/test_order_AngularSeparation.py", line 125, in test_compute_neighbors
npt.assert_almost_equal(ang.getNeighborAngles()[1], 0, 6)
IndexError: index 1 is out of bounds for axis 0 with size 0
----------------------------------------------------------------------
Ran 134 tests in 16.877s
Comments (11)
-
-
reporter This issue appears to be a very subtle and nondeterministic problem (associated with the NeighborList, but I don't believe caused by it). The original unit test fails on these lines in the test_compute_neighbors test of AngularSeparation:
npt.assert_almost_equal(ang.getNeighborAngles()[0], np.pi/3, 6) npt.assert_almost_equal(ang.getNeighborAngles()[1], 0, 6)
In particular, if it fails (which happens with a pretty low frequency, maybe 10% of the time or so in my experience with testing) it always fails on the second line. Now, I tried replacing those lines with the following:
print("Original length: ", len(ang.nlist)) tmp = ang.getNeighborAngles() print("Length after call: ", len(ang.nlist)) npt.assert_almost_equal(tmp[0], np.pi/3, 6) print("Length after test: ", len(ang.nlist)) print(tmp[0]) print(tmp[1]) tmp = ang.getNeighborAngles() print("Length after second call: ", len(ang.nlist)) print(tmp[0]) print(tmp[1])
The resulting output from running that 20 times is attached to this issue now (note that the errors are printed above the corresponding output, I'm guessing it has to do with how Python calls the underlying C API). A sample of the output for one of the failed runs looks like this:
.Original length: 3 Length after call: 3 Length after test: 0 1.0471976 0.0 Length after second call: 0 E... ====================================================================== ERROR: test_compute_neighbors (__main__.TestAngularSeparation) ---------------------------------------------------------------------- Traceback (most recent call last): File "test_environment_AngularSeparation.py", line 138, in test_compute_neighbors print(tmp[0]) IndexError: index 0 is out of bounds for axis 0 with size 0 ---------------------------------------------------------------------- Ran 5 tests in 0.034s
Note that the failure is invariably on the second to last line, which is the first access into tmp after the 2nd call to ang.getNeighborAngles(). The issue is that when the nlist size is (or appears to be) 0 the resulting array is allocated according to that, so
tmp.size == 0
in that case. For what it's worth, I also occasionally see this output:.Original length: 3 Length after call: 3 Length after test: 4483619112 1.0471976 0.0 Length after second call: 4483619112 1.0471976 0.0 .... ---------------------------------------------------------------------- Ran 5 tests in 0.033s OK
In that case, the test doesn't fail because the array just gets massively overallocated, which is not tested for. Apparently, the call to
numpy.testing.assert_almost_equal
somehow corrupts some memory, because removing that line causes the test to run fine (I ran it 200 times with 0 failures). I've tried various other combinations of tests, and the failures are always preceded by the first call to the numpy testing framework. This issue is also present if I replace this call withself.assertTrue(np.allclose(tmp[0], np.pi/3))
orself.assertTrue(np.allclose(tmp.copy()[0], np.pi/3))
, so it doesn't appear to be directly tied to that object but rather to the the state of the memory before the NumPy API call and how it is changed afterwards.It's especially strange because the object getting passed in is just an element of a NumPy that should no longer be associated with the original AngularSeparation object or its NeighborList. It is not obvious why any of the memory associated with the AngularSeparation or NeighborList objects should be invalidated at any point since they are still in scope and shouldn't be modified, but the appearance of the large numbers for the NeighborList length certainly suggests that the NeighborList memory is being unexpectedly freed somewhere. I suspect that a free is deterministically happening every single time, but because there are so few operations involved in the test, the underlying memory (which corresponds to the m_num_bonds member of the NeighborList) is typically not rewritten so it still contains the original nlist length of 3 except in rare cases. That's just a guess though.
@mspells @joaander if either of you have any thoughts on what could be causing this, I would appreciate some insight. This could be related to Issue #151, but I don't see why there would be any issue with this memory getting invalidated in this case. I could try running this under something like valgrind, but it seems like an awful lot of work for what appears to be a relatively inconsequential bug. I would like to avoid that if possible.
-
reporter - attached output.txt
The output of running the test a number of times to demonstrate the failure rate.
-
reporter - attached output.txt
The output of running the test a number of times to demonstrate the failure rate.
-
I haven't looked into this in detail, but is it perhaps caused by the lack of links to the owning object (the reference counting stuff that was changed in response to the RDF "bug")? One easy fix would be to make
AngularSeparation.nlist_
store thedefaulted_nlist
quantity that is calculated incomputeNeighbor()
(and update the@property
for nlist to return the appropriate element of that tuple). That way we are keeping around the neighbor list and the object that holds it. Another option would be to return independent python copies in themake_default_nlist*
helper functions rather than having them alongside the generator compute object, but that would have a performance cost. -
reporter - changed status to closed
Close issue171
→ <<cset f03d29858fdb>>
-
reporter Close issue171
→ <<cset f03d29858fdb>>
-
reporter @mspells your intuition was spot on. I've pushed an initial fix, but since the NeighborList is now in broad use throughout freud I suspect that this issue exists in many places and this test just happens to be one that pinpoints it. I'll have to do a sweep of the code base to make sure that it's addressed everywhere. For now I created a separate member variable rather than doing what you suggested so that the nlist_ member still has the same meaning in AngularSeparation as in every other class, but I may switch it to your suggestion if that makes sense to do throughout freud.
Alternatively, instead of our original fix that is causing this problem, should we instead switch to the alternate solution we identified of explicitly calling the garbage collector in the NeighborList? That would allow us to fix this in one place, and it would just entail calling gc.collect instead of manually removing that pointer that we removed before.
-
reporter - changed status to open
We have a solution, but have not yet identified the best global solution. This issue should stay open until the proper solution is applied throughout freud.
-
- changed component to testing-validation
-
reporter - changed status to resolved
Resolved by Pull Request #246. The decision was to copy the NeighborList to avoid these problems.
- Log in to comment
This issue has continued to occur on CircleCI.