strange behaviour of Mesh.bounding_box_tree()

Issue #669 invalid
Florian Bruckner created an issue

after installing dolfin 1.6 via hashdist on our scientific cluster we encounter some strange problems with the interpolate-method for certain function spaces, which lead to problems within some of our test-cases (others work as expected).

We were able to pinpoint one problem to the evaluation of the bounding_box_tree. The following example prints different results on the (broken) cluster installation compared with a local hashdist installation as well as the official FEniCS Ubuntu version on a local machine:

from dolfin import *

mesh = UnitCubeMesh(10,10,10)

p = Point(1.0,1.0,1.0)
t=mesh.bounding_box_tree()
print t.compute_collisions(p)

On our cluster installation we get the following output:

[5994 5995 5996 5998 5997 5999]

whereas the local hashdist installation as well as the official FEniCS Ubuntu version yield:

[5995 5999 5997 5996 5994 5998]

The only difference between the installations that i can see, is that the cluster installation uses the system gcc (4.8.2) as well as openmpi (1.10.0) which are provided via a module system. The test code is executed on a single thread without MPI via 'taskset -c 0 python crash.py' in order to eliminate the influence of the parallelization issues.

any hint where this problem can come from? does the bounding_box_tree code depend on any external package?

greetings Florian

Comments (17)

  1. Florian Bruckner reporter

    Thats true. Maybe we should start from the original test problem:

    from dolfin import *
    
    #parameters['num_threads'] = 1
    #parameters['allow_extrapolation'] = True
    
    mesh = Mesh("mesh.xml.gz")
    #mesh = UnitCubeMesh(10,10,10)
    
    right = AutoSubDomain(lambda x: x[1] >= 0.5)
    right.mark(mesh, 3, 1)
    left =  AutoSubDomain(lambda x: x[1] < 0.5)
    left.mark(mesh, 3, 2)
    
    submesh = SubMesh(mesh, 1)
    
    V     = FunctionSpace(mesh, 'DG', 0)
    V_sub = FunctionSpace(submesh, 'DG', 0)
    
    f = interpolate(Expression("x[1]"), V)
    f_sub = interpolate(f, V_sub)
    

    Executed on the cluster installation this code leads to the following error during interpolation (at least for certain meshes and a certain domain composition):

    *** -------------------------------------------------------------------------
    *** Error:   Unable to evaluate function at point.
    *** Reason:  The point is not inside the domain. Consider calling "Function::set_allow_extrapolation(true)" on this Function to allow extrapolation.
    *** Where:   This error was encountered inside Function.cpp.
    *** Process: 0
    *** 
    *** DOLFIN version: 1.6.0
    *** Git changeset:  
    *** -------------------------------------------------------------------------
    

    Since the same code with the same meshfile runs without problems on two independent local installations we started looking for some differences of the mesh object. Most things (num_cells, num_vertices, ...) were similar on both machines. One difference that we found is the mentioned bounding_box_tree(), which somehow may lead to an out-of-bounds evaluation during the interpolation. Maybe this is misleading but it definitely shows different behaviour on both machines.

    How can it be that the order is different? the code is run on a single core without MPI so everything should be deterministic, right?. Or are there any external system dependency which are not covered by the hashdist? installation?

    thanks for any advice Florian

  2. Anders Logg (Chalmers)

    When you interpolate from one mesh to another, it is very likely that some of the points appear on or very close to the boundary. Then subtle differences in the underlying libraries (such as 32 bit vs 64 bit) lead to different arithmetic at the very basic level. So in other words round-off errors will be different. Points on the boundary will then sometimes be judged to be inside or outside of the domain. I suggest setting following the advice to (set_allow_extrapolation). This means that when a point happens to be outside, extrapolation will be used instead of interpolation, but since the point is anyway really close to the boundary, there is no essential difference, other than perhaps a very small performance penalty in needing to search for the closest cell to extrapolate from.

  3. Claas Abert

    set_allow_extrapolation unfortunately does not resolve the issue. We came up with the following minimal example

    from dolfin import *
    
    parameters['allow_extrapolation'] = True
    
    mesh = UnitCubeMesh(50,5,5)
    V    = FunctionSpace(mesh, 'DG', 0)
    f    = Function(V)
    f(0.781250, 0.093750, 0.250000)
    

    which leads to

    *** -------------------------------------------------------------------------
    *** Error:   Unable to create mesh entity.
    *** Reason:  Mesh entity index -1 out of range [0, 7500] for entity of dimension 3.
    *** Where:   This error was encountered inside MeshEntity.cpp.
    *** Process: 0
    ***
    *** DOLFIN version: 1.6.0
    *** Git changeset:
    *** -------------------------------------------------------------------------
    

    This error is completely reproducible on the cluster (it does not crash for any given coordinate, we found a suitable coordinate by iteration). The error does not occur on our local installations.

    The error seems to be caused by a call to mesh.bounding_box_tree()->compute_closest_entity(point).first which returns -1.

  4. Florian Bruckner reporter

    or even simpler:

    from dolfin import *
    
    parameters['allow_extrapolation'] = True
    
    mesh = UnitCubeMesh(50,5,5)
    print mesh.bounding_box_tree().compute_closest_entity(Point(0.781250, 0.093750, 0.250000))
    

    yields

    Building point search tree to accelerate distance queries.
    Computed bounding box tree with 14999 nodes for 7500 points.
    (-1, 0.008838834764831754)
    

    on the cluster installation compared with this correct result on the local machine:

    Building point search tree to accelerate distance queries.
    Computed bounding box tree with 14999 nodes for 7500 points.
    (1739, 0.0)
    
  5. Florian Bruckner reporter

    gcc (4.8.2) as well as openmpi (1.10.0) should be the only system dependencies we are aware of. everything else should be built via hashdist.

  6. Claas Abert

    Thanks for looking into this. We are using the current stable (1.6) built with hashdist. The above test only fails on our cluster (GCC 4.8.2), it's working well on our local machines (using hashdist as well as Ubuntu packages).

    I did a few more tests and the error does not seem to be connected to mesh condition (as suggested in the ticket). I can reproduce the same error with UnitCubeMesh(51,50,50) which should have very good condition.

  7. Claas Abert

    Thank you Øyvind. So simply switching to another compiler might resolve this? We will give it a try.

  8. Florian Bruckner reporter

    It seems that this was exactly the problem on our cluster installation. Switching to the newer compiler (gcc 4.9.3) resolved the issue. Now everything works as expected.

    Thanks for all your valuable inputs greetings Florian

  9. Log in to comment