SCOTCH mesh partitioner has side effects

Issue #719 closed
Jan Blechta created an issue

Subsequent calls to UnitSquareMesh(4, 4) produce different meshes (with the SCOTCH partitioner). On the other hand, dofmap reordering (using SCOTCH) seems to be fine. The following script demonstrates the problem:

from __future__ import print_function
from dolfin import *

parameters.mesh_partitioner = 'SCOTCH'

comm = mpi_comm_world()
rank = MPI.rank(comm)

def create_mesh():
    mesh = UnitSquareMesh(4, 4)
    return mesh.num_vertices(), mesh.num_cells()

def create_dofmap(mesh=None):
    # Build a fresh mesh unless one is passed in
    if mesh is None:
        mesh = UnitSquareMesh(4, 4)
    V = FunctionSpace(mesh, "P", 1)
    return (mesh.num_vertices(), mesh.num_cells(),
            mesh.topology().hash(), mesh.geometry().hash(),
            V.dofmap().ownership_range())

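def test_determinism(func):
    # Call func() repeatedly until some rank gets a result that differs
    # from the first one; MPI.max makes all ranks break together.
    # If func() is deterministic, this loop never terminates.
    result0 = func()
    while True:
        result1 = func()
        diff = int(result0 != result1)
        if MPI.max(comm, diff) > 0:
            print(rank, result0, result1)
            break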

def main():
    print(rank, "Test mesh build")
    test_determinism(create_mesh)
    print()
    MPI.barrier(comm)

    print(rank, "Test mesh and dofmap build")
    test_determinism(create_dofmap)
    print()
    MPI.barrier(comm)

    print(rank, "Test dofmap build with static mesh")
    mesh = UnitSquareMesh(4, 4)
    test_determinism(lambda: create_dofmap(mesh=mesh))
    print()
    MPI.barrier(comm)

if __name__ == '__main__':
    main()

Output on my system (3 MPI processes):

1 Test mesh build
0 Test mesh build
2 Test mesh build
0 (11, 10) (11, 10)

2 (12, 11) (13, 11)
1 (12, 11) (11, 11)


2 Test mesh and dofmap build
1 Test mesh and dofmap build
0 Test mesh and dofmap build
2 (12, 11, 16345232928620356316L, 8317758475742265463, (16, 25)) (12, 11, 15156240723649878619L, 7318551913629007991, (16, 25))
1 (12, 11, 8462581566930303203L, 6060639746735424119, (8, 16)) (12, 11, 13550905085782729940L, 4625243689556973335, (7, 16))

0 (11, 10, 4607311713341590251L, 2348913127836622316, (0, 8)) (11, 10, 482017684130753681L, 4193610146276079084, (0, 7))


0 Test dofmap build with static mesh
2 Test dofmap build with static mesh
1 Test dofmap build with static mesh

Then the program hangs: the last test never detects a difference, so its loop never terminates.

With parameters.mesh_partitioner = 'ParMETIS' the problem disappears: the meshes are reproducible, so the program already hangs in the first test. The choice of dof reordering library does not play a role here.
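Because test_determinism() loops until it sees a difference, a deterministic build makes the script hang rather than pass. A bounded variant reports either the first attempt that differed or that none did within a fixed number of tries. This is only a sketch: check_determinism, max_tries and reduce_max are hypothetical names, not part of DOLFIN. In the script above, reduce_max would be lambda d: MPI.max(comm, d), so that all ranks stop at the same attempt; the default performs no reduction (serial run).

```python
def check_determinism(func, max_tries=100, reduce_max=lambda d: d):
    """Hypothetical bounded replacement for test_determinism().

    Calls func() once for a reference result, then up to max_tries more
    times. Returns the number of the first attempt whose result differs
    from the reference, or None if every attempt matched.
    reduce_max combines the per-process difference flag; under MPI it
    should return the maximum over all ranks so every rank agrees.
    """
    result0 = func()
    for attempt in range(1, max_tries + 1):
        result1 = func()
        diff = int(result0 != result1)
        if reduce_max(diff) > 0:
            return attempt
    return None
```

With a deterministic partitioner the call returns None after max_tries builds instead of hanging, which makes the ParMETIS comparison above easy to automate.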

Comments

  1. Jan Blechta reporter

    There is a bug in SCOTCH_randomReset(), which is supposed to reset the pseudorandom number generator. The bug appears (at least) in SCOTCH 6.0.0-6.0.3; SCOTCH 6.0.4 is fixed. Buggy versions can be fixed by compiling with -DCOMMON_RANDOM_SYSTEM.

    We can ask PETSc to fix this (either by a version bump or with the -DCOMMON_RANDOM_SYSTEM workaround) or switch the default mesh partitioner to ParMETIS. Opinions @garth-wells, @chris_richardson?
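Why a broken SCOTCH_randomReset() shows up as different meshes can be illustrated with a toy model. This is NOT SCOTCH's code or API: ToyPartitioner is a hypothetical stand-in that assigns cells to parts from a shared pseudorandom stream, and it assumes the caller resets the generator before every partitioning call. If the reset works, repeated calls give identical partitions; if it silently does nothing, each call continues the stream and yields a different partition.

```python
import random


class ToyPartitioner:
    """Hypothetical partitioner drawing from a persistent random stream.

    Not SCOTCH: only a model of 'reset the generator, then partition'.
    """

    SEED = 0

    def __init__(self, reset_works=True):
        self.reset_works = reset_works
        self._rng = random.Random(self.SEED)

    def random_reset(self):
        # Analogue of SCOTCH_randomReset(): restore the initial seed.
        # The buggy variant leaves the generator state untouched.
        if self.reset_works:
            self._rng.seed(self.SEED)

    def partition(self, num_cells, num_parts):
        # Assumed calling pattern: reset, then draw one part per cell.
        self.random_reset()
        return [self._rng.randrange(num_parts) for _ in range(num_cells)]
```

Comparing ToyPartitioner(True).partition(...) across two calls gives equal lists, while ToyPartitioner(False) gives two different ones, mirroring the differing cell and vertex counts per rank in the output above.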

  2. Jan Blechta reporter

    PETSc fixed this in maint. Let's wait for the release of 3.7.4, bump to that version in the docker dev image and consider this fixed.

    Maybe we could add a SCOTCH version check and switch the default mesh partitioner to ParMETIS when SCOTCH is buggy.
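A minimal sketch of such a check, assuming the SCOTCH version is available as a dotted string such as "6.0.3" (how to obtain it, e.g. at configure time, is left open). The helper names are hypothetical; the buggy range 6.0.0-6.0.3 is the one reported above, and older versions are left alone because the report only says "at least" those.

```python
def parse_version(version):
    """Turn a dotted version string such as '6.0.3' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Range reported in this issue: SCOTCH_randomReset() is buggy in
# 6.0.0-6.0.3 and fixed in 6.0.4.
BUGGY_FIRST = (6, 0, 0)
BUGGY_LAST = (6, 0, 3)


def scotch_reset_is_buggy(version):
    return BUGGY_FIRST <= parse_version(version) <= BUGGY_LAST


def choose_mesh_partitioner(scotch_version, preferred="SCOTCH"):
    """Hypothetical helper: fall back to ParMETIS for buggy SCOTCH."""
    if preferred == "SCOTCH" and scotch_reset_is_buggy(scotch_version):
        return "ParMETIS"
    return preferred
```

The result would then be assigned with parameters.mesh_partitioner = choose_mesh_partitioner(scotch_version). Tuple comparison keeps the range check correct for multi-digit components such as 6.0.10.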

  3. Jan Blechta reporter

    SCOTCH partitioning has side effects: it creates different topologies on subsequent mesh generations. It does not break anything at the moment, but it is undesired behaviour. It is dangerous, for example, in combination with #720. It was in fact a cause of random failures in test_p31_box_2, which we had been unable to debug for a long time.

  4. Jan Blechta reporter
    • changed status to open

    Let's keep this open until 3.7.4 is used in the docker images (our default distribution method).
