Partition information is read incorrectly from HDF5 files for quad and hex meshes

Issue #1000 new
David Kamensky created an issue

Passing true for the use_partition_from_file argument when reading quadrilateral or hexahedral meshes from HDF5 files in parallel results either in a partitioning that differs from the one originally written or in an error, e.g.,

HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 1:
  #000: ../../../src/H5Dio.c line 161 in H5Dread(): selection+offset not within extent
    major: Dataspace
    minor: Out of range
HDF5-DIAG: Error detected in HDF5 (1.8.16) MPI-process 1:
  #000: ../../../src/H5Dio.c line 161 in H5Dread(): selection+offset not within extent
    major: Dataspace
    minor: Out of range

My preliminary testing indicates that this can be resolved by the following change to dolfin/io/HDF5File.cpp:

    // Add partitioning attribute to dataset
    std::vector<std::size_t> partitions;
    const std::size_t topology_offset
      = MPI::global_offset(_mpi_comm.comm(),
                           // old:
                           // topological_data.size()/(cell_dim + 1),
                           // modified:
                           topological_data.size()/num_cell_points,
                           true);
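
The apparent cause is that cell_dim + 1 is the number of vertices per cell only for simplices (triangles, tetrahedra); quadrilateral cells have 4 vertices and hexahedral cells have 8, so dividing the length of the flattened topology array by cell_dim + 1 overestimates the local number of cells and produces global offsets that do not match the dataset extents (num_cell_points in HDF5File.cpp presumably being the actual number of points stored per cell). As a plain-Python illustration of the mismatch (not DOLFIN code, just the arithmetic behind the divisor change):

    # Minimal sketch: vertices per cell vs. the simplex-only divisor
    # cell_dim + 1 assumed by the old offset computation.
    cells = {
        # name: (cell_dim, vertices per cell)
        "triangle":      (2, 3),
        "tetrahedron":   (3, 4),
        "quadrilateral": (2, 4),
        "hexahedron":    (3, 8),
    }
    for name, (cell_dim, num_vertices) in cells.items():
        old_divisor = cell_dim + 1  # correct only for simplicial cells
        status = "ok" if old_divisor == num_vertices else "wrong local cell count"
        print(name+": old divisor "+str(old_divisor)
              +", vertices per cell "+str(num_vertices)+" -> "+status)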

However, someone more familiar with this code should look into it. The following Python script, run with more than one MPI task, demonstrates the bug and the apparent efficacy of the above modification in version 2018.1.0.dev0 (the same issue also affects 2017.2):

# If the initial mesh has quadrilateral or hexahedral cells, the old code
# does not correctly re-use partitioning information from HDF5 files, and,
# as meshes are written and re-read multiple times with the argument
#
#   use_partition_from_file = True,
#
# the partitioning becomes increasingly imbalanced or, in some cases,
# results in an error.

# Illustrative test case (assuming version 2018.1.0.dev0):

from dolfin import *
from dolfin import MPI
mycomm = MPI.comm_world
mpisize = MPI.size(mycomm)
mpirank = MPI.rank(mycomm)

# Create initial mesh, mesh1:

# Only works correctly after modification:
#mesh1 = UnitSquareMesh.create(10,10,CellType.Type.quadrilateral)
mesh1 = UnitCubeMesh.create(10,10,10,CellType.Type.hexahedron)

# Works with old code and modified code:
#mesh1 = UnitSquareMesh.create(10,10,CellType.Type.triangle)
#mesh1 = UnitCubeMesh.create(10,10,10,CellType.Type.tetrahedron)

# Write mesh1 to an HDF5 file
hfile = HDF5File(mycomm,"asdf.h5","w")
hfile.write(mesh1,"mesh/")
hfile.close()

# Read the contents of the HDF5 file back into mesh2
mesh2 = Mesh()
hfile = HDF5File(mycomm,"asdf.h5","r")

# The last argument to read() is supposed to ensure that the mesh remains
# partitioned in the same way as the mesh that was originally written.
hfile.read(mesh2,"mesh/",True)

hfile.close()

# Write mesh2 to the HDF5 file
hfile = HDF5File(mycomm,"asdf.h5","w")
hfile.write(mesh2,"mesh/")
hfile.close()

# Read the contents of the HDF5 file back into mesh3
mesh3 = Mesh()
hfile = HDF5File(mycomm,"asdf.h5","r")
hfile.read(mesh3,"mesh/",True)
hfile.close()

# Check that partitioning is the same between all three meshes (i.e., meshes
# 1, 2, and 3 have the same number of cells on each rank):
for i in range(0,mpisize):
    if(i==mpirank):
        print("Rank = "+str(i)+":")
        print("  mesh1 has "+str(mesh1.num_cells())+" cells")
        print("  mesh2 has "+str(mesh2.num_cells())+" cells")
        print("  mesh3 has "+str(mesh3.num_cells())+" cells")
    MPI.barrier(mycomm)
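
To reproduce, save the script and run it with more than one MPI task, e.g. with a command along the lines of mpirun -np 4 python3 repro.py (repro.py is just a placeholder filename). If preferred, the per-rank printout at the end can be reduced to a boolean check; a minimal sketch, reusing the mesh1/mesh2/mesh3 and mpirank variables defined above:

    # Per-rank check: if use_partition_from_file works as intended, all three
    # meshes have the same number of local cells on this rank.
    same_counts = (mesh1.num_cells() == mesh2.num_cells() == mesh3.num_cells())
    print("Rank "+str(mpirank)+": partition preserved = "+str(same_counts))

With the unmodified code and a quadrilateral or hexahedral mesh1, the counts drift apart over the repeated write/read cycles (or the HDF5 error above is raised); with the modification, they agree on every rank.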
