gmsh, exchange_ghost_cells freezes

Issue #41 new
Alex created an issue

Observed a non-returning exchange_ghost_cells today. This happens when I read a gmsh-generated and gmsh-partitioned file with a (maybe too) large number of partitions for few elements. I didn't dig any deeper (low priority), but I would assume that the main cause is faulty partitioning (empty partitions, isolated elements, etc.). However, I would have expected an error from MOAB.

Here's the code snippet:

  moab::ErrorCode l_error;

  MPI_Barrier( MPI_COMM_WORLD );
  LOG_DEBUG << "exchanging ghost cells";

  // get ghost entities
#ifdef PP_USE_MPI
  l_error = m_pcomm.exchange_ghost_cells( m_dim, m_dim-1, 1, m_dim-1, true, true );
  checkError( l_error );
#endif

  LOG_DEBUG << "ex gh cells done";
  MPI_Barrier( MPI_COMM_WORLD );
  LOG_DEBUG << "freeze..";

and here is the output I am getting:

2016-07-27 22:52:22,369 1 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 4 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 11 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 7 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 8 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 0 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 9 DEBUG exchanging ghost cells
2016-07-27 22:52:22,371 6 DEBUG exchanging ghost cells
2016-07-27 22:52:22,370 10 DEBUG exchanging ghost cells
2016-07-27 22:52:22,372 5 DEBUG exchanging ghost cells
2016-07-27 22:52:22,371 3 DEBUG exchanging ghost cells
2016-07-27 22:52:22,370 2 DEBUG exchanging ghost cells
2016-07-27 22:52:22,387 10 DEBUG ex gh cells done
2016-07-27 22:52:22,387 4 DEBUG ex gh cells done
2016-07-27 22:52:22,387 8 DEBUG ex gh cells done
2016-07-27 22:52:22,387 6 DEBUG ex gh cells done
2016-07-27 22:52:22,387 5 DEBUG ex gh cells done
2016-07-27 22:52:22,387 3 DEBUG ex gh cells done
2016-07-27 22:52:22,387 11 DEBUG ex gh cells done
2016-07-27 22:52:22,387 1 DEBUG ex gh cells done
2016-07-27 22:52:22,388 7 DEBUG ex gh cells done
2016-07-27 22:52:22,388 9 DEBUG ex gh cells done
2016-07-27 22:52:22,386 0 DEBUG ex gh cells done
^Cmpiexec: killing job...

Looks like rank 2 is stuck.

The mesh is attached; the MOAB version is 4.9.0 (with issue 38 fixed manually).
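
For context, the snippet above runs right after the mesh is loaded. Here is a minimal sketch of that setup; the file name and load options are taken from the reproducer in the comments below, everything else, including the variable names, is illustrative:

  #include <mpi.h>
  #include <string>
  #include "moab/Core.hpp"
  #include "moab/ParallelComm.hpp"

  int main( int i_argc, char* i_argv[] ) {
    MPI_Init( &i_argc, &i_argv );

    moab::Core         l_moab;                              // MOAB instance
    moab::ParallelComm l_pcomm( &l_moab, MPI_COMM_WORLD );  // parallel communicator wrapper

    // load options as used in the application (BCAST_DELETE path, see the reproducer below)
    std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
                l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";

    moab::ErrorCode l_error = l_moab.load_file( "mesh_tet4_mpi.msh", 0, l_loadOpts.c_str() );
    if( l_error != moab::MB_SUCCESS ) return 1;

    // ... exchange_ghost_cells call as in the snippet above ...

    MPI_Finalize();
    return 0;
  }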

Comments (14)

  1. Iulian Grindeanu

    I was not able to reproduce the problem on the master branch, with issue 38 fixed. I first converted to an h5m file, which can be read in parallel later; this file has 12 partitions:

    mbconvert ~/Downloads/mesh_tet4_mpi.msh mesh_g.h5m
    then I tried various read options:

     mpiexec -np 12 mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.2 mesh_g.h5m -o PARALLEL=WRITE_PART oo.h5m
    
    mpiexec -np 12 mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.2.3 mesh_g.h5m -o PARALLEL=WRITE_PART oo.h5m
    

    One note: the fourth parameter is an integer, not a bool (addl_ents can be 0, the default, 1 for edges, 2 for faces, or 3 for both edges and faces); I assume that you want to use faces (2), because I see that your bridge dimension is 2, i.e. faces.
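
    (For reference: the PARALLEL_GHOSTS digits appear to encode the same arguments as exchange_ghost_cells, i.e. ghost_dim.bridge_dim.num_layers plus an optional fourth addl_ents digit, so 3.2.2.3 above would mean ghosting 3D cells over a face bridge with 2 layers and additional edges and faces. This reading is inferred from the parameter discussion later in this thread.)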

    Or how are you loading in parallel? Using the read_delete option?

  2. Alex reporter

    Iulian, thanks for looking into this.

    I am attaching a reproducer, which should build on your machine. If I uncomment line 18, MOAB is able to get past line 34. However, the response time seems way too slow; maybe there's a timeout somewhere?

    //  std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
      std::string l_loadOpts  = "PARALLEL=READ_PART;";
                  l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                  l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";
    
    alex@alex-OptiPlex-9020 ~/moab_debug $ mpiexec.openmpi -n 12 ./a.out 
    main running: 2
    main running: 3
    main running: 4
    main running: 6
    main running: 0
    main running: 11
    loading: 6
    loading: 2
    loading: 3
    loading: 4
    loading: 11
    loading: 0
    main running: 5
    loading: 5
    main running: 1
    loading: 1
    main running: 7
    main running: 10
    loading: 10
    loading: 7
    main running: 9
    loading: 9
    main running: 8
    loading: 8
    loading done: 6
    exchanging ghost: 6
    loading done: 9
    exchanging ghost: 9
    loading done: 7
    exchanging ghost: 7
    loading done: 10
    loading done: 11
    exchanging ghost: 11
    loading done: 8
    loading done: 3
    exchanging ghost: 3
    loading done: 4
    exchanging ghost: 4
    loading done: 5
    exchanging ghost: 5
    exchanging ghost: 10
    loading done: 0
    exchanging ghost: 0
    loading done: 2
    exchanging ghost: 2
    loading done: 1
    exchanging ghost: 1
    exchanging ghost: 8
    finished exchanging ghost: 1
    finished exchanging ghost: 5
    finished exchanging ghost: 0
    finished exchanging ghost: 9
    finished exchanging ghost: 4
    finished exchanging ghost: 6
    finished exchanging ghost: 11
    finished exchanging ghost: 10
    finished exchanging ghost: 7
    finished exchanging ghost: 3
    finished exchanging ghost: 2
    finished exchanging ghost: 8
    --------------------------------------------------------------------------
    mpiexec.openmpi has exited due to process rank 6 with PID 27419 on
    

    Now the same file with BCAST_DELETE

      std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
    //  std::string l_loadOpts  = "PARALLEL=READ_PART;";
                  l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                  l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";
    

    No delay in the response time; however, rank 2 gets stuck. That's the config I was using in my application:

    alex@alex-OptiPlex-9020 ~/moab_debug $ mpiexec.openmpi -n 12 ./a.out 
    main running: 0
    main running: 1
    main running: 2
    main running: 3
    main running: 4
    main running: 5
    main running: 7
    main running: 6
    loading: 7
    loading: 5
    loading: 0
    loading: 1
    loading: 2
    loading: 4
    loading: 3
    loading: 6
    main running: 11
    loading: 11
    main running: 8
    loading: 8
    main running: 10
    loading: 10
    main running: 9
    loading: 9
    loading done: 6
    exchanging ghost: 6
    loading done: 9
    exchanging ghost: 9
    loading done: 5
    loading done: 0
    exchanging ghost: 0
    loading done: 4
    exchanging ghost: 4
    loading done: 3
    exchanging ghost: 3
    loading done: 11
    exchanging ghost: 11
    loading done: 10
    exchanging ghost: 10
    exchanging ghost: 5
    loading done: 2
    exchanging ghost: 2
    loading done: 1
    exchanging ghost: 1
    loading done: 8
    exchanging ghost: 8
    loading done: 7
    exchanging ghost: 7
    finished exchanging ghost: 0
    finished exchanging ghost: 5
    finished exchanging ghost: 1
    finished exchanging ghost: 10
    finished exchanging ghost: 3
    finished exchanging ghost: 9
    finished exchanging ghost: 4
    finished exchanging ghost: 6
    finished exchanging ghost: 8
    finished exchanging ghost: 11
    finished exchanging ghost: 7
    ^Cmpiexec.openmpi: killing job...
    

    For the function call,

    m_pcomm.exchange_ghost_cells( m_dim, m_dim-1, 1, m_dim-1, true, true )
    

    this sets

      ghost_dim=3,
      bridge_dim=2,
      num_layers=1,
      addl_entities=2,
      store_remote_handles=true,
      wait_all=true
    

    in

    ErrorCode moab::ParallelComm::exchange_ghost_cells  (   int     ghost_dim,
            int     bridge_dim,
            int     num_layers,
            int     addl_ents,
            bool    store_remote_handles,
            bool    wait_all = true,
            EntityHandle *      file_set = NULL 
        )
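
    Written out with the resolved values, the call is equivalent to the following (just restating the above, with m_dim == 3):

    l_error = m_pcomm.exchange_ghost_cells( 3,      // ghost_dim:  ghost 3D cells
                                            2,      // bridge_dim: bridge over faces
                                            1,      // num_layers: one ghost layer
                                            2,      // addl_ents:  also create faces
                                            true,   // store_remote_handles
                                            true ); // wait_all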
    

    Am I missing something obvious?

  3. Iulian Grindeanu

    thanks for reporting; it is indeed a bug somewhere

    Also, you cannot use READ_PART for gmsh mesh; BCAST_DELETE is fine

    The bug seems to be related to the bridge dimension; I will investigate more, but if you use bridge dimension 0, it runs to completion for 12 processors in my test.

    Thanks for reporting it; bridge dimension 0 will give more elements in the ghost layer, so it should work for your application.
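
    For reference, the suggested workaround only changes the bridge dimension in the reporter's call; a sketch of the modified call, with all other arguments as before:

    // workaround sketch: bridge over vertices (dimension 0) instead of faces
    l_error = m_pcomm.exchange_ghost_cells( 3,      // ghost_dim
                                            0,      // bridge_dim: vertices
                                            1,      // num_layers
                                            2,      // addl_ents: still create faces in the ghost layer
                                            true,   // store_remote_handles
                                            true ); // wait_all
    checkError( l_error );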

  4. Alex reporter

    Glad that you are able to reproduce the behavior.

     but if you use bridge dimension 0, it runs to completion, for 12 processors, in my test
    

    Unfortunately, using 0 as the bridge dimension isn't a real option for me, since I rely on faces as the bridge in the MPI layout. Can I circumvent the issue by going through a different mesh reader, e.g. MOAB's HDF5 format?

    why are you using ordered set ?
    

    Good catch, that's a leftover from when I started the implementation. I don't need it to be ordered.

  5. Iulian Grindeanu

    If you use bridge dimension = 0 (vertex), more ghost elements will be part of the ghost layer; but if you use addl_ents=2, all faces in the ghost layer will be created, so you can navigate using face adjacency if you need to. I don't understand why you need bridge dimension 2; the bridge dimension just controls which ghost elements will be created in the layer.
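
    A minimal sketch of that face-based navigation, assuming a moab::Interface* and the handle of a 3D cell (the helper name is illustrative):

    #include "moab/Interface.hpp"
    #include "moab/Range.hpp"

    // Sketch only: collect the cells sharing a face with the given cell.
    // This works across the ghost layer as long as the faces exist there,
    // which addl_ents=2 guarantees.
    moab::ErrorCode faceNeighbors( moab::Interface*   i_moab,
                                   moab::EntityHandle i_cell,
                                   moab::Range&       o_neighbors ) {
      moab::Range l_faces;
      moab::ErrorCode l_err = i_moab->get_adjacencies( &i_cell, 1, 2, false, l_faces );
      if( l_err != moab::MB_SUCCESS ) return l_err;

      l_err = i_moab->get_adjacencies( l_faces, 3, false, o_neighbors, moab::Interface::UNION );
      o_neighbors.erase( i_cell );  // drop the cell itself
      return l_err;
    }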

    The hdf5 format offers the advantage of a parallel reader (the READ_PART option is implemented for HDF5), and it is also a binary format; if I/O becomes an issue, you could get better performance using the hdf5 format.

    But it involves an additional step of converting from gmsh format to hdf5 format using mbconvert;

  6. Alex reporter
    more ghost elements will be part of the ghost layer
    

    Well, that is the problem. I am communicating through faces, not vertices: using dim=0 as the bridge would blow up my communication volume and hit me sooner or later in strong scaling.

    But it involves an additional step of converting from gmsh format to hdf5 format using mbconvert; 
    

    Great, sounds simple. That will become useful at scale.

  7. Iulian Grindeanu

    OK, fair enough; we will try to fix the issue anyway. I thought the workaround would be acceptable for the time being.

    Strong scaling will hit a plateau anyway if your ghost layer becomes bigger than the local mesh; it will also introduce additional communication when a partition is one element thick. Still, the ghost exchange/computation happens once per solve; is that cost really important compared to the rest? Or do you need to do a lot of remeshing?

  8. Iulian Grindeanu

    this is not fixed yet;

    To reproduce easily, without the need for an additional driver, use mbconvert on a master build (as of Sep 2):

    mpiexec -np 12 ../tools/mbconvert -O PARALLEL=BCAST_DELETE -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.1.2  mesh_tet4_mpi.msh -o PARALLEL=WRITE_PART oo.h5m
    

    Notes: as noted before, with PARALLEL_GHOSTS=3.0.1.2 (bridge dimension 0, not 2) it works. Also, running on 10 processors instead of 12 works too. It could be a bug in the way ghost entities are sent/duplicated/merged during ghosting. It is hard to debug, because it is reproducible only at a relatively high proc count. The DEBUG_PIO option helps in debugging, although it is not clear which processor gets stuck; Alex suggested rank 2. There is an implicit barrier at the end of ghosting.

    Another note: if we first convert to h5m format and read in parallel using READ_PART, it behaves the same way; it works fine for 10 procs or for bridge dimension 0, and it fails for 11 or 12 procs with bridge dimension 2:

    mpiexec -np 10 ../tools/mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.1.2 -O DEBUG_PIO=1 mm.h5m -o PARALLEL=WRITE_PART oo.h5m
    
  9. Vijay M

    @breuera We apologize for the delay. @iulian07 I'm adding this bug to the TODO list. We should address it after the upcoming release is finalized.
