gmsh, exchange_ghost_cells freezes

Issue #41 new
Alex created an issue

Observed a non-returning exchange_ghost_cells today. This happens when I read a gmsh-generated and gmsh-partitioned file with a (maybe too) large number of partitions for few elements. I didn't dig any deeper (low priority), but I would assume that the main cause is faulty partitioning (empty partitions, isolated elements, etc.). However, I would have expected an error from MOAB.

Here's the code snippet:

  moab::ErrorCode l_error;

  MPI_Barrier( MPI_COMM_WORLD );
  LOG_DEBUG << "exchanging ghost cells";

  // get ghost entities
#ifdef PP_USE_MPI
  l_error = m_pcomm.exchange_ghost_cells( m_dim, m_dim-1, 1, m_dim-1, true, true );
  checkError( l_error );
#endif

  LOG_DEBUG << "ex gh cells done";
  MPI_Barrier( MPI_COMM_WORLD );
  LOG_DEBUG << "freeze..";

and here is the output I am getting:

2016-07-27 22:52:22,369 1 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 4 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 11 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 7 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 8 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 0 DEBUG exchanging ghost cells
2016-07-27 22:52:22,369 9 DEBUG exchanging ghost cells
2016-07-27 22:52:22,371 6 DEBUG exchanging ghost cells
2016-07-27 22:52:22,370 10 DEBUG exchanging ghost cells
2016-07-27 22:52:22,372 5 DEBUG exchanging ghost cells
2016-07-27 22:52:22,371 3 DEBUG exchanging ghost cells
2016-07-27 22:52:22,370 2 DEBUG exchanging ghost cells
2016-07-27 22:52:22,387 10 DEBUG ex gh cells done
2016-07-27 22:52:22,387 4 DEBUG ex gh cells done
2016-07-27 22:52:22,387 8 DEBUG ex gh cells done
2016-07-27 22:52:22,387 6 DEBUG ex gh cells done
2016-07-27 22:52:22,387 5 DEBUG ex gh cells done
2016-07-27 22:52:22,387 3 DEBUG ex gh cells done
2016-07-27 22:52:22,387 11 DEBUG ex gh cells done
2016-07-27 22:52:22,387 1 DEBUG ex gh cells done
2016-07-27 22:52:22,388 7 DEBUG ex gh cells done
2016-07-27 22:52:22,388 9 DEBUG ex gh cells done
2016-07-27 22:52:22,386 0 DEBUG ex gh cells done
^Cmpiexec: killing job...

Looks like rank 2 is stuck.

The mesh is attached; the MOAB version is 4.9.0 (with issue 38 fixed manually).
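
For context, the snippet above runs right after the mesh is loaded. Here is a minimal sketch of that setup; the file name and load options are taken from the reproducer in the comments below, everything else, including the variable names, is illustrative:

  #include <mpi.h>
  #include <string>
  #include "moab/Core.hpp"
  #include "moab/ParallelComm.hpp"

  int main( int i_argc, char* i_argv[] ) {
    MPI_Init( &i_argc, &i_argv );

    moab::Core         l_moab;                              // MOAB instance
    moab::ParallelComm l_pcomm( &l_moab, MPI_COMM_WORLD );  // parallel communicator wrapper

    // load options as used in the application (BCAST_DELETE path, see the reproducer below)
    std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
                l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";

    moab::ErrorCode l_error = l_moab.load_file( "mesh_tet4_mpi.msh", 0, l_loadOpts.c_str() );
    if( l_error != moab::MB_SUCCESS ) return 1;

    // ... exchange_ghost_cells call as in the snippet above ...

    MPI_Finalize();
    return 0;
  }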

Comments (14)

  1. Iulian Grindeanu

    I was not able to reproduce the problem on the master branch, with issue 38 fixed. I first converted to an h5m file, which can be read in parallel later; this file has 12 partitions:

    mbconvert ~/Downloads/mesh_tet4_mpi.msh mesh_g.h5m
    then I tried various read options:

     mpiexec -np 12 mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.2 mesh_g.h5m -o PARALLEL=WRITE_PART oo.h5m
    
    mpiexec -np 12 mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.2.3 mesh_g.h5m -o PARALLEL=WRITE_PART oo.h5m
    

    One note: the fourth parameter is an integer, not a bool (addl_ents can be 0, the default, 1 for edges, 2 for faces, or 3 for both edges and faces); I assume that you want to use faces (2), because I see that your bridge dimension is 2, i.e. faces.
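
    (For reference: the PARALLEL_GHOSTS digits appear to encode the same arguments as exchange_ghost_cells, i.e. ghost_dim.bridge_dim.num_layers plus an optional fourth addl_ents digit, so 3.2.2.3 above would mean ghosting 3D cells over a face bridge with 2 layers and additional edges and faces. This reading is inferred from the parameter discussion later in this thread.)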

    Or how are you loading in parallel? Using the read_delete option?

  2. Alex reporter

    Iulian, thanks for looking into this.

    I am attaching a reproducer, which should build on your machine. If I uncomment line 18, MOAB is able to get past line 34. However, the response time seems way too slow; maybe there's a timeout somewhere?

    //  std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
      std::string l_loadOpts  = "PARALLEL=READ_PART;";
                  l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                  l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";
    
    alex@alex-OptiPlex-9020 ~/moab_debug $ mpiexec.openmpi -n 12 ./a.out 
    main running: 2
    main running: 3
    main running: 4
    main running: 6
    main running: 0
    main running: 11
    loading: 6
    loading: 2
    loading: 3
    loading: 4
    loading: 11
    loading: 0
    main running: 5
    loading: 5
    main running: 1
    loading: 1
    main running: 7
    main running: 10
    loading: 10
    loading: 7
    main running: 9
    loading: 9
    main running: 8
    loading: 8
    loading done: 6
    exchanging ghost: 6
    loading done: 9
    exchanging ghost: 9
    loading done: 7
    exchanging ghost: 7
    loading done: 10
    loading done: 11
    exchanging ghost: 11
    loading done: 8
    loading done: 3
    exchanging ghost: 3
    loading done: 4
    exchanging ghost: 4
    loading done: 5
    exchanging ghost: 5
    exchanging ghost: 10
    loading done: 0
    exchanging ghost: 0
    loading done: 2
    exchanging ghost: 2
    loading done: 1
    exchanging ghost: 1
    exchanging ghost: 8
    finished exchanging ghost: 1
    finished exchanging ghost: 5
    finished exchanging ghost: 0
    finished exchanging ghost: 9
    finished exchanging ghost: 4
    finished exchanging ghost: 6
    finished exchanging ghost: 11
    finished exchanging ghost: 10
    finished exchanging ghost: 7
    finished exchanging ghost: 3
    finished exchanging ghost: 2
    finished exchanging ghost: 8
    --------------------------------------------------------------------------
    mpiexec.openmpi has exited due to process rank 6 with PID 27419 on
    

    Now the same file with BCAST_DELETE

      std::string l_loadOpts  = "PARALLEL=BCAST_DELETE;";
    //  std::string l_loadOpts  = "PARALLEL=READ_PART;";
                  l_loadOpts += "PARTITION=PARALLEL_PARTITION;";
                  l_loadOpts += "PARALLEL_RESOLVE_SHARED_ENTS;";
    

    No delay in the response time; however, rank 2 gets stuck. That's the config I was using in my application:

    alex@alex-OptiPlex-9020 ~/moab_debug $ mpiexec.openmpi -n 12 ./a.out 
    main running: 0
    main running: 1
    main running: 2
    main running: 3
    main running: 4
    main running: 5
    main running: 7
    main running: 6
    loading: 7
    loading: 5
    loading: 0
    loading: 1
    loading: 2
    loading: 4
    loading: 3
    loading: 6
    main running: 11
    loading: 11
    main running: 8
    loading: 8
    main running: 10
    loading: 10
    main running: 9
    loading: 9
    loading done: 6
    exchanging ghost: 6
    loading done: 9
    exchanging ghost: 9
    loading done: 5
    loading done: 0
    exchanging ghost: 0
    loading done: 4
    exchanging ghost: 4
    loading done: 3
    exchanging ghost: 3
    loading done: 11
    exchanging ghost: 11
    loading done: 10
    exchanging ghost: 10
    exchanging ghost: 5
    loading done: 2
    exchanging ghost: 2
    loading done: 1
    exchanging ghost: 1
    loading done: 8
    exchanging ghost: 8
    loading done: 7
    exchanging ghost: 7
    finished exchanging ghost: 0
    finished exchanging ghost: 5
    finished exchanging ghost: 1
    finished exchanging ghost: 10
    finished exchanging ghost: 3
    finished exchanging ghost: 9
    finished exchanging ghost: 4
    finished exchanging ghost: 6
    finished exchanging ghost: 8
    finished exchanging ghost: 11
    finished exchanging ghost: 7
    ^Cmpiexec.openmpi: killing job...
    

    For the function call,

    m_pcomm.exchange_ghost_cells( m_dim, m_dim-1, 1, m_dim-1, true, true )
    

    this sets

      ghost_dim=3,
      bridge_dim=2,
      num_layers=1,
      addl_entities=2,
      store_remote_handles=true,
      wait_all=true
    

    in

    ErrorCode moab::ParallelComm::exchange_ghost_cells  (   int     ghost_dim,
            int     bridge_dim,
            int     num_layers,
            int     addl_ents,
            bool    store_remote_handles,
            bool    wait_all = true,
            EntityHandle *      file_set = NULL 
        )
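
    Written out with the resolved values, the call is equivalent to the following (just restating the above, with m_dim == 3):

    l_error = m_pcomm.exchange_ghost_cells( 3,      // ghost_dim:  ghost 3D cells
                                            2,      // bridge_dim: bridge over faces
                                            1,      // num_layers: one ghost layer
                                            2,      // addl_ents:  also create faces
                                            true,   // store_remote_handles
                                            true ); // wait_all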
    

    Am I missing something obvious?

  3. Iulian Grindeanu

    thanks for reporting; it is indeed a bug somewhere

    Also, you cannot use READ_PART for gmsh mesh; BCAST_DELETE is fine

    The bug seems to be related to the bridge dimension; I will investigate more, but if you use bridge dimension 0, it runs to completion for 12 processors in my test.

    Thanks for reporting it; bridge dimension 0 will give more elements in the ghost layer, so it should work for your application.
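
    For reference, the suggested workaround only changes the bridge dimension in the reporter's call; a sketch of the modified call, with all other arguments as before:

    // workaround sketch: bridge over vertices (dimension 0) instead of faces
    l_error = m_pcomm.exchange_ghost_cells( 3,      // ghost_dim
                                            0,      // bridge_dim: vertices
                                            1,      // num_layers
                                            2,      // addl_ents: still create faces in the ghost layer
                                            true,   // store_remote_handles
                                            true ); // wait_all
    checkError( l_error );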

  4. Alex reporter

    Glad that you are able to reproduce the behavior.

     but if you use bridge dimension 0, it runs to completion, for 12 processors, in my test
    

    Unfortunately, using 0 as the bridge dimension isn't a real option for me, since I rely on faces as the bridge in the MPI layout. Can I circumvent the issue by going through a different mesh reader, e.g. MOAB's HDF5 format?

    why are you using ordered set ?
    

    Good catch, that's a leftover from when I started the implementation. I don't need it to be ordered.

  5. Iulian Grindeanu

    If you use bridge dimension = 0 (vertex), more ghost elements will be part of the ghost layer; but if you use addl_ents=2, all faces in the ghost layer will be created, so you can navigate using face adjacency if you need to. I don't understand why you need bridge dimension 2; the bridge dimension just controls which ghost elements will be created in the layer.
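
    A minimal sketch of that face-based navigation, assuming a moab::Interface* and the handle of a 3D cell (the helper name is illustrative):

    #include "moab/Interface.hpp"
    #include "moab/Range.hpp"

    // Sketch only: collect the cells sharing a face with the given cell.
    // This works across the ghost layer as long as the faces exist there,
    // which addl_ents=2 guarantees.
    moab::ErrorCode faceNeighbors( moab::Interface*   i_moab,
                                   moab::EntityHandle i_cell,
                                   moab::Range&       o_neighbors ) {
      moab::Range l_faces;
      moab::ErrorCode l_err = i_moab->get_adjacencies( &i_cell, 1, 2, false, l_faces );
      if( l_err != moab::MB_SUCCESS ) return l_err;

      l_err = i_moab->get_adjacencies( l_faces, 3, false, o_neighbors, moab::Interface::UNION );
      o_neighbors.erase( i_cell );  // drop the cell itself
      return l_err;
    }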

    The hdf5 format offers the advantage of a parallel reader (the READ_PART option is implemented for HDF5), and it is also a binary format; if I/O becomes an issue, you could get better performance using the hdf5 format.

    But it involves an additional step of converting from gmsh format to hdf5 format using mbconvert;

  6. Alex reporter
    more ghost elements will be part of the ghost layer
    

    Well, that is the problem. I am communicating through faces, not vertices: using dim=0 as the bridge would blow up my communication volume and hit me sooner or later in strong scaling.

    But it involves an additional step of converting from gmsh format to hdf5 format using mbconvert; 
    

    Great, sounds simple. That will become useful at scale.

  7. Iulian Grindeanu

    OK, fair enough; we will try to fix the issue anyway. I thought the workaround would be acceptable for the time being.

    Strong scaling will hit a plateau anyway if your ghost layer becomes bigger than the local mesh; it will also introduce additional communication when a partition is one element thick. Still, the ghost exchange/computation happens once per solve; is that cost really important compared to the rest? Or do you need to do a lot of remeshing?

  8. Iulian Grindeanu

    this is not fixed yet;

    To reproduce easily, without the need for an additional driver, use mbconvert on a master build (as of Sep 2):

    mpiexec -np 12 ../tools/mbconvert -O PARALLEL=BCAST_DELETE -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.1.2  mesh_tet4_mpi.msh -o PARALLEL=WRITE_PART oo.h5m
    

    Notes: as noted before, with PARALLEL_GHOSTS=3.0.1.2 (bridge dimension 0, not 2) it works. Also, running on 10 processors instead of 12 works too. It could be a bug in the way ghost entities are sent/duplicated/merged during ghosting. It is hard to debug, because it is reproducible only at a relatively high proc count. The DEBUG_PIO option helps in debugging, although it is not clear which processor gets stuck; Alex suggested rank 2. There is an implicit barrier at the end of ghosting.

    Another note: if we first convert to h5m format and read in parallel using READ_PART, it behaves the same way; it works fine for 10 procs or for bridge dimension 0, and it fails for 11 or 12 procs with bridge dimension 2:

    mpiexec -np 10 ../tools/mbconvert -O PARALLEL=READ_PART -O PARTITION=PARALLEL_PARTITION -O PARALLEL_RESOLVE_SHARED_ENTS -O PARALLEL_GHOSTS=3.2.1.2 -O DEBUG_PIO=1 mm.h5m -o PARALLEL=WRITE_PART oo.h5m
    
  9. Vijay M

    @breuera We apologize for the delay. @iulian07 I'm adding this bug to the TODO list. We should address it after the upcoming release is finalized.
