Communicator error for large problem with fields_local

Issue #113 resolved
David Dickinson created an issue

Reported on Slack by @Jason Parisi.

Running the attached case on JFRS with the current master/next (71550dae6931c61cb170a65817fc4c926bf6eaa3) leads to the error message below (duplicated for different ranks). Lower-resolution cases run without error, and this case also runs without error with the implicit fields option.

Rank 2167 [Wed May 27 20:51:57 2020] [c2-0c2s7n3] Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(521)................: MPI_Comm_split(comm=0xc4000022, color=0, key=0, new_comm=0x7fffffff31d4) failed
PMPI_Comm_split(503)................:
MPIR_Comm_split_impl(276)...........:
MPIR_Get_contextid_sparse_group(688): Cannot allocate context ID because of fragmentation (2034/4096 free on this process; ignore_id=0)
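
This failure mode is easy to reproduce in isolation. Below is a minimal, self-contained C sketch (not GS2 code) of the same class of problem: each MPI_Comm_split consumes one of MPICH's finite pool of context IDs (4096 per process, per the log above), so creating many communicators without freeing them, or leaving the pool fragmented across subgroups, eventually makes the split fail.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Return errors instead of aborting, so the failure is visible. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* More splits than there are context IDs; without MPI_Comm_free,
       each one permanently consumes an ID and the pool runs dry. */
    for (int i = 0; i < 5000; i++) {
        MPI_Comm sub;
        int err = MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub);
        if (err != MPI_SUCCESS) {
            fprintf(stderr, "MPI_Comm_split failed at iteration %d\n", i);
            MPI_Abort(MPI_COMM_WORLD, err);
        }
        /* Deliberately missing: MPI_Comm_free(&sub); */
    }

    MPI_Finalize();
    return 0;
}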

Comments (3)

  1. David Dickinson reporter

    This is likely because fields_local currently creates and destroys a potentially large number of communicators as part of initialisation. PR #307 attempts to improve this by reducing the number of communicators we create that aren't needed outside of initialisation, as sketched below.
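
    As a hedged illustration of that direction (the names and structure here are made up for illustration, not the actual fields_local code), the pattern is to free initialisation-only communicators as soon as setup is done, so their context IDs return to the pool, and to keep only the communicators the solve phase actually uses:

    #include <mpi.h>

    static MPI_Comm solve_comm = MPI_COMM_NULL;

    /* Hypothetical setup routine: my_cell and my_supercell are
       illustrative split colours, not real fields_local variables. */
    void init_fields(MPI_Comm world, int my_cell, int my_supercell)
    {
        int rank;
        MPI_Comm_rank(world, &rank);

        /* Needed only during initialisation: free it immediately
           rather than keeping it for the whole run. */
        MPI_Comm setup_comm;
        MPI_Comm_split(world, my_supercell, rank, &setup_comm);
        /* ... initialisation-only collectives on setup_comm ... */
        MPI_Comm_free(&setup_comm);

        /* Needed throughout the run, so this one is kept. */
        MPI_Comm_split(world, my_cell, rank, &solve_comm);
    }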
