PETScMatrix::init hangs in parallel for rectangular matrices.

Issue #392 resolved
Marco Morandini created an issue

This is a leftover of issue #86. The attached test case hangs in parallel because the process owning the row assumes that the matrix is serial, while the others assume that it is parallel. I think that the (bogus) test at PETScMatrix.cpp:124

if (row_range.first == 0 && row_range.second == M)

can be fixed in two ways:

1) by also checking the columns, i.e.

if (row_range.first == 0 && row_range.second == M && col_range.first == 0 && col_range.second == N)

2) by following Garth's advice given at the end of https://bitbucket.org/fenics-project/dolfin/pull-request/49/fix-issue-86-dolfin-sparsitypattern-apply/diff, i.e. changing it into

  int comm_size;
  MPI_Comm_size(sparsity_pattern.mpi_comm(), &comm_size);
  if (comm_size == 1)

I can make a pull request if one of the two solutions is acceptable.
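
To make the disagreement concrete, here is a minimal sketch that compares the original test with the two proposed ones, without needing MPI or PETSc. The function names, the `Range` alias and the `all_agree` helper are hypothetical and exist only for illustration; the communicator size is passed in as a plain integer. It assumes half-open ownership ranges [first, second).

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second) owned by one process.
using Range = std::pair<std::size_t, std::size_t>;

// Original test: looks only at the locally owned rows.
bool looks_serial_rows_only(Range row_range, std::size_t M)
{
  return row_range.first == 0 && row_range.second == M;
}

// Option 1: additionally require that the process owns all columns.
bool looks_serial_rows_and_cols(Range row_range, std::size_t M,
                                Range col_range, std::size_t N)
{
  return row_range.first == 0 && row_range.second == M
      && col_range.first == 0 && col_range.second == N;
}

// Option 2: decide from the communicator size alone. The size is a
// plain argument here so that the sketch runs without MPI.
bool looks_serial_comm_size(int comm_size)
{
  return comm_size == 1;
}

// True when every process reached the same serial/parallel decision.
// If this is false, the processes take different code paths and a
// later collective call can hang.
bool all_agree(const std::vector<bool>& decisions)
{
  assert(!decisions.empty());
  for (bool d : decisions)
    if (d != decisions.front())
      return false;
  return true;
}
```

As an assumed example (not the attached test case), take a 1 x 4 matrix on two processes, where process 0 owns rows [0, 1) and columns [0, 2) and process 1 owns rows [1, 1) and columns [2, 4). The rows-only test then returns true on process 0 and false on process 1, whereas both alternatives return false everywhere. The column check can still disagree in the degenerate case where a single process owns every row and every column, which the communicator-size test avoids.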

Comments (8)

  1. Prof Garth Wells

    Now that objects store a communicator, use dolfin::MPI::size(...) to get the number of processes.

  2. Martin Sandve Alnæs

    This may be the source of other deadlocks where one process continues past a collective matrix operation.

  3. Martin Sandve Alnæs

    It's here:

    https://bitbucket.org/fenics-project/dolfin/pull-request/174/fix-issue-392/diff

    However, I still get an error:

    Traceback (most recent call last):
      File "pippo.py", line 12, in <module>
        lD = assemble(LD)
      File "/home/martinal/opt/fenics/dorsal-dev-1410/lib/python2.7/site-packages/dolfin/fem/assembling.py", line 203, in assemble
        assembler.assemble(tensor, dolfin_form)
    RuntimeError: 
    
    *** -------------------------------------------------------------------------
    *** DOLFIN encountered an error. If you are not able to resolve this issue
    *** using the information listed below, you can ask for help at
    ***
    ***     fenics@fenicsproject.org
    ***
    *** Remember to include the error message listed below and, if possible,
    *** include a *minimal* running example to reproduce the error.
    ***
    *** -------------------------------------------------------------------------
    *** Error:   Unable to complete call to function init().
    *** Reason:  Assertion _local_range[_primary_dim].second > _local_range[_primary_dim].first failed.
    *** Where:   This error was encountered inside ../../dolfin/la/SparsityPattern.cpp (line 107).
    *** Process: unknown
    *** 
    *** DOLFIN version: 1.4.0+
    *** Git changeset:  0685c8f1f92e7d743347d7e012e895ec10d4119d
    *** -------------------------------------------------------------------------
    
  4. Martin Sandve Alnæs

    @garth-wells I guess some places assume that each process has a nonzero number of rows?
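
The guess above can be illustrated with a minimal sketch. It assumes a simple block partition of the rows over the processes; the `local_range` helper here is hypothetical and is not taken from the DOLFIN sources. With fewer rows than processes, some processes get an empty range, and a strict `second > first` check like the one in the assertion message fails for them.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open index range [first, second) owned by one process.
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical block partition of N indices over `size` processes:
// the first N % size processes get one extra index each.
Range local_range(std::size_t rank, std::size_t N, std::size_t size)
{
  assert(size > 0 && rank < size);
  const std::size_t n = N / size;
  const std::size_t r = N % size;
  if (rank < r)
    return {rank * (n + 1), rank * (n + 1) + n + 1};
  return {rank * n + r, rank * n + r + n};
}

// Strict form, mirroring the failed assertion: rejects empty ranges.
bool strict_check(Range range)
{
  return range.second > range.first;
}

// Relaxed form: accepts a process that owns zero rows.
bool relaxed_check(Range range)
{
  return range.second >= range.first;
}
```

For one row on three processes this partition gives process 0 the range [0, 1) and the other two the empty range [1, 1), so only process 0 passes the strict check. Whether relaxing the check is safe for the rest of the code is a separate question that this sketch does not answer.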
