Finalizing matrix assembled on mixed 'DG' 'R' FunctionSpace in parallel fails with SIGSEGV

Issue #1059 new
Leon Avery created an issue

I'm working on the solution of a PDE system with parameters that vary in time. To represent this, I create a mixed FunctionSpace from 'DG' elements (for the spatial fields) and 'R' elements (for the time-varying parameters), as follows:

        import fenics as fe

        SE = fe.FiniteElement('DG', fe.triangle, degree)
        elements = [SE] * nfields
        VE = fe.MixedElement(elements)
        PE = fe.VectorElement('R', mesh.ufl_cell(),
                              0, dim=nparams)
        PVSE = VE * PE
        PVS = fe.FunctionSpace(mesh, PVSE)
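Here degree, nfields, nparams, and mesh are defined elsewhere in my code. For context, a minimal sketch with placeholder values (nelements = 128 comes from the attached mwe3.py and nparams = 32 matches the 32 x 32 block described below; degree and nfields are purely illustrative):

        # Placeholder definitions for the names used above (illustrative
        # values, not necessarily those used in mwe3.py).
        nelements = 128                # mesh resolution used in mwe3.py
        mesh = fe.UnitSquareMesh(nelements, nelements)
        degree = 2                     # DG polynomial degree (placeholder)
        nfields = 2                    # number of spatial fields (placeholder)
        nparams = 32                   # number of 'R' parameters; this is what
                                       # produces the 32 x 32 block noted below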

Using this, I create a Function, TestFunctions and TrialFunctions as follows:

        sol = fe.Function(PVS)
        wUs, wPs = [
            list(tfs) for tfs in fe.TestFunctions(PVS)
        ]
        tdUs, tdPs = [
            list(tfs) for tfs in fe.TrialFunctions(PVS)
        ]
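The split thus gives one scalar test/trial function per DG field and one per 'R' parameter. A small sanity check (a sketch, using the placeholder names above):

        # Sanity check: expect nfields DG components and nparams 'R' components.
        assert len(wUs) == nfields and len(tdUs) == nfields
        assert len(wPs) == nparams and len(tdPs) == nparams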

Now I try to assemble a form:

        #
        # assemble the matrix, if necessary (once for all time points)
        #
        dU_integral = sum(
            [tdUi*wUi*fe.dx for tdUi,wUi in zip(tdUs, wUs)]
        )
        dP_integral = sum(
            [tdPi*wPi*fe.dx for tdPi,wPi in zip(tdPs, wPs)]
        )
        A = fe.PETScMatrix()
        fe.assemble(dU_integral + dP_integral,
                    tensor=A, finalize_tensor=False)
        A.apply('insert')
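For reference, a sketch of the same assembly without deferred finalization (finalize_tensor defaults to True, in which case assemble() finalizes the tensor itself and no explicit apply call is needed):

        # Default path: let assemble() finalize the tensor itself.
        A_default = fe.PETScMatrix()
        fe.assemble(dU_integral + dP_integral, tensor=A_default)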

Running sequentially, this works as intended, creating a 983072 x 983072 matrix with a 32 x 32 identity matrix block in the lower-right corner. But if I run it with mpiexec -n 2, it crashes in the apply call with a SIGSEGV. I am attaching an MWE (or really, an MNWE -- a minimal non-working example), mwe3.py, along with logs of its output when run sequentially or in two parallel processes, as follows:

env KSDGDEBUG=ALL mpiexec -n 1 python mwe3.py |& tee mwe3n1.log

...OR...

env KSDGDEBUG=ALL mpiexec -n 2 python mwe3.py |& tee mwe3n2.log
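For reference, a sketch of how the assembled matrix can be inspected after the sequential run (A.size() is the standard dolfin call; nparams as in the placeholder sketch above):

        # Inspect the finalized matrix after a sequential run.
        print(A.size(0), A.size(1))    # 983072 x 983072 here
        # The last nparams rows/columns, coming from the dP integrals, form the
        # 32 x 32 identity block in the lower-right corner.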

The two-process run ends as follows:


===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 29615 RUNNING AT cpu125
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

This is FEniCS 2018.1.0 (dolfin version 2018.1.0.post1) with PETSc 3.10.2 (built --with-threadsafe and with the parallel LU solvers SuperLU_DIST and MUMPS), both built from source on a Linux system (uname -a says: Linux cpu125 2.6.32-754.6.3.el6.x86_64 #1 SMP Tue Oct 9 17:27:49 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux). Everything works when run on a single process, including a much larger program from which this small example was abstracted.

Comments (8)

  1. Sebastian Mitusch

    This looks similar to a bug I've encountered. The following snippet works in serial but not in parallel with dolfin 2018.1.0 (and the master branch), while it works in both serial and parallel with 2017.2.0.

    from fenics import *
    
    mesh = UnitCubeMesh(50, 50, 50)
    R = FunctionSpace(mesh, "R", 0)
    assemble(TestFunction(R)*dx)
    

    Interestingly, the problem does not occur for smaller meshes (e.g. with a 30 x 30 x 30 cube mesh).

  2. Leon Avery reporter

    Yes. There is also a size dependence in the more complicated example I showed. Within that code there is a line nelements = 128 that determines the size of the DG FunctionSpace. If that is changed to nelements = 16, the example runs correctly both sequentially and on 2 MPI processes.
