CompiledSubDomain and Expression interfaces will not work with subsets of processes

Issue #304 resolved
Martin Sandve Alnæs created an issue

CompiledSubDomain and Expression don't have a mesh and therefore no MPI communicator available. That means they have no way of coordinating a JIT on just a subset of processes, which will effectively block the future plan of allowing equations on subsets of processes.

I don't have a solution, just writing down this issue to have it for the record.

Comments (22)

  1. Martin Sandve Alnæs reporter

    And I think this is the solution from Garth:

    If Expressions and SubDomains need to pass messages, then they should have a communicator. We can default to the objects storing MPI_COMM_WORLD, but it should also be possible for a user to pass their own communicator.
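
    A minimal sketch of what that could look like (purely illustrative, not DOLFIN code; mpi4py is used only to have a concrete communicator type): the object stores MPI_COMM_WORLD unless the user supplies a communicator of their own.

    # Illustrative sketch only, not DOLFIN code: default to the world
    # communicator, but accept a user-supplied one.
    from mpi4py import MPI

    class CommunicatorAware:
        def __init__(self, cppcode, mpi_comm=None):
            self.cppcode = cppcode
            self.mpi_comm = mpi_comm if mpi_comm is not None else MPI.COMM_WORLD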

  2. Johan Hake

    This is really not a bug, but rather a feature discussion. Right now we have no means of compiling forms on different processes anyhow. We need to extend our own Python wrapper of an mpi_comm, or add a dependency on mpi4py or something similar.

  3. Jan Blechta

    The semantics have to be carefully decided before doing anything. Are compile_expression and CompiledSubDomain going to be collective? Currently, SubDomain and Expression do not communicate at all, do they?

    The approach of locks looked so natural, without any complicated logic: just JIT on the first rank that reaches the request and have the others wait before importing.
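
    A minimal sketch of that idea (illustrative only, not Instant's actual implementation; module_name, build and cache_dir are made-up names):

    import fcntl
    import importlib
    import os
    import sys

    def jit_with_lock(module_name, build, cache_dir):
        # Whichever rank reaches this first takes the exclusive lock and builds
        # the module; the other ranks block on the lock and then simply import
        # the cached result.
        sys.path.insert(0, cache_dir)
        with open(os.path.join(cache_dir, module_name + ".lock"), "w") as lock:
            fcntl.flock(lock, fcntl.LOCK_EX)
            try:
                return importlib.import_module(module_name)
            except ImportError:
                return build(module_name)
            finally:
                fcntl.flock(lock, fcntl.LOCK_UN)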

  4. Johan Hake

    @blechta that sounds like the natural solution, actually. In reality that is what we do, but instead of letting chance pick the process that does the JITing, we now force it to be process 0.

  5. Jan Blechta

    I just realized that I currently do something like

    if something:
        sub_domain = CompiledSubDomain('near(x[0], x0) && near(x[1], x1)',
                                       x0=pinpoint.x(),
                                       x1=pinpoint.y())
    else:
        #sub_domain = CompiledSubDomain('false') # issue 301
        sub_domain = CompiledSubDomain('0') # workaround
    
    bc = DirichletBC(V, 0.0, sub_domain, method='pointwise')
    

    which may be very fragile to the suggested changes, as you're really compiling different code on different processes.

  6. Lawrence Mitchell

    So if only one process compiles code for the compiled subdomain you'll definitely have a problem in that situation. If the semantics of compiled subdomains are intended to be collective, then you can add error checking that the code is actually the same on all processes (maybe in some "debug-only" mode).

  7. Martin Sandve Alnæs reporter

    It wasn't intended as a bug, just a reference point for when this becomes a problem in the future. It looks like one of the major pain points if we're ever going to support process groups, but I think it only illustrates a class of problems we'll have to solve to get there.

  8. Martin Sandve Alnæs reporter

    blechta: that code is very fragile today if 'something' is not guaranteed to be equal on all processes.

  9. Jan Blechta

    @martinal No, you're wrong. The code is fine. The pinpoint is found on just one process, where something == True, and is not communicated to the other processes. This is on purpose.

  10. Martin Sandve Alnæs reporter

    @blechta: I see why now: mpi_jit_decorator doesn't really "compile just on process 0"; it calls jit on process 0, then waits on a barrier, then calls jit again on all processes.

    So what your code will do, if process 0 is not part of the process group, is rely on the locking mechanism in Instant.
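
    Roughly, the pattern described above looks like this (a sketch using mpi4py for illustration, not the actual decorator):

    from mpi4py import MPI

    def mpi_jit(jit, *args):
        # Rank 0 compiles first, everyone waits at a barrier on the WORLD
        # communicator, then every rank calls jit() again and hits the cache.
        if MPI.COMM_WORLD.rank == 0:
            jit(*args)
        MPI.COMM_WORLD.Barrier()
        return jit(*args)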

  11. Martin Sandve Alnæs reporter

    But... what happens if just a subset of the processes calls an MPI barrier with the world comm? Is that safe?

  12. Martin Sandve Alnæs reporter

    @blechta: your code works by what I would call a fragile accident, like this:

    Because both the 'if something' and 'else' branches create one and only one CompiledSubDomain, both enter mpi_jit_decorator, and both get to the MPI barrier with comm_world.

    If just one of the processes entered a branch that created a different number of CompiledSubDomain objects, you would have a deadlock, with all the others waiting on the barrier.

  13. Jan Blechta

    "But... what happens if just a subset of the processes calls an MPI barrier with the world comm? Is that safe?"

    This will deadlock.

  14. Jan Blechta

    You're correct. This

    from dolfin import *
    if MPI.rank(mpi_comm_world()) == 0:
        # Only rank 0 constructs the CompiledSubDomain, so only rank 0 reaches
        # the world-comm barrier inside mpi_jit; the other ranks never do.
        CompiledSubDomain('x[0]<1e42')
    

    deadlocks.

  15. Martin Sandve Alnæs reporter

    And that's why CompiledSubDomain and Expression will need a mesh or an explicit MPI communicator argument to work in a process-group setting.

  16. Jan Blechta

    This has nothing to do with CompiledSubDomain or Expression. These DOLFIN objects don't do anything collective. This collectivity is artificially pulled in from the current (not much thought through) semantics of JITing.

  17. Martin Sandve Alnæs reporter

    @blechta: Maybe, but IMO that's splitting hairs. The current CompiledSubDomain and Expression are effectively implemented as collective operations via mpi_jit, and removing the barrier trick there will lead to other problems, namely falling back on the subpar file-locking implementation in Instant. That is a problem that needs solving. If it should be solved via MPI synchronization instead of file locking, mpi_jit needs a communicator, and CompiledSubDomain and Expression will need to provide one, which they currently can't because the user doesn't give them that information.
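
    A hypothetical communicator-aware variant of the same barrier trick (a sketch for an mpi4py-style communicator, not existing DOLFIN code) would synchronize only within the communicator the object provides:

    def mpi_jit_on_comm(jit, comm, *args):
        # Hypothetical sketch: same barrier trick as the current decorator, but
        # restricted to the given communicator, so ranks outside the process
        # group never participate.
        if comm.rank == 0:
            jit(*args)
        comm.Barrier()
        return jit(*args)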

  18. Jan Blechta

    @martinal Yes, nice summary. The natural behaviour would be:

    1. non-collective
    2. every process checks whether
      • a. module exists and imports it
      • b. module is being JITted and waits
      • c. module needs to be JITted and does the job

    When doing 2a., which is the most common situation, every rank touches the filesystem. So we are talking about breaking the natural semantics of 1. in order to reduce the filesystem load in 2b. Is it worth it?

    You will still end up with the general non-scalability of Python imports, also at 2a. (There are solutions like collfs, but they are complicated. And yes, with such a solution you end up with collective behaviour of filesystem access.)

  19. Johan Hake

    As of a6f6758c39f7b3659e9ac632e368fb4aa6af53e1, JIT now supports MPI groups.

    For forms it just uses the mpi_comm from the mesh, and compiles on the local rank 0 process. For Expression and CompiledSubDomain you need to pass the group communicator during construction:

    e = Expression("...", mpi_comm=group_comm)
    subdomain = CompiledSubDomain("...", mpi_comm=group_comm)
    

    We still have some miles to go to integrate proper PETSc initialization with MPI groups. This has to be done with petsc4py before importing dolfin.
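
    The group_comm above is whatever communicator describes the process group. One hypothetical way to obtain such a communicator is to split the world communicator with mpi4py (whether the resulting object can be handed straight to the DOLFIN constructors depends on the wrapper layer):

    from mpi4py import MPI

    # Hypothetical construction of a group communicator: split the world
    # communicator into two halves by rank.
    world = MPI.COMM_WORLD
    color = 0 if world.rank < world.size // 2 else 1
    group_comm = world.Split(color, key=world.rank)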
