VecScatter optimizations in PETSc 3.6 break DMLocalToLocalBegin/End

Issue #100 resolved
Constantine Khrulev created an issue

I expect DMLocalToLocalBegin/End to update all ghosts when called with the same source and destination vectors:

    ierr  = DMLocalToLocalBegin(da, local, INSERT_VALUES, local);CHKERRQ(ierr);
    ierr  = DMLocalToLocalEnd(da, local, INSERT_VALUES, local);CHKERRQ(ierr);

This works in PETSc 3.5.4 and earlier but fails in PETSc 3.6 for some parallel domain decompositions of the DM and some DM boundary types (tested with DM_BOUNDARY_PERIODIC).

Please see the stand-alone example program vecscatter-bug.c and its makefile (attached).

This program allocates a local Vec managed by a DM, sets all elements to a constant (0), then sets all locally owned elements to a different constant (1). After calling DMLocalToLocalBegin/End all ghost values should be equal to 1.
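
For reference, here is a minimal sketch of a program along these lines, assuming a 2D periodic DMDA with one degree of freedom and stencil width 1 (the attached vecscatter-bug.c may differ in its details):

    #include <petscdmda.h>

    int main(int argc, char **argv)
    {
      DM             da;
      Vec            local;
      PetscInt       i, j, xs, ys, xm, ym;
      PetscScalar  **a;
      PetscErrorCode ierr;

      ierr = PetscInitialize(&argc, &argv, NULL, NULL);CHKERRQ(ierr);
      /* 2D periodic DMDA, 1 degree of freedom, stencil width 1 */
      ierr = DMDACreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_PERIODIC, DM_BOUNDARY_PERIODIC,
                          DMDA_STENCIL_BOX, 10, 10, PETSC_DECIDE, PETSC_DECIDE,
                          1, 1, NULL, NULL, &da);CHKERRQ(ierr);
      ierr = DMCreateLocalVector(da, &local);CHKERRQ(ierr);

      ierr = VecSet(local, 0.0);CHKERRQ(ierr); /* owned values and ghosts: 0 */

      /* set the locally owned values to 1, leaving ghosts at 0 */
      ierr = DMDAGetCorners(da, &xs, &ys, NULL, &xm, &ym, NULL);CHKERRQ(ierr);
      ierr = DMDAVecGetArray(da, local, &a);CHKERRQ(ierr);
      for (j = ys; j < ys + ym; j++)
        for (i = xs; i < xs + xm; i++)
          a[j][i] = 1.0;
      ierr = DMDAVecRestoreArray(da, local, &a);CHKERRQ(ierr);

      /* after this every ghost value should be 1 as well */
      ierr = DMLocalToLocalBegin(da, local, INSERT_VALUES, local);CHKERRQ(ierr);
      ierr = DMLocalToLocalEnd(da, local, INSERT_VALUES, local);CHKERRQ(ierr);

      /* ... inspect the ghost region here and report mismatches ... */

      ierr = VecDestroy(&local);CHKERRQ(ierr);
      ierr = DMDestroy(&da);CHKERRQ(ierr);
      ierr = PetscFinalize();
      return 0;
    }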

Run vecscatter-bug to see that it works with mpiexec -n 1 but fails with mpiexec -n 2. It looks like ghosts are not updated correctly if the DM's processor grid has only one process in some dimension. So mpiexec -n P for any prime P triggers the bug, since the decomposition then puts a single process along one dimension. With other process counts the failure can be triggered by adding -da_processors_y 1.
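
For concreteness, the passing and failing runs would look something like this (assuming the attached makefile builds a vecscatter-bug executable):

    mpiexec -n 1 ./vecscatter-bug                      # 1-by-1 process grid: all ghosts equal 1
    mpiexec -n 2 ./vecscatter-bug                      # 1-by-2 or 2-by-1 grid: some ghosts stay 0
    mpiexec -n 4 ./vecscatter-bug -da_processors_y 1   # forced 4-by-1 grid: some ghosts stay 0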

As far as I can tell, vecscatter-bug.c produces the same (correct) results regardless of the DM's domain decomposition when linked against PETSc 3.5.4.

Git bisect tells me that "ef605946e53f07e3ad39ab377ff50dceade0a2df is the first bad commit":

commit ef605946e53f07e3ad39ab377ff50dceade0a2df
Author: Barry Smith <bsmith@mcs.anl.gov>
Date:   Wed Apr 1 14:27:31 2015 -0500

    turned on optimizations to remove unneeded copies to the same memory in VecScatter and hence DMLocalToLocalBegin/End()

As a workaround in our code (PISM) I can update ghosts in a local Vec by copying the locally owned values from the local Vec to a temporary global Vec and then calling DMGlobalToLocalBegin/End to scatter ghost values back into the original local Vec. I don't know whether this extra copy hurts performance (comments and advice are welcome), so for now we are telling PISM users to install PETSc 3.5.4 instead of 3.6.0.
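
A sketch of this workaround, assuming the local Vec is managed by a DMDA (the helper name and structure are illustrative, not PISM's actual code):

    static PetscErrorCode UpdateGhosts(DM da, Vec local)
    {
      Vec            global;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = DMGetGlobalVector(da, &global);CHKERRQ(ierr);
      /* copy the locally owned values into a temporary global Vec */
      ierr = DMLocalToGlobalBegin(da, local, INSERT_VALUES, global);CHKERRQ(ierr);
      ierr = DMLocalToGlobalEnd(da, local, INSERT_VALUES, global);CHKERRQ(ierr);
      /* scatter back; this also fills the ghost region of 'local' */
      ierr = DMGlobalToLocalBegin(da, global, INSERT_VALUES, local);CHKERRQ(ierr);
      ierr = DMGlobalToLocalEnd(da, global, INSERT_VALUES, local);CHKERRQ(ierr);
      ierr = DMRestoreGlobalVector(da, &global);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }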

Comments

  1. Ed Bueler

    Constantine --

    I have an old email from Barry (10/3/14) with the subject "DMLocalToGlobalBegin/End regression (?) from 3.4.5 to 3.5.2". Is this a different issue?

    Ed

  2. Constantine Khrulev reporter

    Ed --

    The old bug (the one I reported on 10/2/14) affected DMLocalToGlobalBegin/End, while this new one affects DMLocalToLocalBegin/End, so I think these are different issues.

    They are related, though: both issues broke PISM because of our (PISM's) unconventional way of using local and global PETSc Vecs.
