GenericVector::init does not prepare ghosts

Issue #291 closed
Øyvind Evju created an issue

Øyvind: When a tensor (Vector) is passed to assemble when assembling CG forms, the following code produces very strange results in parallel:

from dolfin import *

set_log_level(60)
while True:
    print "**"
    mesh = UnitCubeMesh(6,6,6)

    V = FunctionSpace(mesh, "CG", 1)

    dun = Function(V)
    v = TestFunction(V)

    form = Constant(1)

    dun.vector()[:] =  assemble(inner(form,v)*dx)
    print "Check 1: ", assemble(dun*dx)

    assemble(inner(form,v)*dx, tensor=dun.vector(), reset_sparsity=False)
    print "Check 2: ", assemble(dun*dx)

    assemble(inner(form,v)*dx, tensor=dun.vector())
    print "Check 3: ", assemble(dun*dx)

    # This also fails on master
    x = assemble(inner(form,v)*dx)
    dun = Function(V, x)
    print "Check 4: ", assemble(dun*dx)

Of course, all checks should produce the same result, but in parallel, with reset_sparsity=True, this fails seemingly at random.

I have tested this with two different installations on my laptop, and Martin has been able to reproduce it.

It seems to have been fixed since the 1.3 release, but it is sufficiently critical to report. Perhaps it is an argument for making another release in the not-so-distant future...

Jan: The actual problem is that GenericVector::init does not construct a ghosted vector, because it calls FooVector::resize/init(range) instead of FooVector::resize/init(range, ghost_indices) as Function::init_vector() does. The latter uses the helper function Function::compute_ghost_indices; I will check whether it can be factored out of Function. A sketch of a ghost-aware variant is given after the excerpts below.

1.3.0:

    /// Initialize zero tensor using sparsity pattern
    virtual void init(const TensorLayout& tensor_layout)
    { resize(tensor_layout.local_range(0)); zero(); }

master:

    /// Initialize zero tensor using sparsity pattern
    virtual void init(const TensorLayout& tensor_layout)
    {
      if (!empty())
        error("GenericVector cannot be initialised more than once");
      init(tensor_layout.mpi_comm(), tensor_layout.local_range(0));
      zero();
    }

Comments (18)

  1. Jan Blechta

    With PETSc 3.4.4 compiled with debugging, I get PETSc error 63 (input argument, out of range) on the third assembly.

  2. Jan Blechta

    The backtrace, in case it could help. Unfortunately I'm not able to track the problem down, as DOLFIN is too heavily optimized.

    #0  0x00007fffe87aa3c0 in PetscError ()
       from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
    #1  0x00007fffe89617f5 in VecGetValues_MPI ()
       from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
    #2  0x00007fffe8982ddd in VecGetValues ()
       from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
    #3  0x00007fffea940ef9 in dolfin::PETScVector::get_local (this=0x1fc4d80, 
        block=0x1fc8550, m=4, rows=0x1fb5f20)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/la/PETScVector.cpp:345
    #4  0x00007fffea76b1bc in restrict (dolfin_cell=..., w=0x1fc8550, 
        this=0x1fc4c50, element=..., vertex_coordinates=<optimized out>, 
        ufc_cell=...)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/function/Function.cpp:573
    #5  dolfin::Function::restrict(double*, dolfin::FiniteElement const&, dolfin::Cell const&, double const*, ufc::cell const&) const (this=0x1fc4c50, 
        w=0x1fc8550, element=..., dolfin_cell=..., 
        vertex_coordinates=<optimized out>, ufc_cell=...)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/function/Function.cpp:554
    #6  0x00007fffea413fc9 in dolfin::UFC::update (this=0x7fffffff7380, c=..., 
        vertex_coordinates=..., ufc_cell=...)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/UFC.cpp:139
    #7  0x00007fffea40ec62 in dolfin::Assembler::assemble_cells (this=0x1f93f60, 
        A=..., a=..., ufc=..., domains=0x0, values=0x0)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/Assembler.cpp:166
    #8  0x00007fffea4113c8 in dolfin::Assembler::assemble (this=0x1f93f60, A=..., 
        a=...)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/Assembler.cpp:102
    #9  0x00007fffc61834f5 in _wrap_Assembler_assemble (args=<optimized out>)
        at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/build/dolfin/swig/modules/fem/modulePYTHON_wrap.cxx:25913
    #10 0x0000000000551dae in PyEval_EvalFrameEx ()
    #11 0x0000000000575d92 in PyEval_EvalCodeEx ()
    #12 0x000000000054c028 in PyEval_EvalFrameEx ()
    #13 0x0000000000575d92 in PyEval_EvalCodeEx ()
    #14 0x00000000004c1352 in PyRun_SimpleFileExFlags ()
    #15 0x00000000004c754f in Py_Main ()
    #16 0x00007ffff5d0476d in __libc_start_main (main=0x41ba10 <main>, argc=2, 
        ubp_av=0x7fffffff7d48, init=<optimized out>, fini=<optimized out>, 
        rtld_fini=<optimized out>, stack_end=0x7fffffff7d38) at libc-start.c:226
    #17 0x000000000041ba41 in _start ()
    
  3. Jan Blechta

    There is probably a broken ghost vector, as frame 3 is at line 345:

    342       // Use VecGetValues if no ghost points, otherwise check for ghost values
    343       if (ghost_global_to_local.empty() || m == 0)
    344       {
    345         ierr = VecGetValues(*_x, _m, _rows, block);
    346         if (ierr != 0) petsc_error(ierr, __FILE__, "VecGetValues");
    347       }
    348       else
    

    while m == 4, which means that ghost_global_to_local is empty, though it should not be.

  4. Jan Blechta

    The chain is:

    1. AssemblerBase::init_global_tensor

    2. GenericVector::init

    3. PETScVector::resize(std::pair<std::size_t, std::size_t> range)

    4. PETScVector::resize(std::pair<std::size_t, std::size_t> range, const std::vector<la_index>& ghost_indices)

    but 3. calls 4. with empty ghost_indices because it does not know any ghost indices.
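
    A minimal sketch of the delegation in step 3, reconstructed from the behaviour described above (simplified, not verbatim DOLFIN source):

    // Step 3: the range-only overload knows no ghost indices, so it
    // delegates to the two-argument overload with an empty list; a
    // previously ghosted vector is thus re-created without its ghosts
    void PETScVector::resize(std::pair<std::size_t, std::size_t> range)
    {
      std::vector<la_index> ghost_indices;  // empty: none known here
      resize(range, ghost_indices);
    }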

    Luckily, things work out when reset_sparsity=False, because the assembler does not try to reinitialize the vector, so it does not get broken.

    So the issue is a duplicate of #213.

  5. Prof Garth Wells

    This will be resolved by the branch with the switch to local numbering (the ghost values are no longer stored in the PETSc wrapper in the upcoming changes).

  6. Jan Blechta

    But I wonder whether an announcement should be published about the critical bug in 1.3 which is fixed in 1.4 by #213 (the original issue triggered by AssemblerBase::init_global_tensor, reported by Øyvind).

  7. Jan Blechta
    • changed status to open

    @garth-wells This was not solved. Check 4 still computes random rubbish (or raises PETSC_ERR_ARG_OUTOFRANGE (63) with a debugging build of PETSc).

  8. Prof Garth Wells

    Looks solved to me, but an error message might be possible. The code around check 4 is not correct: the assembler that produces x cannot predict how x will subsequently be used, and in this case x will not have the ghost values that are later required to assemble dun*dx.
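
    For reference, the safe pattern is the one in check 1: let the Function allocate its own (ghosted) vector and copy the assembled values into it, rather than wrapping the assembler's plain vector in a Function. A minimal C++ sketch of that pattern, assuming a FunctionSpace V and a linear form L are already set up (the names are illustrative):

    // Sketch mirroring check 1 (illustrative; assumes V and L exist)
    Vector x;
    assemble(x, L);     // plain assembled vector: no ghost entries
    Function dun(V);    // Function::init_vector() makes a ghosted vector
    *dun.vector() = x;  // copy values into the ghosted layout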

  9. Prof Garth Wells

    @blechta Could you close this report and open a new, more specific report with just check 4 as the test code?
