GenericVector::init does not prepare ghosts
Øyvind: When a tensor (Vector) is passed to assemble when assembling CG forms, the following code produces very strange results in parallel:
```python
from dolfin import *
set_log_level(60)

while True:
    print "**"
    mesh = UnitCubeMesh(6, 6, 6)
    V = FunctionSpace(mesh, "CG", 1)
    dun = Function(V)
    v = TestFunction(V)
    form = Constant(1)

    dun.vector()[:] = assemble(inner(form, v)*dx)
    print "Check 1: ", assemble(dun*dx)

    assemble(inner(form, v)*dx, tensor=dun.vector(), reset_sparsity=False)
    print "Check 2: ", assemble(dun*dx)

    assemble(inner(form, v)*dx, tensor=dun.vector())
    print "Check 3: ", assemble(dun*dx)

    # This fails also on master
    x = assemble(inner(form, v)*dx)
    dun = Function(V, x)
    print "Check 4: ", assemble(dun*dx)
```
Of course, all checks should produce the same result, but in parallel, with reset_sparsity=True, this fails in an apparently random fashion.
I have tested this with two different installations on my laptop, and Martin has been able to reproduce it.
It seems to have been fixed since the 1.3 release, but it seems sufficiently critical to report. Perhaps an argument for making another release in the not-so-distant future...
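To make the failure mode concrete: in parallel, each process owns a contiguous range of dofs but assembles over its own cells, and cells on a partition boundary touch dofs owned by a neighbouring process. Those off-process dofs are the "ghosts" discussed below. A minimal pure-Python sketch (no DOLFIN required; the function name is illustrative, not DOLFIN API) of how ghost indices arise for a 1D CG1 partition:

```python
# Pure-Python sketch of where ghost dofs come from. A 1D CG1 mesh with
# N cells has N + 1 dofs; each process owns a contiguous dof range but
# assembles over its own cells, some of which touch neighbour-owned dofs.

def ghost_indices(owned_range, local_cells):
    """Global dofs touched by local cells but owned by another process."""
    lo, hi = owned_range
    touched = set()
    for c in local_cells:
        touched.update((c, c + 1))  # CG1 cell c touches dofs c and c+1
    return sorted(d for d in touched if not lo <= d < hi)

# 8 cells, 9 dofs, 2 processes: rank 0 owns dofs [0, 5), rank 1 owns [5, 9);
# rank 0 assembles cells 0..3, rank 1 assembles cells 4..7.
print(ghost_indices((0, 5), range(0, 4)))  # rank 0: []
print(ghost_indices((5, 9), range(4, 8)))  # rank 1: [4] -- dof 4 is a ghost
```

A vector that was initialised without registering such ghost indices cannot serve reads of dof 4 on rank 1, which is what the checks above trip over.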
Jan: The actual problem is that GenericVector::init does not construct a ghosted vector, because it calls FooVector::resize/init(range) instead of FooVector::resize/init(range, ghost_indices) as Function::init_vector() does. The latter uses the helper function Function::compute_ghosts_indices. I will check whether it can be factored out of Function.
1.3.0:

```cpp
/// Initialize zero tensor using sparsity pattern
virtual void init(const TensorLayout& tensor_layout)
{ resize(tensor_layout.local_range(0)); zero(); }
```

master:

```cpp
/// Initialize zero tensor using sparsity pattern
virtual void init(const TensorLayout& tensor_layout)
{
  if (!empty())
    error("GenericVector cannot be initialised more than once");
  init(tensor_layout.mpi_comm(), tensor_layout.local_range(0));
  zero();
}
```
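The practical difference between the two init paths can be modelled in a few lines of plain Python (a hypothetical toy class, not the DOLFIN implementation): initialising with only a local range leaves the vector unable to serve off-process reads, which is essentially what the PETSc "argument out of range" failure reported in this thread amounts to.

```python
# Toy model of a distributed vector on one rank. Initialising without
# ghost_indices (like GenericVector::init in 1.3) leaves the ghost map
# empty, so reading a neighbour-owned entry fails.

class MiniVector:
    def __init__(self, local_range, ghost_indices=()):
        self.lo, self.hi = local_range
        self.values = {i: 0.0 for i in range(self.lo, self.hi)}
        # ghost entries live in a separate global-to-local style map
        self.ghost_global_to_local = {g: k for k, g in enumerate(ghost_indices)}
        self.ghost_values = [0.0] * len(ghost_indices)

    def get_local(self, rows):
        out = []
        for r in rows:
            if self.lo <= r < self.hi:
                out.append(self.values[r])
            elif r in self.ghost_global_to_local:
                out.append(self.ghost_values[self.ghost_global_to_local[r]])
            else:
                # analogue of PETSc's "input argument, out of range" (63)
                raise IndexError("index %d is neither owned nor ghosted" % r)
        return out

v_ok  = MiniVector((5, 9), ghost_indices=(4,))  # ghost-aware init
v_bad = MiniVector((5, 9))                      # range-only init

v_ok.get_local([4, 5])       # fine: 4 is a registered ghost
try:
    v_bad.get_local([4, 5])  # fails: ghost map is empty
except IndexError as e:
    print(e)
```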
Comments (18)
-
- changed milestone to 1.4
-
The backtrace, in case it could help. Unfortunately, I'm not able to track the problem further, as DOLFIN is too heavily optimized.
```
#0  0x00007fffe87aa3c0 in PetscError () from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
#1  0x00007fffe89617f5 in VecGetValues_MPI () from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
#2  0x00007fffe8982ddd in VecGetValues () from /usr/local/pkg/petsc/3.4.4/gnu-ompi-mkl/lib/libpetsc.so
#3  0x00007fffea940ef9 in dolfin::PETScVector::get_local (this=0x1fc4d80, block=0x1fc8550, m=4, rows=0x1fb5f20) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/la/PETScVector.cpp:345
#4  0x00007fffea76b1bc in restrict (dolfin_cell=..., w=0x1fc8550, this=0x1fc4c50, element=..., vertex_coordinates=<optimized out>, ufc_cell=...) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/function/Function.cpp:573
#5  dolfin::Function::restrict(double*, dolfin::FiniteElement const&, dolfin::Cell const&, double const*, ufc::cell const&) const (this=0x1fc4c50, w=0x1fc8550, element=..., dolfin_cell=..., vertex_coordinates=<optimized out>, ufc_cell=...) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/function/Function.cpp:554
#6  0x00007fffea413fc9 in dolfin::UFC::update (this=0x7fffffff7380, c=..., vertex_coordinates=..., ufc_cell=...) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/UFC.cpp:139
#7  0x00007fffea40ec62 in dolfin::Assembler::assemble_cells (this=0x1f93f60, A=..., a=..., ufc=..., domains=0x0, values=0x0) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/Assembler.cpp:166
#8  0x00007fffea4113c8 in dolfin::Assembler::assemble (this=0x1f93f60, A=..., a=...) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/dolfin/fem/Assembler.cpp:102
#9  0x00007fffc61834f5 in _wrap_Assembler_assemble (args=<optimized out>) at /usr/users/hron/Cluster/pkg/fenics/1.3.0/gnu-ompi-mkl/src/fenics-1.3.0/dolfin/build/dolfin/swig/modules/fem/modulePYTHON_wrap.cxx:25913
#10 0x0000000000551dae in PyEval_EvalFrameEx ()
#11 0x0000000000575d92 in PyEval_EvalCodeEx ()
#12 0x000000000054c028 in PyEval_EvalFrameEx ()
#13 0x0000000000575d92 in PyEval_EvalCodeEx ()
#14 0x00000000004c1352 in PyRun_SimpleFileExFlags ()
#15 0x00000000004c754f in Py_Main ()
#16 0x00007ffff5d0476d in __libc_start_main (main=0x41ba10 <main>, argc=2, ubp_av=0x7fffffff7d48, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffff7d38) at libc-start.c:226
#17 0x000000000041ba41 in _start ()
```
-
There is probably a broken ghost map, as frame 3 is at line 345:

```cpp
342   // Use VecGetValues if no ghost points, otherwise check for ghost values
343   if (ghost_global_to_local.empty() || m == 0)
344   {
345     ierr = VecGetValues(*_x, _m, _rows, block);
346     if (ierr != 0) petsc_error(ierr, __FILE__, "VecGetValues");
347   }
348   else
```

while m == 4, which means that ghost_global_to_local is empty, which it should not be.
-
The chain is:

1. AssemblerBase::init_global_tensor
2. GenericVector::init
3. PETScVector::resize(std::pair<std::size_t, std::size_t> range)
4. PETScVector::resize(std::pair<std::size_t, std::size_t> range, const std::vector<la_index>& ghost_indices)

but 3. calls 4. with an empty ghost_indices, because it does not know any ghost indices. The thing luckily works out when reset_sparsity=False, because the assembler does not try to reinitialize the vector, so it does not get broken. So the issue is a duplicate of #213.
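The call chain above can be sketched in plain Python (the C++ overloads are modelled as a default argument; function names are illustrative only): the information loss happens at step 3, which has no dofmap in scope and therefore nothing to pass as ghost indices.

```python
# Sketch of the call chain: only the two-argument resize can actually
# record ghost indices, but GenericVector::init goes through the
# one-argument path and so always ends up with an empty ghost map.

def petsc_resize(local_range, ghost_indices=()):
    """Step 4: the only place ghost indices can be recorded."""
    return {"range": local_range,
            "ghost_global_to_local": {g: k for k, g in enumerate(ghost_indices)}}

def generic_vector_init(local_range):
    """Steps 2-3: no dofmap in scope, hence no way to know the ghosts."""
    return petsc_resize(local_range)  # ghost_indices is always empty here

def function_init_vector(local_range, dofmap_ghosts):
    """What Function::init_vector does instead: it knows the dofmap."""
    return petsc_resize(local_range, dofmap_ghosts)

print(generic_vector_init((5, 9))["ghost_global_to_local"])         # {}
print(function_init_vector((5, 9), (4,))["ghost_global_to_local"])  # {4: 0}
```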
-
- changed status to duplicate
Duplicate of #213.
- changed title to GenericVector::init does not prepare ghosts
- edited description
-
assigned issue to
- marked as major
- removed milestone
- changed version to dev
-
- changed status to open
-
- edited description
Added case failing on master to Øyvind's example.
-
- edited description
-
This will be resolved by the branch with the switch to local numbering (the ghost values are no longer stored in the PETSc wrapper in the upcoming changes).
-
- changed status to wontfix
Will be fixed by #119.
But I wonder whether an announcement should be published about the critical bug in 1.3 which is fixed in 1.4 by #213 (the original issue, triggered by AssemblerBase::init_global_tensor, reported by Øyvind).
- changed status to open
@garth-wells This was not solved. Check 4 still computes random rubbish (or raises PETSC_ERR_ARG_OUTOFRANGE, error 63, with PETSc built with debugging).
-
Looks solved to me, but an error message might be possible. The code around check 4 is not correct: the assembler that produces x cannot predict how x will subsequently be used. In this case x will not have the ghost values that are subsequently required to assemble dun*dx.
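The point can be restated with a tiny plain-Python sketch (illustrative names only): the producer of x fills only the entries its process owns, so a consumer that later restricts to a boundary cell needs neighbour (ghost) values that were never allocated.

```python
# Producer/consumer mismatch behind check 4: assemble fills only owned
# entries; cell restriction needs every dof the cell touches, ghosts
# included, so a plain owned-only container cannot serve it.

def assemble_owned(local_range):
    """Producer: fills only the entries this process owns."""
    lo, hi = local_range
    return {i: 1.0 for i in range(lo, hi)}

def restrict_on_cell(vec, cell_dofs):
    """Consumer: needs every dof the cell touches, ghosts included."""
    return [vec[d] for d in cell_dofs]

x = assemble_owned((5, 9))
restrict_on_cell(x, [5, 6])        # fine: both dofs owned
try:
    restrict_on_cell(x, [4, 5])    # dof 4 is a ghost -- not in x
except KeyError:
    print("missing ghost value for dof 4")
```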
I agree - the code is not correct. Still, an error message would be good.
-
@blechta Could you close this report and open a new, more specific report with just check 4 as the test code?
-
- changed status to closed
Fresh report at #339.
With PETSc 3.4.4 compiled with debugging, I get PETSc error 63 (input argument, out of range) on the third assembly.