Threaded Navier-Stokes benchmark broken
bench/fem/multicore/cpp/main.cpp
segfaults in PETScVector::get_local, inside the omp for loop in OpenMPAssembler::assemble_cells, when assembling Navier-Stokes with 2 threads, at least on my machine.
The benchbot seems to have been broken there for more than one year as well. I'm not sure whether the benchbot logs are accessible anywhere.
Note: the "Coloring mesh" label in the figure is misleading. The label is extracted as the first line of bench/logs/fem-multicore-cpp.log, while the actual timing is the total running time of the executable.
Comments (20)
- reporter -
- I think we can remove this benchmark. OpenMPAssembler has a number of problems which affect performance, so benchmarking it is not so interesting.
- reporter: Maybe that's a reason to see how it performs compared to the situation a year ago. But more importantly, I suspect the problem would also occur outside the benchmark, and it could be worth fixing. Let me check later.
- reporter assigned issue to
- changed milestone to 1.6
- assigned issue to
-
The benchbot is currently broken because it is running an old version of SWIG and we now require SWIG >= 3.0.3. I am working on this.
-
The benchbot segfaults on this benchmark as well:
```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6d3a700 (LWP 9719)]
0x00007ffff39b73df in VecGetArrayRead () from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
(gdb) where
#0  0x00007ffff39b73df in VecGetArrayRead () from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#1  0x00007ffff39c022f in VecGetValues_Seq () from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#2  0x00007ffff39b5c71 in VecGetValues () from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#3  0x00007ffff79d8c9a in dolfin::PETScVector::get_local (this=0x748000, block=0x7fffe0000d40, m=12, rows=0x327c520) at /work/fenics/dolfin-dev/src/dolfin/dolfin/la/PETScVector.cpp:288
#4  0x00007ffff78cd3b1 in restrict (dolfin_cell=..., w=0x7fffe0000d40, this=0x747d70, element=..., vertex_coordinates=<optimized out>, ufc_cell=...) at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:576
#5  dolfin::Function::restrict(double*, dolfin::FiniteElement const&, dolfin::Cell const&, double const*, ufc::cell const&) const (this=0x747d70, w=0x7fffe0000d40, element=..., dolfin_cell=..., vertex_coordinates=<optimized out>, ufc_cell=...) at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:554
#6  0x00007ffff77f57b3 in dolfin::UFC::update (this=0x7fffe6d2b580, c=..., vertex_coordinates=..., ufc_cell=..., enabled_coefficients=...) at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/UFC.cpp:149
#7  0x00007ffff77c565f in dolfin::OpenMpAssembler::assemble_cells(dolfin::GenericTensor&, dolfin::Form const&, dolfin::UFC&, std::shared_ptr<dolfin::MeshFunction<unsigned long> const>, std::vector<double, std::allocator<double> >*) [clone ._omp_fn.2] () at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/OpenMpAssembler.cpp:215
#8  0x00007ffff251beea in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#9  0x00007ffff6a48e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ffff625f8bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb)
```
-
We should remove the benchmark rather than make it work. It's not thread-safe.
- reporter: What is not thread-safe? OpenMPAssembler?
- reporter: It could be the thread-safety of PETSc, #326, right?
-
Yes, there is no guarantee that the backends are thread safe. We used an approach that was thread safe in practice with PETSc, but there was no guarantee, and changes in PETSc could have (and evidently have) changed this.
-
Sorry for polluting the thread. I remember playing around with OpenMPAssembler and PETSc after PETSc made the changes provoking this segfault, to try to make this work in parallel. I figured out that by turning off communication between processes during MatSet/Add, it did not segfault anymore. Probably there is a thread-unsafe implementation of the communication? Sorry for the vague note. But the nice thing is that I actually got it to work in parallel. Of course the assembled matrix was not correct, as it lacked the contributions from shared vertices. However, maybe this approach has some merit now that we have the option to use ghost cells. Isn't the need for communication during assembly gone with ghost-cell meshes?
-
@johanhake We could now support communication-less assembly, but it isn't implemented yet. Ghosted meshes are optional, but it would simplify things if we always worked with ghosted meshes.
-
@garth-wells, communication-less assembly sounds interesting in itself, though not necessarily in the context of OpenMPAssembler, which has its own issues, such as being optimally cache-unfriendly.
-
reporter: Johan, what communication do you mean? It segfaults on one process with 2 threads, so there should be no communication.
Garth, @wence suggested here that PETSc can be made thread-safe (in a sufficient sense). We need to try this and, if it helps, implement the check.
For further reading, see: thread support in PETSc, a skeptical essay on threaded computing, and the installation instructions.
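If I read the PETSc thread-support documentation correctly, PETSc offers a minimal thread-safety mode at configure time. An untested sketch of what trying this might look like (exact flags should be checked against the PETSc installation instructions linked above):

```shell
# Untested sketch: build PETSc with its minimal thread-safety support.
# --with-log=0 is needed because PETSc's logging is not thread safe;
# --with-openmp provides the thread model used by OpenMpAssembler.
./configure --with-threadsafety --with-log=0 --with-openmp
make all
```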
-
Well, this is really just a side note, from my vague memory of dealing with this 1.5 years ago... At the time I did not know about the thread-safety compile option of PETSc. I just played around with different MatSet/Add options, which I cannot recall right now. What I remember is that preventing any communication also made it thread-safe; this was also true for serial runs. Anyhow, take this as a somewhat off-topic side note, as I cannot back it up with any substantial information you can use :P
-
- reporter changed milestone to 1.7. Probably a duplicate of #326.
- removed milestone
Removing milestone: 1.7 (automated comment)
- Maybe remove OpenMPAssembler?
- reporter: Yes, if nobody is really using it. Shouldn't we ask on the mailing list? Consider also closing #326 in that case.
- changed status to resolved
Remove OpenMpAssembler. Fixes #246, #326 and #491. → <<cset e6d8ffbe80bc>>