Threaded Navier-Stokes benchmark broken

Issue #491 resolved

Jan Blechta created an issue 2015-03-18

bench/fem/multicore/cpp/main.cpp segfaults in PETScVector::get_local in omp for loop in OpenMPAssembler::assemble_cells when assembling Navier-Stokes with 2 threads. At least on my machine.

Benchbot seem to be broken there for more then one year as well. I'm not sure whether benchbot logs are accessible somewhere.

Note: Coloring mesh.label in the figure is misleading. The label is extracted as first line from bench/logs/fem-multicore-cpp.log while the actual timing is total running time of the executable.

Comments (20)

Jan Blechta reporter
- edited description
- 2015-03-18T10:05:33+00:00
Prof Garth Wells
I think we can remove this benchmark. OpenMPAssembler has a number of problems which affect performance, so benchmarking it is not so interesting.
- 2015-03-18T13:47:12+00:00
Jan Blechta reporter
Maybe that's the reason to see how it performs compared to situation a year ago.

But more importantly, I have a suspicion that problem would happen even outside of benchmark and it could be worth fixing. Let me check later.
- 2015-03-18T14:15:34+00:00
Jan Blechta reporter
- assigned issue to
  
  Jan Blechta
- changed milestone to 1.6
- 2015-03-18T14:16:45+00:00
Johannes Ring
The benchbot is currently broken because it is running an old version of SWIG and we now require SWIG >= 3.0.3. I am working on this.
- 2015-03-19T11:14:44+00:00

Johannes Ring

The benchbot segfaults on this benchmark as well:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe6d3a700 (LWP 9719)]
0x00007ffff39b73df in VecGetArrayRead ()
   from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
(gdb) where
#0  0x00007ffff39b73df in VecGetArrayRead ()
   from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#1  0x00007ffff39c022f in VecGetValues_Seq ()
   from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#2  0x00007ffff39b5c71 in VecGetValues ()
   from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
#3  0x00007ffff79d8c9a in dolfin::PETScVector::get_local (this=0x748000, 
    block=0x7fffe0000d40, m=12, rows=0x327c520)
    at /work/fenics/dolfin-dev/src/dolfin/dolfin/la/PETScVector.cpp:288
#4  0x00007ffff78cd3b1 in restrict (dolfin_cell=..., w=0x7fffe0000d40, this=0x747d70, 
    element=..., vertex_coordinates=<optimized out>, ufc_cell=...)
    at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:576
#5  dolfin::Function::restrict(double*, dolfin::FiniteElement const&, dolfin::Cell const&, double const*, ufc::cell const&) const (this=0x747d70, w=0x7fffe0000d40, 
    element=..., dolfin_cell=..., vertex_coordinates=<optimized out>, ufc_cell=...)
    at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:554
#6  0x00007ffff77f57b3 in dolfin::UFC::update (this=0x7fffe6d2b580, c=..., 
    vertex_coordinates=..., ufc_cell=..., enabled_coefficients=...)
    at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/UFC.cpp:149
#7  0x00007ffff77c565f in dolfin::OpenMpAssembler::assemble_cells(dolfin::GenericTensor&, dolfin::Form const&, dolfin::UFC&, std::shared_ptr<dolfin::MeshFunction<unsigned long> const>, std::vector<double, std::allocator<double> >*) [clone ._omp_fn.2] ()
    at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/OpenMpAssembler.cpp:215
#8  0x00007ffff251beea in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#9  0x00007ffff6a48e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007ffff625f8bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#11 0x0000000000000000 in ?? ()
(gdb)

2015-03-20T14:12:50+00:00

Prof Garth Wells
We should remove the benchmark rather than make work of it. It's not thread-safe.
- 2015-03-20T14:38:17+00:00
Jan Blechta reporter
What is not thread-safe? OpenMPAssembler?
- 2015-03-20T15:17:12+00:00
Jan Blechta reporter
It could be thread-safety of PETSc, ~~#326~~, right?
- 2015-03-20T15:20:04+00:00
Prof Garth Wells
Yes, there is no guarantee that the backends are thread safe. We used an approach that was thread safe in practice with PETSc, but there was no guarantee and PETSc changes could/have changed this.
- 2015-03-25T08:44:44+00:00
Johan Hake
Sorry for polluting the thread. I remembered I played around with OpenMPAssembler and PETSc after PETSc made changes provoking this segfault, to try to make this work in parallel. I figured out that by turning of communication between processes during MatSet/Add, it did not segfault anymore. Probably there is a thread-unsafe implementation of the communication? Sorry for the vague note. But the nice thing is that I actually got it to work in parallel. Of course the assembled matrix was not correct as it lacked the contributions from shared vertices. However, maybe this approach has some merits now that we have the option to use ghost cells. Isn't the need for communication gone during assemble with ghost cell meshes?
- 2015-03-25T09:41:00+00:00
Prof Garth Wells
@johanhake We could now support communication-less assembly, but it isn't implemented yet. Ghosted meshes are optional, but it would simplify things if we always worked with ghosted meshes.
- 2015-03-25T09:54:05+00:00
Johan Hake
@garth-wells, communication-less assembly sounds interesting in itself. Not necessary in the context of OpenMPAssembler which has its own issues of being optimal cache unfriendly.
- 2015-03-25T09:57:49+00:00
Jan Blechta reporter
Johan, what communication do you mean? It segfaults on one process with 2 threads so there should be no communication.

Garth, @wence suggested here that PETSc can be thread-safe (in a sufficient sense). We need to try this and implement the check if it helps.

For further reading, see thread support in PETSc, skeptical essay on threaded computing, installation instructions.
- 2015-03-25T12:47:11+00:00
Johan Hake
Well, this is really just a side note, from my vague memory of dealing with this 1.5 years ago... Then I did not knew about the thread compile option of PETSc. I just played around with different MatSet/Add options, which I right now cannot recall. What I remembered was that preventing any communication also made it threadsafe, this was also true for serial runs. Anyhow, take this as a somewhat off topic sidenote, as I cannot back it with any substantial information you can use :P
- 2015-03-25T13:00:09+00:00
Jan Blechta reporter
- changed milestone to 1.7
Probably duplicate of ~~#326~~.
- 2015-06-04T12:07:51+00:00
Anders Logg (Chalmers)
- removed milestone
Removing milestone: 1.7 (automated comment)
- 2016-06-29T11:48:06+00:00
Prof Garth Wells
Maybe remove OpenMPAssembler?
- 2017-02-08T10:45:47+00:00
Jan Blechta reporter
Yes, if nobody is really using it. Shouldn't we ask on the mailing list?

Consider closing alos ~~#326~~ in that case.
- 2017-02-08T11:24:54+00:00
Prof Garth Wells
- changed status to resolved
Remove OpenMpAsssmbler. Fixes ~~#246~~, ~~#326~~ and ~~#491~~.

→ <<cset e6d8ffbe80bc>>
- 2017-02-08T11:27:47+00:00
Log in to comment

Assignee: Jan Blechta

Type: bug

Priority: major

Status: resolved

Component: assembly

Milestone: –

Version: dev

Votes: 0

Watchers: 4