Threaded Navier-Stokes benchmark broken

Issue #491 resolved
Jan Blechta created an issue

bench/fem/multicore/cpp/main.cpp segfaults in PETScVector::get_local, inside the OpenMP for loop in OpenMPAssembler::assemble_cells, when assembling Navier-Stokes with 2 threads, at least on my machine.
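
A minimal sketch of the pattern the benchmark exercises is below. This is not the actual main.cpp: the header NavierStokes.h, its class names and the coefficient name w stand in for FFC-generated code and are assumptions.

    // Hypothetical sketch only, not bench/fem/multicore/cpp/main.cpp.
    // Assumes NavierStokes.h was generated by FFC and that its bilinear
    // form has a Function coefficient named w (the convecting velocity).
    #include <dolfin.h>
    #include "NavierStokes.h"

    using namespace dolfin;

    int main()
    {
      UnitCubeMesh mesh(24, 24, 24);

      NavierStokes::FunctionSpace V(mesh);
      NavierStokes::BilinearForm a(V, V);

      Function w(V);   // coefficient; restricted per cell via PETScVector::get_local
      a.w = w;

      // With num_threads > 1 DOLFIN uses the OpenMP assembler.
      parameters["num_threads"] = 2;

      Matrix A;
      assemble(A, a);  // segfaults here with 2 threads
      return 0;
    }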

The benchbot seems to have been broken there for more than a year as well. I'm not sure whether the benchbot logs are accessible anywhere.

Note: the "Coloring mesh" label in the figure is misleading. The label is extracted as the first line of bench/logs/fem-multicore-cpp.log, while the actual timing is the total running time of the executable.

Comments (20)

  1. Prof Garth Wells

    I think we can remove this benchmark. OpenMPAssembler has a number of problems which affect performance, so benchmarking it is not so interesting.

  2. Jan Blechta reporter

    Maybe that's a reason to see how it performs compared to the situation a year ago.

    But more importantly, I suspect the problem would also happen outside of the benchmark, and it could be worth fixing. Let me check later.

  3. Johannes Ring

    The benchbot is currently broken because it is running an old version of SWIG and we now require SWIG >= 3.0.3. I am working on this.

  4. Johannes Ring

    The benchbot segfaults on this benchmark as well:

    Program received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffe6d3a700 (LWP 9719)]
    0x00007ffff39b73df in VecGetArrayRead ()
       from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
    (gdb) where
    #0  0x00007ffff39b73df in VecGetArrayRead ()
       from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
    #1  0x00007ffff39c022f in VecGetValues_Seq ()
       from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
    #2  0x00007ffff39b5c71 in VecGetValues ()
       from /work/src/hashstack/fenics-deps.2015-03-19/lib/libpetsc.so.3.5
    #3  0x00007ffff79d8c9a in dolfin::PETScVector::get_local (this=0x748000, 
        block=0x7fffe0000d40, m=12, rows=0x327c520)
        at /work/fenics/dolfin-dev/src/dolfin/dolfin/la/PETScVector.cpp:288
    #4  0x00007ffff78cd3b1 in restrict (dolfin_cell=..., w=0x7fffe0000d40, this=0x747d70, 
        element=..., vertex_coordinates=<optimized out>, ufc_cell=...)
        at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:576
    #5  dolfin::Function::restrict(double*, dolfin::FiniteElement const&, dolfin::Cell const&, double const*, ufc::cell const&) const (this=0x747d70, w=0x7fffe0000d40, 
        element=..., dolfin_cell=..., vertex_coordinates=<optimized out>, ufc_cell=...)
        at /work/fenics/dolfin-dev/src/dolfin/dolfin/function/Function.cpp:554
    #6  0x00007ffff77f57b3 in dolfin::UFC::update (this=0x7fffe6d2b580, c=..., 
        vertex_coordinates=..., ufc_cell=..., enabled_coefficients=...)
        at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/UFC.cpp:149
    #7  0x00007ffff77c565f in dolfin::OpenMpAssembler::assemble_cells(dolfin::GenericTensor&, dolfin::Form const&, dolfin::UFC&, std::shared_ptr<dolfin::MeshFunction<unsigned long> const>, std::vector<double, std::allocator<double> >*) [clone ._omp_fn.2] ()
        at /work/fenics/dolfin-dev/src/dolfin/dolfin/fem/OpenMpAssembler.cpp:215
    #8  0x00007ffff251beea in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
    #9  0x00007ffff6a48e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
    #10 0x00007ffff625f8bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
    #11 0x0000000000000000 in ?? ()
    (gdb)
    
  5. Prof Garth Wells

    Yes, there is no guarantee that the backends are thread safe. We used an approach that was thread safe in practice with PETSc, but there was no guarantee, and changes in PETSc could break it (and apparently have).
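
    For illustration, the pattern in the backtrace boils down to several OpenMP threads calling VecGetValues on the same Vec with no synchronization. A standalone sketch of that access pattern is below (plain PETSc C API, no DOLFIN, error checking omitted; whether it actually crashes depends on the PETSc build and version):

    // Unsynchronized concurrent reads of one Vec from OpenMP threads,
    // mimicking PETScVector::get_local being called from the cell loop.
    // Compile with OpenMP enabled and link against PETSc.
    #include <petscvec.h>

    int main(int argc, char** argv)
    {
      PetscInitialize(&argc, &argv, nullptr, nullptr);

      Vec x;
      const PetscInt n = 1 << 20;
      VecCreateSeq(PETSC_COMM_SELF, n, &x);
      VecSet(x, 1.0);

      #pragma omp parallel for
      for (PetscInt c = 0; c < 100000; ++c)
      {
        PetscInt rows[12];
        PetscScalar vals[12];
        for (PetscInt i = 0; i < 12; ++i)
          rows[i] = (12*c + i) % n;
        VecGetValues(x, 12, rows, vals);  // concurrent, unguarded access
      }

      VecDestroy(&x);
      PetscFinalize();
      return 0;
    }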

  6. Johan Hake

    Sorry for polluting the thread. I remember playing around with OpenMPAssembler and PETSc, after PETSc made the changes that provoke this segfault, to try to make this work in parallel. I found that by turning off communication between processes during MatSet/Add, it no longer segfaulted. Probably the communication is implemented in a thread-unsafe way? Sorry for the vague note. The nice thing is that I actually got it to work in parallel. Of course the assembled matrix was not correct, as it lacked the contributions from shared vertices. However, maybe this approach has some merit now that we have the option to use ghost cells. Isn't the need for communication gone during assembly with ghost-cell meshes?
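
    A guess at the kind of switch involved (I don't recall whether this is exactly what I used; the helper name below is made up): PETSc can be told to simply drop off-process entries, so MatSetValues/VecSetValues never queue values for other ranks and MatAssemblyBegin/End have nothing to communicate. That matches a run that finishes but produces a matrix missing the shared contributions:

    // Hedged guess at the kind of PETSc option described above: ignore
    // entries destined for other processes, so assembly needs no
    // communication (and the matrix misses off-process contributions).
    // The helper name is hypothetical.
    #include <petscmat.h>
    #include <petscvec.h>

    PetscErrorCode disable_offproc_entries(Mat A, Vec b)
    {
      PetscErrorCode ierr;
      ierr = MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE); CHKERRQ(ierr);
      ierr = VecSetOption(b, VEC_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE); CHKERRQ(ierr);
      return 0;
    }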

  7. Prof Garth Wells

    @johanhake We could now support communication-less assembly, but it isn't implemented yet. Ghosted meshes are optional, but it would simplify things if we always worked with ghosted meshes.
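
    For reference, a minimal sketch of requesting a ghosted mesh. The parameter name "ghost_mode" and the values "shared_vertex"/"shared_facet" below are assumptions about the current parameter system; ghosting only takes effect in MPI-parallel runs, and the parameter must be set before the mesh is created:

    // Assumed parameter name and values; set before creating/partitioning the mesh.
    #include <dolfin.h>
    #include <string>

    int main()
    {
      dolfin::parameters["ghost_mode"] = std::string("shared_vertex");
      dolfin::UnitCubeMesh mesh(24, 24, 24);
      return 0;
    }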

  8. Johan Hake

    @garth-wells, communication-less assembly sounds interesting in itself, not necessarily in the context of OpenMPAssembler, which has its own problem of being optimally cache-unfriendly.

  9. Johan Hake

    Well, this is really just a side note from my vague memory of dealing with this 1.5 years ago... At the time I did not know about PETSc's threading compile option. I just played around with different MatSet/Add options, which I cannot recall right now. What I remember is that preventing any communication also made it thread safe; this was also true for serial runs. Anyhow, take this as a somewhat off-topic side note, as I cannot back it up with any substantial information you can use :P

  10. Jan Blechta reporter

    Yes, if nobody is really using it. Shouldn't we ask on the mailing list?

    Consider also closing #326 in that case.
