Performance problems with openmp/threads

Issue #42 resolved
Former user created an issue

When OpenMP is enabled to allow using the library from multiple threads, multithreaded performance suffers from pointless "#pragma omp critical" sections in misc.h (m4ri_mm_calloc/m4ri_mm_malloc). Removing those and undefining __M4RI_ENABLE_MMC increases per-thread performance from about 1/3 to about 2/3 of the single-threaded version.

Unfortunately, the "#pragma omp critical" sections in mzd_t_malloc and mzd_t_free cannot be "fixed" by just removing them, so those functions remain a massive waste of valuable CPU cycles.
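To make the contention concrete, here is a minimal sketch of the pattern described above. It is not m4ri's actual code: the cache layout, CACHE_SLOTS, and every function name are hypothetical, and it assumes the memory cache is a small table of freed blocks that get reused by size. Variant A guards one shared table with a named critical section, so independent threads serialize on every allocation. Variant B gives each thread its own table via C11 _Thread_local and needs no lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_SLOTS 16 /* hypothetical cache size */

typedef struct {
  size_t size; /* size of the cached block */
  void *data;  /* cached block, NULL if the slot is empty */
} cache_slot_t;

/* Remove and return a cached block of exactly `size` bytes, or NULL. */
static void *cache_take(cache_slot_t *cache, size_t size) {
  for (int i = 0; i < CACHE_SLOTS; i++) {
    if (cache[i].data != NULL && cache[i].size == size) {
      void *p = cache[i].data;
      cache[i].data = NULL;
      return p;
    }
  }
  return NULL;
}

/* Store a block in the first empty slot; return 1 if stored, 0 if full. */
static int cache_put(cache_slot_t *cache, void *p, size_t size) {
  assert(p != NULL);
  for (int i = 0; i < CACHE_SLOTS; i++) {
    if (cache[i].data == NULL) {
      cache[i].data = p;
      cache[i].size = size;
      return 1;
    }
  }
  return 0;
}

/* Variant A: one cache for all threads. Every call enters the same
 * critical section, even when the threads work on unrelated data. */
static cache_slot_t shared_cache[CACHE_SLOTS];

void *shared_cache_malloc(size_t size) {
  void *p;
#pragma omp critical(cache_lock)
  {
    p = cache_take(shared_cache, size);
  }
  return p != NULL ? p : malloc(size);
}

void shared_cache_free(void *p, size_t size) {
  int stored;
#pragma omp critical(cache_lock)
  {
    stored = cache_put(shared_cache, p, size);
  }
  if (!stored) free(p);
}

/* Variant B: one cache per thread, so no locking is needed. */
static _Thread_local cache_slot_t local_cache[CACHE_SLOTS];

void *local_cache_malloc(size_t size) {
  void *p = cache_take(local_cache, size);
  return p != NULL ? p : malloc(size);
}

void local_cache_free(void *p, size_t size) {
  if (!cache_put(local_cache, p, size)) free(p);
}
```

The trade-off in variant B is that a block freed by one thread is only reusable by that thread, and blocks still cached when a thread exits are never released in this sketch. A real implementation would need a cleanup hook for that.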

Comments (4)

  1. Martin Albrecht repo owner

    Hey, great that you're interested in making our OpenMP support better, which, as you noticed, sucks badly. Which algorithm are you referring to? Your own or something we implemented? I played a bit with OpenMP in matrix-matrix multiplication myself yesterday and managed to get a 1.42x speed-up using two cores on my quad-core i7. In any case, shouldn't we move this to [m4ri-devel]?

  2. Former user Account Deleted

    Basically, I spawn one thread per physical core, and each of those independent threads uses m4ri to work on its own problem. When __M4RI_ENABLE_MMC is defined, however, all threads keep blocking each other in the "#pragma omp critical" sections. I guess it's just a matter of unfortunate thread scheduling, because whether it happens depends on what those threads do. This was obviously a lot worse in the 20111203 release than it is now.

    As far as I'm concerned this issue is either fixed or not a real issue in the first place because it depends on how m4ri is used.
