Performance problems with openmp/threads
When OpenMP is enabled to allow using the library from multiple threads, multithreaded performance suffers from pointless #pragma omp critical sections in misc.h:m4ri_mm_calloc/m4ri_mm_malloc. Removing those and undefining __M4RI_ENABLE_MMC increases per-thread performance from about 1/3 to about 2/3 of the single-threaded version.
Unfortunately, the #pragma omp critical sections in mzd_t_malloc and mzd_t_free cannot be "fixed" by just removing them, so those functions still remain a massive waste of valuable CPU cycles.
Comments (4)
-
Account Deleted -
repo owner Hey, great that you're interested in making our OpenMP support better, which, as you noticed, sucks badly. Which algorithm are you referring to? Your own or something we implemented? I played a bit with OpenMP in matrix-matrix multiplication myself yesterday and managed to get a 1.42 speed-up using two cores on my quadcore i7. In any case, shouldn't we move this to [m4ri-devel]?
-
Account Deleted Basically, I spawn one thread per physical core, and each of those independent threads uses m4ri to work on its own problem, but when M4RI_ENABLE_MMC is defined, all threads keep blocking each other due to the #pragma omp critical sections. I guess it's just a matter of unfortunate thread scheduling, because depending on what those threads do, it either happens or it doesn't. This was obviously a lot worse in the 20111203 release than it is now.
As far as I'm concerned, this issue is either fixed or was not a real issue in the first place, because it depends on how m4ri is used.
-
repo owner - changed status to resolved
Whoops, I didn't notice https://bitbucket.org/malb/m4ri/changeset/194e2b2e55a6. The current hg checkout performs almost as fast as the single-threaded non-OpenMP version, but ONLY when I undefine M4RI_ENABLE_MMC; if I leave it defined, it's still incredibly slow.