thread-safety bug in dist_array::ptr

Issue #366 resolved
Dan Bonachea created an issue

There is a thread-safety bug in dist_array<T,BS,/*thread-safe=*/true>::ptr.

This rarely results in returning bad global_ptr values, as in this example from our CI

The bug is on these lines in dist_array.hpp:

727     global_ptr<T> find(const intrank_t target) {
728         typename std::unordered_map<intrank_t,global_ptr<T>>::const_iterator elem;
729         if (thread_safe) {
730             std::lock_guard<std::recursive_mutex> lg(mut);
731             elem = list.find(target);
732             if (elem == list.end()) return global_ptr<T>(nullptr);
733         } else {
734             elem = list.find(target);
735             if (elem == list.end()) return global_ptr<T>(nullptr);
736         }
737         return elem->second;
738     }
739 

Note that in the thread-safe case, find grabs a mutex and performs a std::unordered_map::find lookup, but the mutex is released before dereferencing the returned const_iterator to retrieve the value. This creates a race with other threads that might concurrently update the unordered_map and invalidate the const_iterator, leading to returning garbage.

Comments (1)

  1. Log in to comment