- edited description
Resizing STL containers like std::vector used in Global Pointers may cause faults
For larger std::vectors
(say with > 1M entries) with reserved capacity much less, processing g_ptr
can sometimes run into errors.
The documentation mentions:
* vector<T,Allocator>, deque<T,Allocator>, list<T,Allocator>:
+ DefinitelySerializable: so long as T is DefinitelySerializable.
I am occasionally noticing faults/crashes when running my UPC++ programs. I still have doubts but my current guess is that when resizing std::vector
using the default std::allocator
could cause problems for other local processes which are accessing this vector using upcxx::local()
(where g_ptr
was fetched only in the beginning).
I would like to next use this vector in remote processes as part of upcxx::rget()
, or may be as argument to upcxx::rpc()
, but the program already crashes with sharing between local processes.
At least in this particular program, I am able to avoid crashes by increasing the std::vector::capacity()
. I am not able to consistently reproduce the crashes/faults in my other UPC++ programs, even with large vectors.
Check the following code snippet, as cannot yet get MWE to reproduce this consistently, but happens often enough in my code to ignore.
template<typename TableKey_T, typename ItemKey_T, typename Msg_T>
class Message {
public:
TableKey_T table;
ItemKey_T item;
Msg_T value;
};
namespace upcxx {
template<class TableKey_T, class ItemKey_T, class Msg_T>
struct is_definitely_trivially_serializable<Message<TableKey_T, ItemKey_T, Msg_T>> : std::true_type {};
template<class TableKey_T, class ItemKey_T, class Msg_T>
struct is_definitely_trivially_serializable<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>> : std::true_type {};
}
// Just creating N vector objects, each for the N processes
upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>> my_push_buffers_g;
upcxx::dist_object<upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>> *my_push_buffers_dist;
my_push_buffers_g = upcxx::new_array<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>(upcxx::rank_n());
my_push_buffers_dist = new upcxx::dist_object<upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>>(my_push_buffers_g);
std::vector<Message<TableKey_T, ItemKey_T, Msg_T>> *my_push_buffers = my_push_buffers_g.local();
for (int i = 0; i < upcxx::rank_n(); i++) {
my_push_buffers[i].reserve(50000);
}
Opening this issue to keep track of any possible triggers, or workarounds, that I am looking into:
std::vector::reserve(1e+7);
- Mark
std::vector
asis_definitely_serializable
(and notis_definitely_trivially_serializable
) - Avoid
std::vector
and instead have:
upcxx::global_ptr<Message<TableKey_T, ItemKey_T, Msg_T>> my_push_buffers_g =
upcxx::new_array<Message<TableKey_T, ItemKey_T, Msg_T>>(1e+7)
Comments (6)
-
reporter -
Are you conflating DefinitelySerializable (which std::vector can be so long as its T is) with DefinitelyTriviallySerializable (which vector definitely cannot be)? Container types like vector cannot be byte copied since they contain pointers into heap buffers holding the T elements. Byte copying such a type is surely an error since the heap buffer pointer will be meaningless to other processes. Consider this: trivially serializing a T means byte-copying sizeof(T) bytes, the number of elements in a vector does not influence the value of sizeof(std::vector<T>) so clearly vectors aren't trivial to copy.
Since rput/rget with a nontrivial T is not supported, you'll have to use global_ptr<Message> derived from upcxx::new_array<Message>(N) much like C programmers use pointers to consecutively stored elements.
Containers like vector are supported as arguments in rpc's, so if containers are important to you there's always that approach, but you'll be sacrificing performance.
-
Another option that preserves the use of
std::vector
is you can replace thestd::allocator
on yourvector<T, Allocator>
with an allocator that allocates from the shared heap, and thenvector::data()
will reference shared memory that can be upcast to a global pointer and shared with other ranks for use in rput/rget. However the usual caveat applies that thevector::data()
pointer only remains valid until an operation that causes a vector capacity resize.If you are interested in this option, the code for that shared-heap container allocator and an example is in upcxx-extras here
-
reporter Thanks @jdbachan, that’s exactly I was suspecting regarding heap buffer pointer.
@bonachea In the cases where this program finished without crashing, for single node execution I noticed significant slow-down when switching between
smp
andibv
backends. On 16 core machine with more than 60 GB RAM, withsmp
it took 2.5 seconds, but withibv
it took 22 seconds.Since my code only used
fetch
in initialization phase, and afterwards solely worked withupcxx::local()
without using anyrget
orrpc
, what could be causing this? Can the issues due to allocator instd::vector
and heap buffer pointers explain this as well?I am going to next benchmark time between
ibv
andsmp
usingupcxx::new_array<Message>(N)
approach to be sure. -
In the cases where this program finished without crashing, for single node execution I noticed significant slow-down when switching between smp and ibv backends. On 16 core machine with more than 60 GB RAM, with smp it took 2.5 seconds, but with ibv it took 22 seconds.
@Aminmkhan : This sounds like an independent issue - please open a new issue for the performance question.
Please be sure to include the output of
upcxx-run -i a.out
for both programs and the relevant source code. -
reporter - changed status to resolved
- Log in to comment
For larger
std::vectors
(say with > 1M entries) with reserved capacity much less, processingg_ptr
can sometimes run into errors.The documentation mentions:
I am occasionally noticing faults/crashes when running my UPC++ programs. I still have doubts but my current guess is that when resizing
std::vector
using the defaultstd::allocator
could cause problems for other local processes who are accessing this vector usingupcxx::local()
(whereg_ptr
was fetched only in the beginning).I would like to next use this vector in remote processes as part of
upcxx::rget()
, or may be as argument toupcxx::rget
, but the program already crashes with sharing between local processes.At least in this particular program, I am able to avoid crashes by increasing the
std::vector::capacity()
. I am not able to consistently reproduce the crashes/faults in my other UPC++ programs, even with large vectors.Check the following code snippet, as cannot yet get MWE to reproduce this consistently, but happens often enough in my code to ignore.
Opening this issue to keep track of any possible triggers, or workarounds, that I am looking into:
std::vector::reserve(1e+7);
std::vector
asis_definitely_serializable
(and notis_definitely_trivially_serializable
)std::vector
and instead have: