Resizing STL containers like std::vector used in Global Pointers may cause faults

Issue #215 resolved
Amin M. Khan created an issue

For larger std::vectors (say, > 1M entries) whose reserved capacity is much smaller, dereferencing the g_ptr can sometimes run into errors.

The documentation mentions:

  * vector<T,Allocator>, deque<T,Allocator>, list<T,Allocator>:
    + DefinitelySerializable: so long as T is DefinitelySerializable.

I am occasionally noticing faults/crashes when running my UPC++ programs. I am not certain yet, but my current guess is that resizing a std::vector that uses the default std::allocator could cause problems for other local processes that are accessing this vector through local() (where the g_ptr was fetched only once, at the beginning).

I would like to next use this vector in remote processes as part of upcxx::rget(), or perhaps as an argument to upcxx::rpc(), but the program already crashes when sharing between local processes.

At least in this particular program, I am able to avoid crashes by increasing the std::vector::capacity(). I am not able to consistently reproduce the crashes/faults in my other UPC++ programs, even with large vectors.

See the following code snippet; I cannot yet get an MWE that reproduces this consistently, but it happens too often in my code to ignore.

template<typename TableKey_T, typename ItemKey_T, typename Msg_T>
class Message {
  public:
    TableKey_T table;
    ItemKey_T item;
    Msg_T value;
};

namespace upcxx {
template<class TableKey_T, class ItemKey_T, class Msg_T>
struct is_definitely_trivially_serializable<Message<TableKey_T, ItemKey_T, Msg_T>> : std::true_type {};

template<class TableKey_T, class ItemKey_T, class Msg_T>
struct is_definitely_trivially_serializable<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>> : std::true_type {};
}

// Create N vector objects in the shared segment, one per process
upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>> my_push_buffers_g;
upcxx::dist_object<upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>> *my_push_buffers_dist;

my_push_buffers_g = upcxx::new_array<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>(upcxx::rank_n());
my_push_buffers_dist = new upcxx::dist_object<upcxx::global_ptr<std::vector<Message<TableKey_T, ItemKey_T, Msg_T>>>>(my_push_buffers_g);
std::vector<Message<TableKey_T, ItemKey_T, Msg_T>> *my_push_buffers = my_push_buffers_g.local();

for (int i = 0; i < upcxx::rank_n(); i++) {
    my_push_buffers[i].reserve(50000);
}

Opening this issue to keep track of any possible triggers, or workarounds, that I am looking into:

  • std::vector::reserve(1e+7);
  • Mark std::vector as is_definitely_serializable (and not is_definitely_trivially_serializable)
  • Avoid std::vector and instead have:
upcxx::global_ptr<Message<TableKey_T, ItemKey_T, Msg_T>> my_push_buffers_g = 
    upcxx::new_array<Message<TableKey_T, ItemKey_T, Msg_T>>(1e+7)

Comments (6)

  1. Amin M. Khan reporter
    • edited description

  2. john bachan

    Are you conflating DefinitelySerializable (which std::vector can be, so long as its T is) with DefinitelyTriviallySerializable (which vector definitely cannot be)? Container types like vector cannot be byte-copied, since they contain pointers into heap buffers holding the T elements. Byte-copying such a type is surely an error, since the heap buffer pointer will be meaningless to other processes. Consider this: trivially serializing a T means byte-copying sizeof(T) bytes; the number of elements in a vector does not influence the value of sizeof(std::vector<T>), so clearly vectors aren't trivial to copy.

    Since rput/rget with a nontrivial T is not supported, you'll have to use global_ptr<Message> derived from upcxx::new_array<Message>(N) much like C programmers use pointers to consecutively stored elements.

    Containers like vector are supported as arguments in rpc's, so if containers are important to you there's always that approach, but you'll be sacrificing performance.

  3. Dan Bonachea

    Another option that preserves the use of std::vector is to replace the std::allocator on your vector<T, Allocator> with an allocator that allocates from the shared heap; then vector::data() will reference shared memory that can be upcast to a global pointer and shared with other ranks for use in rput/rget. However, the usual caveat applies: the vector::data() pointer only remains valid until an operation that causes a vector capacity resize.

    If you are interested in this option, the code for that shared-heap container allocator, along with an example, is in upcxx-extras here

  4. Amin M. Khan reporter

    Thanks @jdbachan, that’s exactly what I was suspecting regarding the heap buffer pointer.

    @bonachea In the cases where this program finished without crashing, I noticed a significant slow-down for single-node execution when switching from the smp to the ibv backend. On a 16-core machine with more than 60 GB RAM, smp took 2.5 seconds, but ibv took 22 seconds.

    Since my code only used fetch in the initialization phase, and afterwards worked solely through local() without using any rget or rpc, what could be causing this? Could the issues with the std::vector allocator and heap buffer pointers explain this as well?

    I am going to benchmark the timing difference between ibv and smp next, using the upcxx::new_array<Message>(N) approach, to be sure.

  5. Dan Bonachea

    In the cases where this program finished without crashing, I noticed a significant slow-down for single-node execution when switching from the smp to the ibv backend. On a 16-core machine with more than 60 GB RAM, smp took 2.5 seconds, but ibv took 22 seconds.

    @Aminmkhan : This sounds like an independent issue - please open a new issue for the performance question.

    Please be sure to include the output of upcxx-run -i a.out for both programs and the relevant source code.
