static assert on sizeof(char*) == sizeof(unique_ptr<char>) fails
I get a strange compile error where the assertion (in driver.hxx):
static_assert(sizeof(char *) == sizeof(unique_C_ptr<char>), "");
fails. This is with
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0
and
g++ --version
g++ (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
and I can reduce the issue down to:
#include <cstdlib>      // std::free
#include <iostream>
#include <memory>
#include <type_traits>  // std::remove_const_t

// Taken from
// <https://stackoverflow.com/questions/27440953/stdunique-ptr-for-c-functions-that-need-free>
struct free_deleter {
  template <typename T> void operator()(T *p) const {
    // std::free takes a non-const void *, so strip const from T first
    std::free(const_cast<std::remove_const_t<T> *>(p));
  }
};

template <typename T> using unique_C_ptr = std::unique_ptr<T, free_deleter>;

// This is the assertion that fails under nvcc
static_assert(sizeof(char*) == sizeof(unique_C_ptr<char>));

int main(void) {
  size_t sz_char = sizeof(char *);
  size_t sz_ptr = sizeof(unique_C_ptr<char>);
  std::cout << "sizes: " << sz_char << ", " << sz_ptr << "\n";
  return 0;
}
and
nvcc -ccbin g++ --std=c++17 -x cu ./ptr.cc
As far as I can tell, CUDA’s C++ has a 16-byte unique_ptr (it uses a __compressed_pair of the actual pointer and the deleter, even when the deleter is a type, it seems), but the host has an 8-byte unique_ptr.
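For context, here is a minimal sketch of why the deleter type can affect the size of a unique_ptr. The names empty_deleter, ptr_with_empty_deleter and ptr_with_fn_deleter are hypothetical and only serve the illustration. The C++ standard does not fix these sizes; the comments describe what common host standard libraries typically do.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stateless (empty) deleter type, analogous to free_deleter in the report.
struct empty_deleter {
  void operator()(void *p) const { std::free(p); }
};

// Deleter is an empty class: the library can store it without extra space
// (for example via the empty base optimization), so the smart pointer is
// typically the size of a raw pointer.
using ptr_with_empty_deleter = std::unique_ptr<char, empty_deleter>;

// Deleter is a function pointer: its value has to be stored next to the
// managed pointer, so the smart pointer is larger than a raw pointer.
using ptr_with_fn_deleter = std::unique_ptr<char, void (*)(void *)>;

constexpr std::size_t raw_size = sizeof(char *);
constexpr std::size_t empty_deleter_size = sizeof(ptr_with_empty_deleter);
constexpr std::size_t fn_deleter_size = sizeof(ptr_with_fn_deleter);
```

Printing the three constants under each compiler and standard library combination shows whether the empty deleter is being stored compactly there.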
Comments (7)
- reporter
- repo owner I am using GCC 10.2 when using CUDA, and only CUDA 11.2.2. I haven’t found a combination for more modern versions that works.
-
- repo owner Under which circumstances are unique_ptrs with deleters passed from host to device? I don’t think that should happen. Are you saying that the static_assert triggers because it is also evaluated on the device? If so, we can probably just wrap it in an #ifndef __CUDACC__.
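A sketch of what such a guard might look like follows. It assumes the macro meant here is __CUDACC__, which nvcc predefines when compiling CUDA source, so a plain host compiler still checks the assertion. The helper make_buffer is hypothetical and exists only to show the type in use. Skipping the check is only reasonable if unique_C_ptr never crosses the host/device boundary, which this thread does not establish.

```cpp
#include <cstdlib>
#include <memory>
#include <type_traits>

struct free_deleter {
  template <typename T> void operator()(T *p) const {
    std::free(const_cast<std::remove_const_t<T> *>(p));
  }
};

template <typename T> using unique_C_ptr = std::unique_ptr<T, free_deleter>;

// Only enforce the size check when the file is not being compiled by nvcc.
#ifndef __CUDACC__
static_assert(sizeof(char *) == sizeof(unique_C_ptr<char>), "");
#endif

// Hypothetical helper: wrap a malloc'ed buffer so that std::free releases it.
inline unique_C_ptr<char> make_buffer(std::size_t n) {
  return unique_C_ptr<char>(static_cast<char *>(std::malloc(n)));
}
```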
- reporter I added nvcc 11.4 as “not working” to https://bitbucket.org/eschnett/cactusamrex/wiki/CompilerCompatibility and also the statement about gcc 10.2 and CUDA 11.2.2.
I have no idea if unique_C_ptr is ever passed from host to device. Since the failure is in a static assert, I never even managed to compile (much less actually run anything that could then fail). Assuming that the in-memory layout of any class is the same on the device as it is on the host sounds like a recipe for disaster to me, though. The only things I would feel comfortable sending would be plain C arrays.
- repo owner We’re also sending the GF3D[25]_Index data structures. These are plain C types (PODs) holding integers and pointers.
- reporter True. Those need to be passed along.
The trick would seem to be to ensure that both sides align data in the same manner (i.e. that one side does not, e.g., align doubles on 8-byte boundaries while the other aligns them at even bytes; worse if this is controlled via compiler options, e.g. icc’s -align). I can see this being tricky for NVIDIA to get right if they copy the full set of bytes that make up the object. It is a bit easier if they do an element-by-element copy, like C++ does when one assigns objects, but also slower.
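One way to make the layout assumption explicit is to check it at compile time in a header that both compilers see. The sketch below uses a hypothetical struct index_pod as a stand-in; it is not the actual GF3D[25]_Index definition, and the expected offsets assume natural alignment with no packing options in effect. If either side laid the struct out differently, one of the assertions would fail during that side's compilation.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical POD holding integers and a pointer, for illustration only.
struct index_pod {
  int shape[3];
  int stride[3];
  double *data;
};

// A byte-wise copy is only well defined for trivially copyable types, and
// offsetof is only guaranteed to work for standard-layout types.
static_assert(std::is_trivially_copyable<index_pod>::value,
              "index_pod must be trivially copyable");
static_assert(std::is_standard_layout<index_pod>::value,
              "index_pod must be standard-layout");

// Round `offset` up to the next multiple of `align`.
constexpr std::size_t align_up(std::size_t offset, std::size_t align) {
  return (offset + align - 1) / align * align;
}

// Expected layout under natural alignment (an assumption, see above).
static_assert(offsetof(index_pod, stride) == 3 * sizeof(int),
              "unexpected offset of stride");
static_assert(offsetof(index_pod, data) ==
                  align_up(6 * sizeof(int), alignof(double *)),
              "unexpected offset of data");
static_assert(sizeof(index_pod) ==
                  align_up(offsetof(index_pod, data) + sizeof(double *),
                           alignof(index_pod)),
              "unexpected size of index_pod");
```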
- reporter changed status to closed
A simpler reproducer is:
which, when compiled with:
gives:
for