static assert on sizeof(char*) == sizeof(unique_ptr<char>) fails

Issue #6 closed
Roland Haas created an issue

I get a strange compile error where the assertion (in driver.hxx):

static_assert(sizeof(char *) == sizeof(unique_C_ptr<char>), "");

fails. This is with

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Aug_15_21:14:11_PDT_2021
Cuda compilation tools, release 11.4, V11.4.120
Build cuda_11.4.r11.4/compiler.30300941_0

and

g++ --version
g++ (GCC) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

and I can reduce the issue down to:

#include <memory>
#include <iostream>


// Taken from
// <https://stackoverflow.com/questions/27440953/stdunique-ptr-for-c-functions-that-need-free>
struct free_deleter {
  template <typename T> void operator()(T *p) const {
    std::free(const_cast<std::remove_const_t<T> *>(p));
  }
};
template <typename T> using unique_C_ptr = std::unique_ptr<T, free_deleter>;

static_assert(sizeof(char*) == sizeof(unique_C_ptr<char>));

int main(void) {
  size_t sz_char = sizeof(char *);
  size_t sz_ptr = sizeof(unique_C_ptr<char>);

  std::cout << "sizes: " << sz_char << " ," << sz_ptr << "\n";
  return 0;
}

and

nvcc -ccbin g++ --std=c++17 -x cu ./ptr.cc

As far as I can tell CUDA’s C++ has a 16byte unique_ptr (uses a `__compressed_pair of the actual pointer and the deleter (even when the deleter is a type it seems)) but the host has 8 byte unique_ptr.

Comments (7)

  1. Roland Haas reporter

    A simpler reproducer is:

    #include <memory>
    #include <iostream>
    
    
    // Taken from
    // <https://stackoverflow.com/questions/27440953/stdunique-ptr-for-c-functions-that-need-free>
    struct free_deleter {
      template <typename T> void operator()(T *p) const {
        std::free(const_cast<std::remove_const_t<T> *>(p));
      }
    };
    template <typename T> using unique_C_ptr = std::unique_ptr<T, free_deleter>;
    
    void __device__ tst(void) {
      static_assert(sizeof(char*) == sizeof(unique_C_ptr<char>));
    }
    

    which, when compiled with:

    /home/rhaas/.conda/envs/carpetx/bin/nvcc --verbose --cubin -std=c++17 ptr.cu
    

    gives:

    (carpetx) [rhaas@hybrid Cactus]$ /home/rhaas/.conda/envs/carpetx/bin/nvcc --verbose --cubin -std=c++17 ptr.cu
    #$ _NVVM_BRANCH_=nvvm
    #$ _SPACE_=
    #$ _CUDART_=cudart
    #$ _HERE_=/home/rhaas/.conda/envs/carpetx/bin
    #$ _THERE_=/home/rhaas/.conda/envs/carpetx/bin
    #$ _TARGET_SIZE_=
    #$ _TARGET_DIR_=
    #$ _TARGET_SIZE_=64
    #$ TOP=/home/rhaas/.conda/envs/carpetx/bin/..
    #$ NVVMIR_LIBRARY_DIR=/home/rhaas/.conda/envs/carpetx/bin/../nvvm/libdevice
    #$ LD_LIBRARY_PATH=/home/rhaas/.conda/envs/carpetx/bin/../lib:/usr/local/cuda/lib64:
    #$ PATH=/home/rhaas/.conda/envs/carpetx/bin/../nvvm/bin:/home/rhaas/.conda/envs/carpetx/bin:/home/rhaas/.conda/envs/carpetx/bin:/usr/local/anaconda3/condabin:/usr/local/cuda/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/rhaas/.local/bin:/home/rhaas/bin
    #$ INCLUDES="-I/home/rhaas/.conda/envs/carpetx/bin/../include"
    #$ LIBRARIES=  "-L/home/rhaas/.conda/envs/carpetx/bin/../lib/stubs" "-L/home/rhaas/.conda/envs/carpetx/bin/../lib"
    #$ CUDAFE_FLAGS=
    #$ PTXAS_FLAGS=
    #$ gcc -std=c++17 -D__CUDA_ARCH__=520 -E -x c++  -DCUDA_DOUBLE_MATH_FUNCTIONS -D__CUDACC__ -D__NVCC__  "-I/home/rhaas/.conda/envs/carpetx/bin/../include"    -D__CUDACC_VER_MAJOR__=11 -D__CUDACC_VER_MINOR__=4 -D__CUDACC_VER_BUILD__=120 -D__CUDA_API_VER_MAJOR__=11 -D__CUDA_API_VER_MINOR__=4 -include "cuda_runtime.h" -m64 "ptr.cu" -o "/tmp/tmpxft_00002094_00000000-7_ptr.cpp1.ii"
    #$ cicc --c++17 --gnu_version=110200 --orig_src_file_name "ptr.cu" --allow_managed   -arch compute_52 -m64 --no-version-ident -ftz=0 -prec_div=1 -prec_sqrt=1 -fmad=1 --include_file_name "tmpxft_00002094_00000000-3_ptr.fatbin.c" -tused --gen_module_id_file --module_id_file_name "/tmp/tmpxft_00002094_00000000-4_ptr.module_id" --gen_c_file_name "/tmp/tmpxft_00002094_00000000-6_ptr.cudafe1.c" --stub_file_name "/tmp/tmpxft_00002094_00000000-6_ptr.cudafe1.stub.c" --gen_device_file_name "/tmp/tmpxft_00002094_00000000-6_ptr.cudafe1.gpu"  "/tmp/tmpxft_00002094_00000000-7_ptr.cpp1.ii" -o "/tmp/tmpxft_00002094_00000000-6_ptr.ptx"
    ptr.cu(15): error: static assertion failed
    
    1 error detected in the compilation of "ptr.cu".
    # --error 0x1 --
    

    for

    (carpetx) [rhaas@hybrid Cactus]$ /home/rhaas/.conda/envs/carpetx/bin/nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Sun_Aug_15_21:14:11_PDT_2021
    Cuda compilation tools, release 11.4, V11.4.120
    Build cuda_11.4.r11.4/compiler.30300941_0
    

  2. Erik Schnetter repo owner

    I am using GCC 10.2 when using CUDA, and only CUDA 11.2.2. I haven’t found a combination for more modern versions that work.

  3. Erik Schnetter repo owner

    Under which circumstances are unique_ptrs with deleters passed from host to device? I don’t think that should happen. Are you saying that the static_assert triggers because it it also evaluated on the device? If so, we can probably just wrap it in an #ifndef CUDACC.

  4. Roland Haas reporter

    I added nvcc 11.4 as “not working” to https://bitbucket.org/eschnett/cactusamrex/wiki/CompilerCompatibility and also the statement about gcc 10.2 and CUDA 11.2.2 .

    I have no idea if unique_C_ptris ever passed from host to device. Since the failure is in a static assert I never even managed to compile (much less actually run anything that could then fail). Assuming that the in-memory layout of any class is the same on the device as it is on the host sounds like a recipe for disaster to me though. The only things I would feel comfortable sending would be plain C arrays.

  5. Erik Schnetter repo owner

    We’re also sending the GF3D[25]_Index data structures. These are plain C types (PODs) holding integers and pointers.

  6. Roland Haas reporter

    True. Those need to be passed along.
    The trick would seem to be to ensure that both sides align data in the same manner (ie not one eg aligns doubles on 8 byte boundaries and the other at even bytes, worse if it is controlled via compiler options eg icc’s -align). I can see this being tricky for NVIDIA to get right if they copy the full set of bytes that make up the object. A bit easier if they do element by element copy like C++ does when one assigns to objects, but also slower.

  7. Log in to comment