Excluded CUDA assignable structures from existing smpAssign strategies

#37 Declined
Repository
JPenuchot
Branch
blaze_cuda
Repository
blaze-lib
Branch
master
Author
  1. Jules Pénuchot
Reviewers
Description

Hi Klaus,

This update will allow me to overload smpAssign() to dispatch assignments to cudaAssign(). That change is required to make sure that operator=() eventually calls cudaAssign() for all CUDA-assignable structures (CUDADynamicMatrix, CUDADynamicVector, and views on these structures).

Regards,

Jules

Comments (1)

  1. Klaus Iglberger

    Hi Jules!

    Thanks a lot for the pull request. The help is highly appreciated. However, it appears as if there is a simpler solution based on the already existing CRTP inheritance hierarchy.

    The following shows the inheritance/abstraction hierarchy for vectors:

    Vector <|-- DenseVector  <|-- DynamicVector
           <|-- SparseVector <|-- ...
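    As a concrete illustration, here is a minimal, self-contained sketch of this CRTP chain (hypothetical simplified names; the real Blaze bases also carry a transpose-flag template parameter TF, omitted here):

```cpp
#include <type_traits>

// Simplified stand-ins for Blaze's CRTP base classes (transpose flag omitted).
template< typename VT > struct Vector
{
   // The CRTP base can recover the concrete vector type:
   VT&       get()       { return static_cast<VT&>(*this); }
   const VT& get() const { return static_cast<const VT&>(*this); }
};

template< typename VT > struct DenseVector  : Vector<VT> {};
template< typename VT > struct SparseVector : Vector<VT> {};

// A concrete vector type sits at the bottom of the chain:
struct DynamicVector : DenseVector<DynamicVector> {};

// DynamicVector is-a DenseVector<DynamicVector> is-a Vector<DynamicVector>:
static_assert( std::is_base_of<DenseVector<DynamicVector>, DynamicVector>::value, "dense base" );
static_assert( std::is_base_of<Vector<DynamicVector>, DynamicVector>::value, "vector base" );
```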
    

    The Blaze library provides the following 'smpAssign()' function. It can be called for every possible kind of vector (including sparse vectors):

    template< typename VT1, bool TF1, typename VT2, bool TF2 >
    void smpAssign( Vector<VT1,TF1>&, const Vector<VT2,TF2>& );  // smpAssign() in the Blaze library
    

    In order to replace this function with another one, it is only necessary to use a more specific kind of vector in the signature. For instance:

    template< typename VT1, bool TF1, typename VT2, bool TF2 >
    void smpAssign( DenseVector<VT1,TF1>&, const DenseVector<VT2,TF2>& );  // smpAssign() in Blaze_CUDA
    

    As soon as both functions are visible, the compiler will always pick the second one. This, however, might not be desired: some vectors (e.g. DynamicVector) should still bind to the library function in order to use a CPU backend. For this reason, the second 'smpAssign()' function can be constrained:

    template< typename VT1, bool TF1, typename VT2, bool TF2 >
    auto smpAssign( DenseVector<VT1,TF1>&, const DenseVector<VT2,TF2>& )  // smpAssign() in Blaze_CUDA
       -> EnableIf_t< IsCUDAAssignable_v<VT1> && IsCUDAAssignable_v<VT2> >;

    Now the CUDA-specific backend will only be called for CUDA types; the library function is called for all other vector types. This logic can be applied to views as well by specialising the IsCUDAAssignable type trait accordingly.
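    To make the dispatch concrete, here is a self-contained sketch of this constrained-overload technique. It uses std::enable_if_t in place of Blaze's EnableIf_t, hypothetical simplified types without the TF parameter, and string return values instead of actual assignment so the chosen overload is observable:

```cpp
#include <string>
#include <type_traits>

// Simplified stand-ins for the Blaze CRTP bases and vector types.
template< typename VT > struct Vector {};
template< typename VT > struct DenseVector : Vector<VT> {};

struct DynamicVector     : DenseVector<DynamicVector> {};      // CPU-backed
struct CUDADynamicVector : DenseVector<CUDADynamicVector> {};  // GPU-backed

// Stand-in for the IsCUDAAssignable type trait.
template< typename T > struct IsCUDAAssignable : std::false_type {};
template<> struct IsCUDAAssignable<CUDADynamicVector> : std::true_type {};
template< typename T >
constexpr bool IsCUDAAssignable_v = IsCUDAAssignable<T>::value;

// General library overload: viable for every kind of vector.
template< typename VT1, typename VT2 >
std::string smpAssign( Vector<VT1>&, const Vector<VT2>& )
{
   return "library";
}

// Constrained extension overload: removed by SFINAE unless both operands
// are CUDA-assignable; when it survives, it is the better match because
// DenseVector is a more derived base than Vector.
template< typename VT1, typename VT2 >
auto smpAssign( DenseVector<VT1>&, const DenseVector<VT2>& )
   -> std::enable_if_t< IsCUDAAssignable_v<VT1> && IsCUDAAssignable_v<VT2>,
                        std::string >
{
   return "cuda";
}

std::string assignCPU()
{
   DynamicVector a, b;
   return smpAssign( a, b );  // constraint fails -> library overload
}

std::string assignCUDA()
{
   CUDADynamicVector a, b;
   return smpAssign( a, b );  // constraint holds -> CUDA overload
}
```

    The key point is that the library overload stays unconstrained and maximally general; only the extension overload carries the constraint, which is exactly what lets DynamicVector keep its CPU backend.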
    

    This discussion ignores namespaces. With an additional namespace it is even simpler to steer explicitly which function is called, since unqualified calls find functions in the same namespace as their argument types via argument-dependent lookup.
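    A minimal sketch of the namespace point (hypothetical namespace and type names, string return values so the chosen overload is observable):

```cpp
#include <string>

namespace blaze {
   // General library overload on the CRTP base.
   template< typename VT > struct Vector {};

   template< typename VT1, typename VT2 >
   std::string smpAssign( Vector<VT1>&, const Vector<VT2>& )
   {
      return "blaze";
   }
}

namespace blaze_cuda {
   struct CUDADynamicVector : blaze::Vector<CUDADynamicVector> {};

   // Found by argument-dependent lookup because the argument type lives
   // in this namespace; as a non-template exact match it beats the
   // library's function template in overload resolution.
   std::string smpAssign( CUDADynamicVector&, const CUDADynamicVector& )
   {
      return "cuda";
   }
}

std::string assignViaADL()
{
   blaze_cuda::CUDADynamicVector a, b;
   return smpAssign( a, b );  // unqualified call: ADL finds blaze_cuda's overload
}
```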

    In summary: from this perspective the library function doesn't have to be constrained, since it is the most general overload. Every more specific function can replace the library function. By constraining the more specific function (i.e. the function in the extension!) it is possible to steer which function is called.

    Thanks again,

    Best regards,

    Klaus!