Logical vector scalar operators

Issue #286 new
Johannes Czech created an issue

Hello @Klaus Iglberger ,

I created a custom operator overload for logical scalar vector operations:

blaze::DynamicVector<int> x{ 3, 2, 5 };
blaze::DynamicVector<bool> res = x < 4;  // Results in { true, true, false }

The operator is defined in the same way as the other scalar-vector operators:

template< typename VT  // Type of the left-hand side dense vector
        , bool TF      // Transpose flag of the left-hand side dense vector
        , typename ST  // Type of the right-hand side scalar
        , EnableIf_t< IsNumeric_v<ST> >* = nullptr >
inline decltype(auto) operator<( const DenseVector<VT,TF>& vec, ST scalar )
{
   BLAZE_FUNCTION_TRACE;

   using ScalarType = AddTrait_t< UnderlyingBuiltin_t<VT>, ST >;
   return map( ~vec, blaze::bind2nd( Less2{}, ScalarType( scalar ) ) );
}

First I tried using:

return map( ~vec, blaze::bind2nd( Less{}, ScalarType( scalar ) ) );

as described in:

But this resulted in:

error: use of alias template 'Less' requires template arguments

and Less<VT,VT>{}; resulted in:

error: value is not a member of blaze::DynamicVector<int>

Therefore I defined a new functor called Less2:

//*************************************************************************************************
/*!\brief Generic wrapper for the less operator.
// \ingroup functors
*/
struct Less2
{
   //**********************************************************************************************
   /*!\brief Default constructor of the Less2 functor.
   */
   explicit inline Less2()
   {}
   //**********************************************************************************************

   //**********************************************************************************************
   /*!\brief Returns the result of the less operator for the given objects/values.
   //
   // \param a The left-hand side object/value.
   // \param b The right-hand side object/value.
   // \return The result of the less operator for the given objects/values.
   */
   template< typename T1, typename T2 >
   BLAZE_ALWAYS_INLINE decltype(auto) operator()( const T1& a, const T2& b ) const
   {
      return a < b;
   }
   //**********************************************************************************************

   //**********************************************************************************************
   /*!\brief Returns whether SIMD is enabled for the specified data types \a T1 and \a T2.
   //
   // \return \a true in case SIMD is enabled for the data types \a T1 and \a T2, \a false if not.
   */
   template< typename T1, typename T2 >
   static constexpr bool simdEnabled() { return HasSIMDAdd_v<T1,T2>; }
   //**********************************************************************************************

   //**********************************************************************************************
   /*!\brief Returns whether the operation supports padding, i.e. whether it can deal with zeros.
   //
   // \return \a true in case padding is supported, \a false if not.
   */
   static constexpr bool paddingEnabled() { return true; }
   //**********************************************************************************************

   //**********************************************************************************************
   /*!\brief Returns the result of the less operation for the given SIMD vectors.
   //
   // \param a The left-hand side SIMD vector.
   // \param b The right-hand side SIMD vector.
   // \return The result of the less operation for the given SIMD vectors.
   */
   template< typename T1, typename T2 >
   BLAZE_ALWAYS_INLINE decltype(auto) load( const T1& a, const T2& b ) const
   {
      BLAZE_CONSTRAINT_MUST_BE_SIMD_PACK( T1 );
      BLAZE_CONSTRAINT_MUST_BE_SIMD_PACK( T2 );
      return a < b;
   }
   //**********************************************************************************************
};

Is there a proper way to use the original function blaze::Less in this context?

Do you think adding operator overloads for

  • operator==( const DenseVector<VT,TF>& vec, ST scalar )
  • operator==( ST scalar , const DenseVector<VT,TF>& vec)
  • operator!=( const DenseVector<VT,TF>& vec, ST scalar )
  • operator!=( ST scalar , const DenseVector<VT,TF>& vec)
  • operator<( const DenseVector<VT,TF>& vec, ST scalar )
  • operator<( ST scalar , const DenseVector<VT,TF>& vec)
  • operator>( const DenseVector<VT,TF>& vec, ST scalar )
  • operator>( ST scalar , const DenseVector<VT,TF>& vec)
  • operator<=( const DenseVector<VT,TF>& vec, ST scalar )
  • operator<=( ST scalar , const DenseVector<VT,TF>& vec)
  • operator>=( const DenseVector<VT,TF>& vec, ST scalar )
  • operator>=( ST scalar , const DenseVector<VT,TF>& vec)

is beneficial for the Blaze library in general?

I can also create a pull request if you like.

This issue is related to:

Best regards,

~Johannes Czech

Comments (7)

  1. Klaus Iglberger

    Hi Johannes!

    The following code snippet shows a working implementation for an operator<() between dense vectors and scalars:

    template< typename VT  // Type of the left-hand side dense vector
            , bool TF      // Transpose flag of the left-hand side dense vector
            , typename ST  // Type of the right-hand side scalar
            , EnableIf_t< IsNumeric_v<ST> >* = nullptr >
    inline decltype(auto) operator<( const DenseVector<VT,TF>& vec, ST scalar )
    {
       BLAZE_FUNCTION_TRACE;
    
       return map( ~vec, blaze::bind2nd( Less{}, scalar ) );
    }
    

    Please note, however, that Less does not provide any vectorization (for good reasons; see below). This operation therefore performs a scalar comparison for every vector element. Due to the potentially large number of conditionals, this can have a negative impact on your overall performance.
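
    To illustrate what "scalar comparisons between all vector elements" means, here is a minimal standard-library sketch of the map/bind2nd pattern. It is not Blaze code: LessSketch, Bind2ndSketch, mapSketch and lessScalar are hypothetical stand-ins introduced only for this example, and std::vector replaces the Blaze vector types.

    ```cpp
    #include <cstddef>
    #include <vector>

    // Hypothetical stand-in for a Less functor: a stateless wrapper around '<'.
    struct LessSketch
    {
       template< typename T1, typename T2 >
       constexpr bool operator()( const T1& a, const T2& b ) const { return a < b; }
    };

    // Hypothetical stand-in for bind2nd: fixes the second argument of a binary functor.
    template< typename OP, typename ST >
    struct Bind2ndSketch
    {
       OP op;
       ST scalar;

       template< typename T >
       constexpr auto operator()( const T& a ) const { return op( a, scalar ); }
    };

    // Applies a unary functor element by element: one scalar call per element, no SIMD.
    template< typename T, typename F >
    std::vector<bool> mapSketch( const std::vector<T>& vec, F f )
    {
       std::vector<bool> res( vec.size() );
       for( std::size_t i = 0; i < vec.size(); ++i ) {
          res[i] = f( vec[i] );
       }
       return res;
    }

    // Counterpart of 'vec < scalar' for this sketch.
    template< typename T, typename ST >
    std::vector<bool> lessScalar( const std::vector<T>& vec, ST scalar )
    {
       return mapSketch( vec, Bind2ndSketch<LessSketch,ST>{ LessSketch{}, scalar } );
    }
    ```

    Calling lessScalar() on a std::vector<int> holding 3, 2, 5 with the scalar 4 evaluates 3 < 4, 2 < 4 and 5 < 4 one after the other, which mirrors the example from the issue description.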

    Unfortunately, your Less2 implementation does not solve the problem. The following implementation tries to point out the most important issues:

    struct Less2
    {
       explicit inline Less2()
       {}
    
       template< typename T1, typename T2 >
       BLAZE_ALWAYS_INLINE std::common_type_t<T1,T2> operator()( const T1& a, const T2& b ) const  // <- 1. Return type must match the type of the arguments
       {
          return a < b;
       }
    
       template< typename T1, typename T2 >
       static constexpr bool simdEnabled() { return HasSIMDLess_v<T1,T2>; }  // <- 2. Should build on HasSIMDLess, which doesn't yet exist
    
       static constexpr bool paddingEnabled() { return true; }
    
       template< typename T1, typename T2 >
       BLAZE_ALWAYS_INLINE decltype(auto) load( const T1& a, const T2& b ) const
       {
          BLAZE_CONSTRAINT_MUST_BE_SIMD_PACK( T1 );
          BLAZE_CONSTRAINT_MUST_BE_SIMD_PACK( T2 );
          return a < b;   // <- 3. Requires a SIMD 'operator<()', which doesn't yet exist
       }
    };
    

    1. Based on the return type of operator(), the map() function determines its own ElementType, which is later used to decide whether vectorization can be applied. If operator() returns bool, vectorization would only be selected for an assignment to a vector of bool (or, more specifically, of 1-byte integral elements). That, however, cannot be realized in the load() function. For instance, for the comparison of two double values the operator would have to return double as well: assuming AVX, a SIMD vector of double contains 4 values, and a less-than comparison yields 4 results. These can only be represented in an AVX SIMD vector of type double, which requires operator() to return double, too. Hence the code snippet uses std::common_type to highlight the issue. From a performance point of view, and in order to enable vectorization, operator() must return a type matching the arguments, but from a logical point of view it should of course return bool. This is a generally unsolved problem in Blaze (and potentially even beyond).
    2. The simdEnabled function should of course build on a HasSIMDLess type trait. This type trait does not yet exist in Blaze as a consequence of point 1.
    3. The load() function builds on the operator<() for SIMD vectors. This operator is unfortunately not yet provided by Blaze (also as a consequence of point 1).
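
    The tension described in point 1 can be sketched with the standard library alone. The names LessBool, LessSameType and lanes are hypothetical and exist only for this illustration. Encoding the result as 1 or 0 of the operand type is a simplification: it only shows that the result type follows the operands, not how a real SIMD comparison encodes its result.

    ```cpp
    #include <cstddef>
    #include <type_traits>

    // Logical version: returns bool, which is what a user expects from '<'.
    struct LessBool
    {
       template< typename T1, typename T2 >
       constexpr bool operator()( const T1& a, const T2& b ) const { return a < b; }
    };

    // Operand-typed version: the result has the common type of the arguments
    // (1 for true, 0 for false), so it occupies as much space as an operand.
    struct LessSameType
    {
       template< typename T1, typename T2 >
       constexpr std::common_type_t<T1,T2> operator()( const T1& a, const T2& b ) const
       {
          using RT = std::common_type_t<T1,T2>;
          return ( a < b ) ? RT(1) : RT(0);
       }
    };

    // Number of elements of type T that fit into a register of the given size in bytes.
    template< typename T >
    constexpr std::size_t lanes( std::size_t registerBytes )
    {
       return registerBytes / sizeof(T);
    }

    static_assert( std::is_same< decltype( LessBool{}( 1.0, 2.0 ) ), bool >::value,
                   "The logical version yields bool" );
    static_assert( std::is_same< decltype( LessSameType{}( 1.0, 2.0 ) ), double >::value,
                   "The operand-typed version yields double for double arguments" );
    ```

    With 8-byte double values, lanes<double>( 32 ) gives 4 for a 32-byte (AVX-sized) register. A comparison of two such registers produces 4 results, which fit naturally into 4 double-sized slots but do not match the element layout of a vector of bool.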

    In summary, the load() function of your Less2 implementation will only be called during the assignment to a vector of bool, but would not compile due to the missing SIMD implementation of operator<().

    Introducing element-wise relational operators is unfortunately an intrinsically difficult problem. If it weren’t, these operators would have been introduced in the context of issue #255. Yet of course these operations are desirable. Therefore we have already started thinking about possible solutions to this problem.

    I hope that this sums up the problem and the current state well enough, but also shows that we are already investing some effort to find a solution.

    Best regards,

    Klaus!

  2. Johannes Czech reporter

    Thank you @Klaus Iglberger for your detailed response.

    I wasn't aware that element-wise relational operators are a difficult problem for vectorization on the CPU.

    Besides that, I realized why I encountered the build errors: the compiler previously tried to include Less{} from:

    Now, I just pulled from origin/master and Less{} is defined at

    and the code builds as expected.

    Moreover, I wrote a small benchmark that compares the runtime of vector addition on the full vector and on a view of it.

    const int size = 40;
    size_t it = 1e7;
    
    blaze::DynamicVector<float> blaze_vec(size);
    
    const std::initializer_list<size_t> list{ 0UL, 2UL, 4UL, 7UL };
    auto e = elements(blaze_vec, list);
    
    
    std::chrono::steady_clock::time_point start_blaze = std::chrono::steady_clock::now();
    for (size_t i = 0; i < it; ++i) {
        blaze_vec += blaze_vec;
    }
    std::chrono::steady_clock::time_point end_blaze = std::chrono::steady_clock::now();
    std::cout << "Elapsed time blaze_vec:\t" << std::chrono::duration_cast<std::chrono::milliseconds>(end_blaze - start_blaze).count() << "ms" << std::endl;
    
    std::chrono::steady_clock::time_point start_blaze_e = std::chrono::steady_clock::now();
    for (size_t i = 0; i < it; ++i) {
        e += e;
    }
    std::chrono::steady_clock::time_point end_blaze_e = std::chrono::steady_clock::now();
    std::cout << "Elapsed time blaze_e:\t" << std::chrono::duration_cast<std::chrono::milliseconds>(end_blaze_e - start_blaze_e).count() << "ms" << std::endl;
    

    Surprisingly, it took more time to perform the operation on the vector view than on the whole vector itself.

    Elapsed time blaze_vec: 219ms
    Elapsed time blaze_e:   229ms
    

    Presumably this is because of the unfavourable memory alignment of the view of the vector.
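
    One way to look at the access pattern separately from Blaze would be to reproduce both loops with plain std::vector. The following is only a sketch under that assumption: elapsedMs and the two addToSelf overloads are hypothetical helpers, the index list stands in for the elements() view, and no timings are claimed for it.

    ```cpp
    #include <chrono>
    #include <cstddef>
    #include <vector>

    // Calls 'f' the given number of times and returns the elapsed wall-clock time in milliseconds.
    template< typename F >
    long long elapsedMs( std::size_t iterations, F f )
    {
       const auto start = std::chrono::steady_clock::now();
       for( std::size_t i = 0; i < iterations; ++i ) {
          f();
       }
       const auto end = std::chrono::steady_clock::now();
       return std::chrono::duration_cast<std::chrono::milliseconds>( end - start ).count();
    }

    // Contiguous update: unit-stride access over the whole vector.
    void addToSelf( std::vector<float>& v )
    {
       for( std::size_t i = 0; i < v.size(); ++i ) {
          v[i] += v[i];
       }
    }

    // Indexed update: every access goes through the index list, similar to an element selection.
    void addToSelf( std::vector<float>& v, const std::vector<std::size_t>& indices )
    {
       for( const std::size_t i : indices ) {
          v[i] += v[i];
       }
    }
    ```

    Both variants can be timed via elapsedMs( it, [&]{ addToSelf( v ); } ) and elapsedMs( it, [&]{ addToSelf( v, indices ); } ). Since repeated doubling quickly overflows float, the absolute values in such a loop carry no meaning and only the timings are of interest.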

    I might reformulate parts of the project code (https://github.com/QueensGambit/CrazyAra-Engine) into single-value (scalar) operations.
    Therefore, supporting vectorized relational vector operations in Blaze isn't urgent in my case.

    Best regards
    ~Johannes Czech
