Introduce custom operations for vectors and matrices

Klaus Iglberger reporter

changed status to resolved

Summary

The feature has been implemented, tested, optimized (including vectorization and parallelization) and documented as required. It is immediately available via cloning the Blaze repository and will be officially released in Blaze 3.0.

Custom Operations

In addition to the provided operations on vectors and matrices it is possible to define custom operations. For this purpose, Blaze provides the forEach() function, which allows to pass the required operation via functor or lambda:

blaze::DynamicMatrix<double> A, B;

B = forEach( A, []( double d ){ return std::sqrt( d ); } );

This example demonstrates the most convenient way of defining a custom operation by passing a lambda to the forEach() function. The lambda is executed on each single element of a dense vector or matrix or each non-zero element of a sparse vector or matrix.

Alternatively, it is possible to pass a custom functor:

struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }
};

B = forEach( A, Sqrt() );

In order for the functor to work in a call to forEach() it must define a function call operator, which accepts arguments of the type of the according vector or matrix elements.

Although the operation is automatically parallelized depending on the size of the vector or matrix, no automatic vectorization is possible. In order to enable vectorization, a load() function can be added to the functor, which handles the vectorized computation. Depending on the data type this function is passed one of the following Blaze SIMD data types:

SIMD data types for fundamental data types
- blaze::simd_int8_t: Packed SIMD type for 8-bit integral data types
- blaze::simd_int16_t: Packed SIMD type for 16-bit integral data types
- blaze::simd_int32_t: Packed SIMD type for 32-bit integral data types
- blaze::simd_int64_t: Packed SIMD type for 64-bit integral data types
- blaze::simd_float_t: Packed SIMD type for single precision floating point data
- blaze::simd_double_t: Packed SIMD type for double precision floating point data
SIMD data types for complex data types
- blaze::simd_cint8_t: Packed SIMD type for complex 8-bit integral data types
- blaze::simd_cint16_t: Packed SIMD type for complex 16-bit integral data types
- blaze::simd_cint32_t: Packed SIMD type for complex 32-bit integral data types
- blaze::simd_cint64_t: Packed SIMD type for complex 64-bit integral data types
- blaze::simd_cfloat_t: Packed SIMD type for complex single precision floating point data
- blaze::simd_cdouble_t: Packed SIMD type for complex double precision floating point data

All SIMD types provide the value data member for a direct access to the underlying intrinsic data element. In the following example, this intrinsic element is passed to the AVX function _mm256_sqrt_pd():

struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }

   simd_double_t load( simd_double_t a ) const
   {
      return _mm256_sqrt_pd( a.value );
   }
};

In this example, whenever vectorization is generally applicable, the load() function is called instead of the function call operator for as long as the number of remaining elements is larger-or-equal to the width of the packed SIMD type. In all other cases (which also includes peel-off and remainder loops) the scalar operation is used.

Please note that this example has two drawbacks: First, it will only compile in case the intrinsic _mm256_sqrt_pd() function is available (i.e. when AVX is active). Second, the availability of AVX is not taken into account. The first drawback can be alleviated by making the load() function a function template. The second drawback can be dealt with by adding a simdEnabled() function template to the functor:

struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }

   template< typename T >
   T load( T a ) const
   {
      return _mm256_sqrt_pd( a.value );
   }

   template< typename T >
   static constexpr bool simdEnabled() {
#if defined(__AVX__)
      return true;
#else
      return false;
#endif
   }
};

The simdEnabled() function must be a static, constexpr function and must return whether or not vectorization is available for the given data type T. In case the function returns true, the load() function is used for a vectorized evaluation, in case the function returns false, load() is not called.

Note that this is a simplified example that is only working when used for dense vectors and matrices with double precision floating point elements. The following code shows the complete implementation of the according functor that is used within the Blaze library. The Blaze Sqrt functor is working for all data types that are providing a square root operation:

namespace blaze {

struct Sqrt
{
   template< typename T >
   BLAZE_ALWAYS_INLINE auto operator()( const T& a ) const -> decltype( sqrt( a ) )
   {
      return sqrt( a );
   }

   template< typename T >
   static constexpr bool simdEnabled() { return HasSIMDSqrt<T>::value; }

   template< typename T >
   BLAZE_ALWAYS_INLINE auto load( const T& a ) const -> decltype( sqrt( a ) )
   {
      BLAZE_CONSTRAINT_MUST_BE_SIMD_TYPE( T );
      return sqrt( a );
   }
};

} // namespace blaze

For more information on the available Blaze SIMD data types and functions, please see the SIMD module in the complete Blaze documentation.

2016-06-18T11:05:06+00:00

Comments (3)

Klaus Iglberger reporter
- assigned issue to
  
  Klaus Iglberger
- 2016-05-10T11:28:18+00:00
Klaus Iglberger reporter
- changed status to open
- 2016-05-10T11:28:24+00:00
Klaus Iglberger reporter
- changed status to resolved
Summary

The feature has been implemented, tested, optimized (including vectorization and parallelization) and documented as required. It is immediately available via cloning the Blaze repository and will be officially released in Blaze 3.0.

Custom Operations

In addition to the provided operations on vectors and matrices it is possible to define custom operations. For this purpose, Blaze provides the forEach() function, which allows to pass the required operation via functor or lambda:
```
blaze::DynamicMatrix<double> A, B;

B = forEach( A, []( double d ){ return std::sqrt( d ); } );
```
This example demonstrates the most convenient way of defining a custom operation by passing a lambda to the forEach() function. The lambda is executed on each single element of a dense vector or matrix or each non-zero element of a sparse vector or matrix.

Alternatively, it is possible to pass a custom functor:
```
struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }
};

B = forEach( A, Sqrt() );
```
In order for the functor to work in a call to forEach() it must define a function call operator, which accepts arguments of the type of the according vector or matrix elements.

Although the operation is automatically parallelized depending on the size of the vector or matrix, no automatic vectorization is possible. In order to enable vectorization, a load() function can be added to the functor, which handles the vectorized computation. Depending on the data type this function is passed one of the following Blaze SIMD data types:
- SIMD data types for fundamental data types
  - blaze::simd_int8_t: Packed SIMD type for 8-bit integral data types
  - blaze::simd_int16_t: Packed SIMD type for 16-bit integral data types
  - blaze::simd_int32_t: Packed SIMD type for 32-bit integral data types
  - blaze::simd_int64_t: Packed SIMD type for 64-bit integral data types
  - blaze::simd_float_t: Packed SIMD type for single precision floating point data
  - blaze::simd_double_t: Packed SIMD type for double precision floating point data
- SIMD data types for complex data types
  - blaze::simd_cint8_t: Packed SIMD type for complex 8-bit integral data types
  - blaze::simd_cint16_t: Packed SIMD type for complex 16-bit integral data types
  - blaze::simd_cint32_t: Packed SIMD type for complex 32-bit integral data types
  - blaze::simd_cint64_t: Packed SIMD type for complex 64-bit integral data types
  - blaze::simd_cfloat_t: Packed SIMD type for complex single precision floating point data
  - blaze::simd_cdouble_t: Packed SIMD type for complex double precision floating point data
All SIMD types provide the value data member for a direct access to the underlying intrinsic data element. In the following example, this intrinsic element is passed to the AVX function _mm256_sqrt_pd():
```
struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }

   simd_double_t load( simd_double_t a ) const
   {
      return _mm256_sqrt_pd( a.value );
   }
};
```
In this example, whenever vectorization is generally applicable, the load() function is called instead of the function call operator for as long as the number of remaining elements is larger-or-equal to the width of the packed SIMD type. In all other cases (which also includes peel-off and remainder loops) the scalar operation is used.

Please note that this example has two drawbacks: First, it will only compile in case the intrinsic _mm256_sqrt_pd() function is available (i.e. when AVX is active). Second, the availability of AVX is not taken into account. The first drawback can be alleviated by making the load() function a function template. The second drawback can be dealt with by adding a simdEnabled() function template to the functor:
```
struct Sqrt
{
   double operator()( double a ) const
   {
      return std::sqrt( a );
   }

   template< typename T >
   T load( T a ) const
   {
      return _mm256_sqrt_pd( a.value );
   }

   template< typename T >
   static constexpr bool simdEnabled() {
#if defined(__AVX__)
      return true;
#else
      return false;
#endif
   }
};
```
The simdEnabled() function must be a static, constexpr function and must return whether or not vectorization is available for the given data type T. In case the function returns true, the load() function is used for a vectorized evaluation, in case the function returns false, load() is not called.

Note that this is a simplified example that is only working when used for dense vectors and matrices with double precision floating point elements. The following code shows the complete implementation of the according functor that is used within the Blaze library. The Blaze Sqrt functor is working for all data types that are providing a square root operation:
```
namespace blaze {

struct Sqrt
{
   template< typename T >
   BLAZE_ALWAYS_INLINE auto operator()( const T& a ) const -> decltype( sqrt( a ) )
   {
      return sqrt( a );
   }

   template< typename T >
   static constexpr bool simdEnabled() { return HasSIMDSqrt<T>::value; }

   template< typename T >
   BLAZE_ALWAYS_INLINE auto load( const T& a ) const -> decltype( sqrt( a ) )
   {
      BLAZE_CONSTRAINT_MUST_BE_SIMD_TYPE( T );
      return sqrt( a );
   }
};

} // namespace blaze
```
For more information on the available Blaze SIMD data types and functions, please see the SIMD module in the complete Blaze documentation.
- 2016-06-18T11:05:06+00:00
Log in to comment

Introduce custom operations for vectors and matrices

Description

Conceptual Example

Tasks

Comments (3)

Summary

Custom Operations