Functionality to enforce non-lazy evaluation

Issue #72 resolved
Marcin Copik created an issue

I have had some troubles with forcing Blaze to disregard the default lazy evaluation policy and compute results immediately. An example of such situation may be benchmarking where we want to measure execution time of a given operation without introducing overhead from additional operations which enforce evaluation (e.g. access matrix elements).

It works fine as long as we explicitly define an object storing results:

typedef blaze::DynamicMatrix<double, blaze::rowMajor> mat_t;
mat_t result3 = A * blaze::trans(B);

However, I would like to be able to not define types explicitly. For example, if set of input objects include adapters for symmetric or tridiagonal matrices, some of operations may preserve those properties and I suspect that in longer sequences of operations Blaze may be able to follow how matrix properties change, hence Blaze can apply optimizations in further operations involving this matrix. Explicit type definitions means that either the user has to know if a result matrix is symmetric or tridiagonal, which is quite problematic because it requires the user to have the linear algebra knowledge which is already implemented in the library and it makes much harder to implement an algorithm where matrix type is a template parameter, or all result types are defined as DynamicMatrix which may prevent further optimizations.

I have tried:

auto result = A * blaze::trans(B);

which deduces the return parameter as an ET node representing GEMM. Hence no computation is ever performed.

Adding eval function does not change the behaviour:

auto result2 = blaze::eval(A * blaze::trans(B));
blaze::eval(A * blaze::trans(B));

Still no computation is performed.

I'd like to propose a change to blaze::eval function or a completely new functionality which is able to enforce an evaluation of template expression. I'm not sure what is the status of auto keyword in Blaze, I know Eigen advises to not use it most of the time, maybe there is another approach which allows to write kernels preserving special matrix types?

I have checked few other ET libraries - Eigen, Armadillo and MTL4 - all of them have a function enforcing evaluation which works (although it is possible that it introduces an overhead caused by a temporary object, but I haven't observed that in such example).

Source code of example. First three measurements return time equal to zero which indicates that no computation is performed. Only the fourth attempt computes matrix product.

Comments (9)

  1. Klaus Iglberger

    Hi Marcin!

    Thanks for raising this issue. We understand that it is desirable to use the correct vector or matrix type for temporaries in order to preserve all information that can be used to speed up computations. Unfortunately, eval() is not the solution for this kind of problem. First, allow me to provide you with two possible solutions that work as required, then I will try to explain the semantics of eval()in Blaze.

    In order to solve your problem, you could use the following function:

    template< typename MT, bool SO >
    inline typename MT::ResultType evaluate( const blaze::Matrix<MT,SO>& mat )
    {
       const typename MT::ResultType tmp( mat );
       return tmp;
    }
    

    The function immediately evaluates the given matrix (expression) and returns the temporary. In case the result is used to initialize a matrix, the copy operation is potentially elided due to the return value optimization (RVO), but only if the target type is the same type as the return type. However, in case the result type and the target type differ or in case the result is assigned to another matrix, a copy operation (i.e. copy assignment) will be triggered:

    using namespace blaze;
    
    DynamicMatrix<int> A, B;
    
    auto C( evaluate( A * B ) );  // 'auto' evaluates to the correct result type, no copy operation due to RVO
    DynamicMatrix<int> C( evaluate( A * B ) );  // Result type is 'DynamicMatrix<int>', target is the same, no copy operation due to RVO
    CompressedMatrix<int> D( evaluate( A * B ) );  // Result type is 'DynamicMatrix<int>', target type differs, copy assignment required
    C = evaluate( A * B );  // No initialization, but assignment, therefore copy assignment is required
    

    The auto keyword simplifies things considerably, since it is not possible to (accidentally) use a different target type. Therefore in this context auto is highly recommended.

    The second possible solution is to query for the correct return type:

    using namespace blaze;
    
    DynamicMatrix<int> A, B;
    
    using ResultType = decltype( A * B )::ResultType;
    const ResultType C( A * B );
    

    Every vector and matrix type provides a nested ResultType, which represents the type of the result. This ResultType is guaranteed to preserve the characteristics of a result type. The following list gives an impression:

    DynamicMatrix * DynamicMatrix  -->  DynamicMatrix
    LowerMatrix * LowerMatrix  -->  LowerMatrix
    LowerMatrix * StrictlyLowerMatrix  -->  StrictlyLowerMatrix
    

    In the following I will try to explain the semantics of eval() in Blaze. The first important aspect is that the semantics of eval() in Blaze differs from the eval() semantics in other libaries. In most libraries, eval() means "evaluate immediately and create a temporary" (let's call this "compute semantics"). In Blaze eval() means "evaluate the expression as-is and don't optimize" (let's call this "preserve semantics"). Whereas in other libraries the result of eval() is a temporary vector or matrix, the result in Blaze is another expression object (which strictly speaking is also a temporary object of course). This expression object participates in the regular expression evaluation, but makes sure that the contained expression is preserved. For instance:

    using namespace blaze;
    
    DynamicMatrix<int> A, B;
    DynamicVector<int> x, y;
    
    y = A * B * x;          // Expression is restructured and evaluated as y = A * ( B * x );
    y = eval( A * B ) * x;  // Expression is not restructured and evaluated as y = ( A * B ) * x;
    

    Both semantics have their significance, but given the shortcomings of the evaluate() solution (inadvertent temporaries) and given the powerful restructuring capabilities of Blaze we believe that in our context the "preserve semantics" is more valuable.

    I should point out that the eval() with "compute semantics" is also not a perfect solution for your problem. Only in combination with auto the characteristics of the result type can be preserved. Therefore the second possible solution is apparently superior for your requirements. However, we realize that syntactically this is unfortunately not the most convenient solution. This is one reason why we will keep this issue open: we will use it to think about how to provide the automatic deduction of the result type without the possibility to inadvertently create temporaries. The second reason is that we should update the documentation of the eval() function to explain its semantics.

    I hope that this answers all questions and enables you to proceed with our benchmarks,

    Best regards,

    Klaus!

  2. Klaus Iglberger

    Summary

    The feature has been implemented, tested, optimized and documented as required. It is immediately available via cloning the Blaze repository and will be officially released in Blaze 3.1.

    eval() / evaluate()

    The evaluate() function forces an evaluation of the given vector or matrix expression and enables an automatic deduction of the correct result type of an operation. The following code example demonstrates its intended use for the multiplication of a lower and a strictly lower dense matrix:

    using blaze::DynamicMatrix;
    using blaze::LowerMatrix;
    using blaze::StrictlyLowerMatrix;
    
    LowerMatrix< DynamicMatrix<double> > A;
    StrictlyLowerMatrix< DynamicMatrix<double> > B;
    // ... Resizing and initialization
    
    auto C = evaluate( A * B );
    

    In this scenario, the evaluate() function assists in deducing the exact result type of the operation via the 'auto' keyword. Please note that if evaluate() is used in this way, no temporary vector or matrix is created and no copy operation is performed. Instead, the result is directly written to the target matrix due to copy elision. However, if evaluate() is used in combination with an explicit target type, a temporary will be created and a copy operation will be performed if the used type differs from the type returned from the function:

    StrictlyLowerMatrix< DynamicMatrix<double> > D( A * B );  // No temporary & no copy operation
    LowerMatrix< DynamicMatrix<double> > E( A * B );          // Temporary & copy operation
    DynamicMatrix<double> F( A * B );                         // Temporary & copy operation
    D = evaluate( A * B );                                    // Temporary & copy operation
    

    Sometimes it might be desirable to explicitly evaluate a sub-expression within a larger expression. However, please note that evaluate() is not intended to be used for this purpose. This task is more elegantly and efficiently handled by the eval() function:

    blaze::DynamicMatrix<double> A, B, C, D;
    
    D = A + evaluate( B * C );  // Unnecessary creation of a temporary matrix
    D = A + eval( B * C );      // No creation of a temporary matrix
    

    In contrast to the evaluate() function, eval() can take the complete expression into account and therefore can guarantee the most efficient way to evaluate it (see also Intra-Statement Optimization).

  3. Marcin Copik reporter

    Klaus,

    apologies for a late reply, I did not have a lot of time during Christmas. Many thanks for a detailed explanation, I absolutely agree with you here and I like proposed solutions, it makes perfect sense and I have not tried anything similar before due to lack of knowledge about ResultType - maybe it is described in docs and I have missed it. The guarantee of result type defined for each operation is necessary to move from ET node type, determined either by auto or decltype, to correct matrix type.

    The problem here is that what we really want is to implement a dynamically typed interface in a strongly static environment of C++ and that's why I'm okay with accepting solutions requiring a change in interface. I'm rather sceptical that we can do better with static polymorphism of templates. Using evaluate in assignment with auto keyword gives us the opportunity to implement kernels step by step without reducing possibilities for optimizations. I was thinking about different ways to implement it without drastic changes in syntax and I'm not sure that we can't find anything better than current approach. The only improvement could be providing a macro encapsulating decltype but that should not be necessary as long as the compiler correctly applies RVO (and if not then we still have move assignment and constructors).

    Thanks!

  4. Log in to comment