Multiplication assignment of sparse vectors for views non-optimal

Issue #130 resolved
Fabien Péan created an issue

After some debugging tests, it appears that the multiplication assignment of a sparse vector to a dense type in the Band class generates temporaries that could be avoided. This seems to be the case for other views as well (rows, columns).

See, for example, the following functions in views/band/Dense.h:

template< typename VT > inline BandImpl& operator*=( const SparseVector<VT,TF>& rhs );
template< typename VT > inline void multAssign( const SparseVector<VT,TF>& rhs );
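
For reference, a minimal usage sketch of an operation that exercises these overloads (the matrix and vector types here are illustrative, not necessarily the ones from the original test):

#include <blaze/Math.h>

int main()
{
   blaze::DynamicMatrix<double> A( 100UL, 100UL, 2.0 );  // dense matrix
   blaze::CompressedVector<double> s( 100UL );           // sparse vector
   s[3]  = 1.5;
   s[42] = -2.0;

   auto d = blaze::band( A, 0L );  // dense band view on the diagonal

   // Multiplication assignment of a sparse vector to the dense band view;
   // this goes through the operator*=/multAssign overloads quoted above.
   d *= s;
}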

Comments (5)

  1. Klaus Iglberger

    Hi Fabien!

    Thanks for pointing out this performance problem. Also thanks a lot for providing a pull request with a solution. It successfully passed all our tests.

    When considering performance, the problem is unfortunately more complex than one would intuitively anticipate. The performance depends on several factors:

    • The filling degree of the sparse vector
    • The size of the matrix
    • The kinds of memory used by the matrix

    Below you will find an overview of several benchmarks we ran to compare the two approaches. The benchmarks were run on an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, using the diagonal of integer matrices of the given sizes. No vectorization can be exploited, since the elements of a band are not contiguous in memory.

    Matrix type    Matrix size  Filling degree  With temporary [s]  Without temporary [s]
    StaticMatrix   3x3          66%             1.0000              1.6693
    StaticMatrix   10x10        20%             1.0000              1.7605
    DynamicMatrix  10x10        20%             7.3662              1.0000
    StaticMatrix   100x100      2%              1.0000              1.0991
    StaticMatrix   100x100      10%             1.0000              1.5433
    DynamicMatrix  100x100      5%              3.8995              1.0000
    DynamicMatrix  100x100      20%             2.4706              1.0000
    DynamicMatrix  100x100      50%             1.4024              1.0000
    DynamicMatrix  1000x1000    1%              19.3238             1.0000
    DynamicMatrix  1000x1000    5%              9.5317              1.0000
    DynamicMatrix  1000x1000    50%             1.5265              1.0000

    In the case of small matrices, the performance apparently depends strongly on the type of matrix used. Small static matrices benefit from the temporary, since there is no memory allocation and there are no nested, short loops. Small dynamic matrices, on the other hand, do not benefit from a temporary, mainly due to the required memory allocation. For large matrices the approach without a temporary is always superior, even for a large filling degree, but with a growing filling degree the performance benefit shrinks.
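
    To make the trade-off concrete, here is a simplified sketch (plain C++, not the actual Blaze implementation) of the two strategies; it assumes the sparse vector stores its index/value pairs in ascending index order:

    #include <cstddef>
    #include <utility>
    #include <vector>

    using SparseElements = std::vector< std::pair<std::size_t,double> >;  // sorted by index

    // Strategy 1: evaluate into a freshly allocated temporary and swap it in.
    // The allocation dominates the runtime for small, dynamically sized targets.
    void multAssignWithTemporary( std::vector<double>& band, const SparseElements& rhs )
    {
       std::vector<double> tmp( band.size(), 0.0 );   // extra allocation
       for( const auto& [index,value] : rhs )
          tmp[index] = band[index] * value;
       band = std::move( tmp );
    }

    // Strategy 2: work in place. Elements without a counterpart in the sparse
    // vector are set to zero, the others are scaled; no allocation is needed.
    void multAssignInPlace( std::vector<double>& band, const SparseElements& rhs )
    {
       std::size_t next( 0UL );
       for( const auto& [index,value] : rhs ) {
          for( ; next<index; ++next ) band[next] = 0.0;
          band[index] *= value;
          ++next;
       }
       for( ; next<band.size(); ++next ) band[next] = 0.0;
    }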

    In conclusion, your proposed solution performs better in almost all cases, except for small matrices with static storage. Therefore we will definitely use your approach, but will try to adapt it to also provide optimal performance for small static matrices. Also, we will perform similar benchmarks for rows and columns and adapt these implementations accordingly.

    Thanks a lot,

    Best regards,

    Klaus!

  2. Klaus Iglberger

    Hi Fabien!

    Thanks again for pointing out this problem and providing a pull request. As it turns out, the performance of the two approaches is system dependent, which makes it hard to find a reasonable condition for when to use a temporary. Therefore we have chosen your solution without a temporary vector and will revisit this issue at a later date. Still, although the pull request is flawless, we will decline it, as we would like to collapse the two multiplication assignment operators for dense and sparse vectors into a single one instead of just adapting the one for sparse vectors.
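
    For illustration only, a hypothetical sketch (plain C++, not Blaze internals) of how the two overloads could be folded into a single one that dispatches on a trait where the behavior has to differ:

    #include <type_traits>

    template< typename VT > struct Vector {};               // CRTP-style base, as in Blaze

    struct DenseVec  : Vector<DenseVec>  {};
    struct SparseVec : Vector<SparseVec> {};

    template< typename VT >
    struct IsSparseVector : std::is_same<VT,SparseVec> {};  // simplified trait

    struct Band
    {
       template< typename VT >
       Band& operator*=( const Vector<VT>& rhs )            // one overload for both kinds
       {
          if constexpr( IsSparseVector<VT>::value ) {
             // sparse path: iterate the stored elements, zero the remaining ones in place
          }
          else {
             // dense path: element-wise multiplication over the full band
          }
          return *this;
       }
    };

    int main()
    {
       Band b;
       b *= DenseVec{};    // both argument kinds go through the same operator
       b *= SparseVec{};
    }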

    Best regards,

    Klaus!

  3. Klaus Iglberger

    Commit f5699e6 resolves the performance problem of the multiplication assignment for bands, as well as for rows, columns, and subvectors. The fix is immediately available by cloning the Blaze repository and will be officially released in Blaze 3.3.
