Slow capacity reservation for CompresedMatrix

Issue #343 resolved
Kataev Victor created an issue

I have pretty big matrix:

rows = columns ~ 192 * 2160

But it’s only 4 non zero element in every column, so it very sparse.

I know required size and position of nonzero elements on construction step.

So I created this matrix once and I need only change nonzero elements values sometimes.

When I create matrix with constructor and then filled it with values directly:

m(i, j) = value;

Speed is very slow.

ok. I call

m.reserve(N); // N - non zero element number

but this did not change anything.

This one

m.reserve(j, N); // for all columns

gave effect and subsequent filling took fraction of second but this kind of reservation took 220s.

When I use code from tutorial for one time append with reserve(), append() and finalize() total time is fraction of second.

But I need to change values in matrix many times. Not one.

So is it possible to set suitable capacity one time and then change values when I need it?

Could you please explain how CompressedMatrix, reserve(), append() and finalize() works?

May be add more details in documentation?

Thanks

Comments (7)

  1. Klaus Iglberger

    Hi Victor!

    Thanks for creating this issue. You are correct, the best approach for the setup of a very large sparse matrix is to use the combination of reserve(), append() and finalize(). Once the non-zero elements have been inserted, you can use the map() function to visit only the non-zero elements. For instance, the following function resets all non-zero elements to 0 without changing the sparsity pattern or removing any elements (see also the wiki):

    template< typename MT  // Type of the sparse matrix
            , bool SO >    // Storage order
    void setToZero( blaze::SparseMatrix<MT,SO>& mat )
    {
       (~mat) = blaze::map( ~mat, []( const auto& value ){ return decltype(value){}; } );
    }
    

    If you can (re-)set the values in this way, this would be the fastest approach. Else it would help me to get a better understanding of how you have to set your values. Then I can try to give further advice. I hope this already helps,

    Best regards,

    Klaus!

  2. Klaus Iglberger

    Hi Victor!

    I’ve found the time to do a complete series of benchmarks for the different approaches to fill a CompressedMatrix. The following should give you an overview which approach is the most favorable in terms of runtime. In all cases I benchmark with Clang-9.0, use the compilation flags -O2 -DNDEBUG and try to setup the following matrix structure, where N=200000 and where all values on the diagonal and the two sub-diagonals are filled:

    ( 1 1 0 0 0 ... 0 0 0 )
    ( 1 1 1 0 0 ... 0 0 0 )
    ( 0 1 1 1 0 ... 0 0 0 )
    ( 0 0 1 1 1 ... 0 0 0 )
    ( 0 0 0 1 1 ... 0 0 0 )
    ( ................... )
    ( 0 0 0 0 0 ... 1 1 0 )
    ( 0 0 0 0 0 ... 1 1 1 )
    ( 0 0 0 0 0 ... 0 1 1 )
    

    Also, I try to explain the internal mechanics a little better, which may serve for an improvement of the tutorial and wiki.

    Approach 1: reserve/append/finalize

    As the wiki explains, the most efficient way to fill a sparse matrix is a combination of reserve(), append() and finalize():

    // Setup of the compressed row-major matrix
    //
    //       ( 0 1 0 2 0 )
    //   A = ( 0 0 0 0 0 )
    //       ( 3 0 0 0 0 )
    //
    blaze::CompressedMatrix<int> M1( 3UL, 5UL );
    M1.reserve( 3 );       // Reserving enough space for 3 non-zero elements
    M1.append( 0, 1, 1 );  // Appending the value 1 in row 0 with column index 1
    M1.append( 0, 3, 2 );  // Appending the value 2 in row 0 with column index 3
    M1.finalize( 0 );      // Finalizing row 0
    M1.finalize( 1 );      // Finalizing the empty row 1 to prepare row 2
    M1.append( 2, 0, 3 );  // Appending the value 3 in row 2 with column index 0
    M1.finalize( 2 );      // Finalizing row 2
    

    Via the reserve() function you allocate enough memory for all non-zero elements of the entire matrix. append() and finalize() are now used to insert the elements and to mark the end of each single row or column. This is a very low-level approach, very similar to writing to an array manually. The append() function writes the new element to the next memory location in line, and the finalize() function sets the internal pointers accordingly. It is very important to note that the finalize() function has to be explicitly called for each row or column, even for empty ones! Else the internal data structure will be corrupt! Also note that although append() does not allocate new memory, it still invalidates all iterators returned by the end() functions!

    The following demonstrates the code to setup the matrix:

    CompressedMatrix<int,rowMajor> A( N, N );
    
    A.reserve( N*3UL );
    for( size_t i=0; i<N; ++i ) {
       const size_t jbegin( i == 0UL ? 0UL : i-1UL );
       const size_t jend  ( i == N-1UL ? N-1UL : i+1UL );
       for( size_t j=jbegin; j<=jend; ++j ) {
          A.append( i, j, 99 );
       }
       A.finalize( i );
    }
    

    On my machine, the runtime is 0.026 seconds.

    Approach 2: Reservation via the constructor

    In case you know the number of non-zero elements upfront, you can also perform the reservation via the constructor of CompressedMatrix. For that purpose CompressedMatrix provides a constructor taking a std::vector<size_t>:

    std::vector<size_t> nonzeros( N, 3UL );  // Create a vector of N elements with value 3
    nonzeros[  0] = 2UL;                     // We need only 2 elements in the first ...
    nonzeros[N-1] = 2UL;                     //  ... and last row/column.
    
    CompressedMatrix<int,rowMajor> A( N, N, nonzeros );
    
    //std::cerr << " Inserting values...\n";
    for( size_t i=0; i<N; ++i ) {
       const size_t jbegin( i == 0UL ? 0UL : i-1UL );
       const size_t jend  ( i == N-1UL ? N-1UL : i+1UL );
       for( size_t j=jbegin; j<=jend; ++j ) {
          A.insert( i, j, 99 );
       }
    }
    

    On my machine, the runtime for this approach is 0.027 seconds.

    Approach 3: Rowwise/Columnwise reserve and insert

    The third approach performs a rowwise/columnwise reservation of capacity:

    CompressedMatrix<int,rowMajor> A( N, N );
    
    A.reserve( N*3UL );                // Allocate the total amount of memory
    A.reserve( 0UL, 2UL );             // Reserve a capacity of 2 for row 0
    for( size_t i=1; i<N-1UL; ++i ) {
       A.reserve( i, 3UL );            // Reserve a capacity of 3 for row i
    }
    A.reserve( N-1UL, 2UL );           // Reserve a capacity of 2 for the last row
    
    for( size_t i=0; i<N; ++i ) {
       const size_t jbegin( i == 0UL ? 0UL : i-1UL );
       const size_t jend  ( i == N-1UL ? N-1UL : i+1UL );
       for( size_t j=jbegin; j<=jend; ++j ) {
          A.insert( i, j, 99 );
       }
    }
    

    The first call to reserve() performs the allocation for the entire matrix. The complete matrix now has the according capacity, but each single row/column has a capacity of 0. Therefore the subsequent calls to reserve() split the existing capacity for all rows/columns.

    Unfortunately this approach is rather slow. On my machine the runtime is approx. 30 seconds. The downside of this approach is that because all elements are stored consecutively in a single array, changing the capacity of a single row/column causes a change in all following rows/columns. This behavior is similar to inserting an element at the front of a std::vector; all subsequent elements have to be shifted.

    Approach 4: Use of reserve and the function call operator

    In this approach the function call operator (i.e. operator()) is used after the initial call to reserve():

    CompressedMatrix<int,rowMajor> A( N, N );
    
    A.reserve( N*3UL );
    
    for( size_t i=0; i<N; ++i ) {
       const size_t jbegin( i == 0UL ? 0UL : i-1UL );
       const size_t jend  ( i == N-1UL ? N-1UL : i+1UL );
       for( size_t j=jbegin; j<=jend; ++j ) {
          A(i,j) = 99;
       }
    }
    

    This approach is the most general and convenient, but also the slowest of all (on my machine approx. 64 seconds). With every call to operator(), a new element is inserted and all subsequent elements and rows/columns have to be adapted. Thus is obviously pays off to perform an upfront reservation for each row/column.

    Summary

    In summary I would recommend approach 1 or 2 for your problem, depending on whether you know the number of non-zero elements upfront. In the previous post I showed some code that you can use to modify the elements without changing the sparsity structure of the matrix. Of course it is also possible to manually traverse all elements:

    blaze::CompressedMatrix<int,rowMajor> A;
    // ... Resizing and initialization
    
    for( size_t i=0UL; i<N; ++i ) {
       for( auto it=A.begin(i); it!=A.end(i); ++it ) {
          it->value() = ...;  // Writing the non-zero value
       }
    }
    

    I hope this helps. We will try to update the tutorial and wiki accordingly and provide an FAQ entry. Thanks again for creating the issue,

    Best regards,

    Klaus!

  3. Klaus Iglberger

    The FAQ has been extended in both the wiki and the tutorial with a performance analysis of the performance of the different approaches to fill a sparse matrix. The updated wiki is already online, the updated tutorial will be officially released in Blaze 3.8.

  4. Kataev Victor reporter

    Thanks for your great answer and work!

    So as I understood anyway there will be two different code for first time elements assignment (by append/insert) and others (by matrix operator())? If I just only update nonzero values of course.

  5. Klaus Iglberger

    Hi Victor!

    You should always consider access via operator() as more expensive than other approaches. During the setup of the matrix it performs a search, may trigger an allocation and potentially shifts elements. Append doesn't search (it always adds to the end() of a row), never allocates, and never shifts. However, operator() will never fail, whereas the responsibility for doing the right thing with append() is on you. Thus I would consider operator() a high-level interface, and append() a low-level interface.

    Best regards,

    Klaus!

  6. Log in to comment