Slow capacity reservation for CompresedMatrix
I have pretty big matrix:
rows = columns ~ 192 * 2160
But it’s only 4 non zero element in every column, so it very sparse.
I know required size and position of nonzero elements on construction step.
So I created this matrix once and I need only change nonzero elements values sometimes.
When I create matrix with constructor and then filled it with values directly:
m(i, j) = value;
Speed is very slow.
ok. I call
m.reserve(N); // N - non zero element number
but this did not change anything.
This one
m.reserve(j, N); // for all columns
gave effect and subsequent filling took fraction of second but this kind of reservation took 220s.
When I use code from tutorial for one time append with reserve(), append() and finalize() total time is fraction of second.
But I need to change values in matrix many times. Not one.
So is it possible to set suitable capacity one time and then change values when I need it?
Could you please explain how CompressedMatrix, reserve(), append() and finalize() works?
May be add more details in documentation?
Thanks
Comments (7)
-
-
Hi Victor!
I’ve found the time to do a complete series of benchmarks for the different approaches to fill a
CompressedMatrix
. The following should give you an overview which approach is the most favorable in terms of runtime. In all cases I benchmark with Clang-9.0, use the compilation flags-O2 -DNDEBUG
and try to setup the following matrix structure, whereN=200000
and where all values on the diagonal and the two sub-diagonals are filled:( 1 1 0 0 0 ... 0 0 0 ) ( 1 1 1 0 0 ... 0 0 0 ) ( 0 1 1 1 0 ... 0 0 0 ) ( 0 0 1 1 1 ... 0 0 0 ) ( 0 0 0 1 1 ... 0 0 0 ) ( ................... ) ( 0 0 0 0 0 ... 1 1 0 ) ( 0 0 0 0 0 ... 1 1 1 ) ( 0 0 0 0 0 ... 0 1 1 )
Also, I try to explain the internal mechanics a little better, which may serve for an improvement of the tutorial and wiki.
Approach 1: reserve/append/finalize
As the wiki explains, the most efficient way to fill a sparse matrix is a combination of
reserve()
,append()
andfinalize()
:// Setup of the compressed row-major matrix // // ( 0 1 0 2 0 ) // A = ( 0 0 0 0 0 ) // ( 3 0 0 0 0 ) // blaze::CompressedMatrix<int> M1( 3UL, 5UL ); M1.reserve( 3 ); // Reserving enough space for 3 non-zero elements M1.append( 0, 1, 1 ); // Appending the value 1 in row 0 with column index 1 M1.append( 0, 3, 2 ); // Appending the value 2 in row 0 with column index 3 M1.finalize( 0 ); // Finalizing row 0 M1.finalize( 1 ); // Finalizing the empty row 1 to prepare row 2 M1.append( 2, 0, 3 ); // Appending the value 3 in row 2 with column index 0 M1.finalize( 2 ); // Finalizing row 2
Via the
reserve()
function you allocate enough memory for all non-zero elements of the entire matrix.append()
andfinalize()
are now used to insert the elements and to mark the end of each single row or column. This is a very low-level approach, very similar to writing to an array manually. Theappend()
function writes the new element to the next memory location in line, and thefinalize()
function sets the internal pointers accordingly. It is very important to note that thefinalize()
function has to be explicitly called for each row or column, even for empty ones! Else the internal data structure will be corrupt! Also note that althoughappend()
does not allocate new memory, it still invalidates all iterators returned by theend()
functions!The following demonstrates the code to setup the matrix:
CompressedMatrix<int,rowMajor> A( N, N ); A.reserve( N*3UL ); for( size_t i=0; i<N; ++i ) { const size_t jbegin( i == 0UL ? 0UL : i-1UL ); const size_t jend ( i == N-1UL ? N-1UL : i+1UL ); for( size_t j=jbegin; j<=jend; ++j ) { A.append( i, j, 99 ); } A.finalize( i ); }
On my machine, the runtime is 0.026 seconds.
Approach 2: Reservation via the constructor
In case you know the number of non-zero elements upfront, you can also perform the reservation via the constructor of
CompressedMatrix
. For that purposeCompressedMatrix
provides a constructor taking astd::vector<size_t>
:std::vector<size_t> nonzeros( N, 3UL ); // Create a vector of N elements with value 3 nonzeros[ 0] = 2UL; // We need only 2 elements in the first ... nonzeros[N-1] = 2UL; // ... and last row/column. CompressedMatrix<int,rowMajor> A( N, N, nonzeros ); //std::cerr << " Inserting values...\n"; for( size_t i=0; i<N; ++i ) { const size_t jbegin( i == 0UL ? 0UL : i-1UL ); const size_t jend ( i == N-1UL ? N-1UL : i+1UL ); for( size_t j=jbegin; j<=jend; ++j ) { A.insert( i, j, 99 ); } }
On my machine, the runtime for this approach is 0.027 seconds.
Approach 3: Rowwise/Columnwise reserve and insert
The third approach performs a rowwise/columnwise reservation of capacity:
CompressedMatrix<int,rowMajor> A( N, N ); A.reserve( N*3UL ); // Allocate the total amount of memory A.reserve( 0UL, 2UL ); // Reserve a capacity of 2 for row 0 for( size_t i=1; i<N-1UL; ++i ) { A.reserve( i, 3UL ); // Reserve a capacity of 3 for row i } A.reserve( N-1UL, 2UL ); // Reserve a capacity of 2 for the last row for( size_t i=0; i<N; ++i ) { const size_t jbegin( i == 0UL ? 0UL : i-1UL ); const size_t jend ( i == N-1UL ? N-1UL : i+1UL ); for( size_t j=jbegin; j<=jend; ++j ) { A.insert( i, j, 99 ); } }
The first call to
reserve()
performs the allocation for the entire matrix. The complete matrix now has the according capacity, but each single row/column has a capacity of 0. Therefore the subsequent calls toreserve()
split the existing capacity for all rows/columns.Unfortunately this approach is rather slow. On my machine the runtime is approx. 30 seconds. The downside of this approach is that because all elements are stored consecutively in a single array, changing the capacity of a single row/column causes a change in all following rows/columns. This behavior is similar to inserting an element at the front of a
std::vector
; all subsequent elements have to be shifted.Approach 4: Use of reserve and the function call operator
In this approach the function call operator (i.e.
operator()
) is used after the initial call toreserve()
:CompressedMatrix<int,rowMajor> A( N, N ); A.reserve( N*3UL ); for( size_t i=0; i<N; ++i ) { const size_t jbegin( i == 0UL ? 0UL : i-1UL ); const size_t jend ( i == N-1UL ? N-1UL : i+1UL ); for( size_t j=jbegin; j<=jend; ++j ) { A(i,j) = 99; } }
This approach is the most general and convenient, but also the slowest of all (on my machine approx. 64 seconds). With every call to
operator()
, a new element is inserted and all subsequent elements and rows/columns have to be adapted. Thus is obviously pays off to perform an upfront reservation for each row/column.Summary
In summary I would recommend approach 1 or 2 for your problem, depending on whether you know the number of non-zero elements upfront. In the previous post I showed some code that you can use to modify the elements without changing the sparsity structure of the matrix. Of course it is also possible to manually traverse all elements:
blaze::CompressedMatrix<int,rowMajor> A; // ... Resizing and initialization for( size_t i=0UL; i<N; ++i ) { for( auto it=A.begin(i); it!=A.end(i); ++it ) { it->value() = ...; // Writing the non-zero value } }
I hope this helps. We will try to update the tutorial and wiki accordingly and provide an FAQ entry. Thanks again for creating the issue,
Best regards,
Klaus!
-
-
assigned issue to
-
assigned issue to
-
- changed status to open
-
- changed status to resolved
The FAQ has been extended in both the wiki and the tutorial with a performance analysis of the performance of the different approaches to fill a sparse matrix. The updated wiki is already online, the updated tutorial will be officially released in Blaze 3.8.
-
reporter Thanks for your great answer and work!
So as I understood anyway there will be two different code for first time elements assignment (by append/insert) and others (by matrix operator())? If I just only update nonzero values of course.
-
Hi Victor!
You should always consider access via
operator()
as more expensive than other approaches. During the setup of the matrix it performs a search, may trigger an allocation and potentially shifts elements. Append doesn't search (it always adds to theend()
of a row), never allocates, and never shifts. However,operator()
will never fail, whereas the responsibility for doing the right thing withappend()
is on you. Thus I would consideroperator()
a high-level interface, andappend()
a low-level interface.Best regards,
Klaus!
- Log in to comment
Hi Victor!
Thanks for creating this issue. You are correct, the best approach for the setup of a very large sparse matrix is to use the combination of
reserve()
,append()
andfinalize()
. Once the non-zero elements have been inserted, you can use themap()
function to visit only the non-zero elements. For instance, the following function resets all non-zero elements to 0 without changing the sparsity pattern or removing any elements (see also the wiki):If you can (re-)set the values in this way, this would be the fastest approach. Else it would help me to get a better understanding of how you have to set your values. Then I can try to give further advice. I hope this already helps,
Best regards,
Klaus!