SpColByCol and other column-by-column SpGEMM ops

Saliya discovered a bug in the column-by-column SpGEMM implementations in CombBLAS. I can verify that both the old SpColByCol in Friends.h and LocalSpGEMM in mtSpGEMM.h reproducibly seg fault. Valgrind wasn't helpful. The matrices are small and skewed: the local matrices are 3x100 and 100x2, for example. But nnz>0, so it isn't about a missing simple check like m=0 or nnz=0.

It is possible (but unlikely) that the full "hash-based" versions are fine and this only affects the heap codes. However, I doubt that is the case, and I haven't tested it.

For the moment, I reverted MultAnXBn_Synch in ParFriends.h to use the old outer-product implementation (change log here: https://bitbucket.org/berkeleylab/combinatorial-blas-2.0/commits/53156800699573c512589e813ff744b2b226eb73#chg-CombBLAS/include/CombBLAS/ParFriends.h), which sidesteps the issue for now.

After we are done w/ deadlines, my plan is to dump the local matrices right before multiplication on the ranks where the seg fault happens, so that I can reproduce the bug w/out MPI (valgrind output gets really crowded w/ MPI).
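A rough sketch of what such a dump could look like, assuming the local nonzeros have already been copied out as (row, col, value) triples. The Triple struct, DumpLocalMatrix, and DumpFileName are hypothetical helpers made up for illustration, not CombBLAS APIs; the output is plain Matrix Market coordinate format so a serial test driver can read it back.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical triple type for illustration; not a CombBLAS type.
struct Triple {
    std::size_t row;  // 0-based local row index
    std::size_t col;  // 0-based local column index
    double val;
};

// Write one local matrix in Matrix Market coordinate format.
// Matrix Market indices are 1-based, hence the +1 on row and col.
void DumpLocalMatrix(std::ostream& out, std::size_t nrows, std::size_t ncols,
                     const std::vector<Triple>& nz) {
    out.precision(17);  // enough digits to round-trip a double
    out << "%%MatrixMarket matrix coordinate real general\n";
    out << nrows << ' ' << ncols << ' ' << nz.size() << '\n';
    for (const Triple& t : nz)
        out << (t.row + 1) << ' ' << (t.col + 1) << ' ' << t.val << '\n';
}

// Hypothetical per-rank file name, so ranks do not overwrite each other.
std::string DumpFileName(const std::string& prefix, int rank) {
    std::ostringstream name;
    name << prefix << "_rank" << rank << ".mtx";
    return name.str();
}
```

On each rank, one would open a std::ofstream on DumpFileName("A", rank) (with rank obtained from MPI_Comm_rank) and call DumpLocalMatrix on it for both operands just before the local multiply. The crashing pair can then be loaded into a serial program and run under valgrind w/out MPI.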

What we know so far:
1- It isn't about multithreading (because both the serial SpColByCol and the threaded ones crash).
2- It isn't about the use of the "aux" array (because I forced the system to always use the "scanning based" if/else branch of the dcsc->FillColInds() function and it still seg faults).

For the moment, just know that MultAnXBn_Synch is using the old implementation.

The code that triggers the bug is inside Applications/SegTestApp.
