SpDM^3 and HP-CONCORD version 0.1

SpDM^3: Sparse-Dense Matrix-Matrix Multiplication at scale
HP-CONCORD: High-performance inverse covariance matrix estimation using the CONCORD-ISTA algorithm.


  • C++, MPI, and OpenMP
  • (Preferred) BLAS/MKL/ESSL library
  • (Provided in the repo) graph500-1.2 (R-MAT graph generator) and SpBLAS.


If you just want to use HP-CONCORD on Edison at NERSC, skip to the section 'Using the HP-CONCORD module on Edison at NERSC'. Otherwise, you need to compile SpDM^3/HP-CONCORD from the source files. Below is how to set up the environment.

  • git clone <bitbucket link>
  • cd into the folder
  • cd configs and choose configuration file
    • Edison with MKL: cp
    • Ubuntu with BLAS: cp
    • Ubuntu with MKL: cp (If Intel MKL is not installed in /opt/intel, please update the INTELROOT variable in the file.)
    • Mac with BLAS: cp
    • Mira with ESSL: cp

The set up is complete. See the next two sections for separate instructions on how to use SpDM^3 and HP-CONCORD.

How to use SpDM^3

If you just wanted to use SpDM^3 for distributed sparse-dense matrix-matrix multiplication, see code examples in the examples folder. Please also cite the SpDM^3 paper (see details in the Citation section).

How to run HP-CONCORD


From the top directory, cd bin and make to get the executables ista1, ista2, and ista5.

Using the HP-CONCORD module on NERSC machines

If you are on Edison or Cori at NERSC, you can use the custom HP-CONCORD module by running the commands

  • module use /global/common/software/m88/<machine_name>/modulefiles
  • module load hp-concord/0.1

where machine_name can be edison or cori. The module provides you ista1, ista2, and ista5.

Input format

Matrix X of n observations (rows) of p features (columns), stored in NumPy column-major format. Our code assumes n << p.

Which executable to run

Please refer to our PMLR'18 paper for algorithm names and notations.

  • ista1 implements the Cov variant, taking in the observation matrix X as input.
  • ista2 implements the Obs variant, taking in the obseravtion matrix X as input.
  • ista5 implements the Cov variant, taking in the sample covariance matrix S as input.

If the expected number of nonzeroes per row is less than n, use ista1. Otherwise, use ista2. In general, ista2 outperforms ista1 so if you're not sure what to use, use ista2.

Command-line parameters

  • -i: NumPy's .npy input file name.
    • Assumes column-major format where rows are observations and columns are features.
  • -o: output file name. (Matrix Market format.)
  • -re: initial guess for Omega (Matrix Market format.) (Can be used for restarting.)
  • -c: replication factor (default is 1).
    • If you don't know what this is, it's safe to just ignore this option.
    • c <= sqrt(number of MPI processes)
  • -l1: lambda1 (default is 0.3).
  • -l2: lambda2 (default is 0.2).
  • -L1: Lambda1 .npy input file name.
    • Takes L1 as an input matrix. Overrides -l1.
    • Row-major. (fortran_order = false)
  • -tau: starting tau (default is 1.0).
  • -eps: epsilon (default is 1.0e-5).
  • -max_inner: maximum inner iterations (default is 20).
  • -max_outer: maximum outer iterations (default is 100).
  • -stop_inner: the inner iteration to stop when reaching the last outer iteration (default is -1 = unactivated). Useful for debugging.
  • -outer_offset: start the outer iteration count from this number (Makes it less confusing when doing checkpoint-restarting)


The following commands run spdm3-concord on 16 cores (16 MPI processes, 1 thread per process) with input named input.npy and output named out.csr, with replication factor c = 4.


# To use one MKL thread per MPI process.

srun -n 16 -c 1 ./ista2 -i input.npy -o out -c 4


mpirun -n 16 ./ista2 -i input.npy -o out -c 4

Collecting the output Matrix Market file

For better I/O efficiency, each process saves their own part of Omega separately. For an output file name "out", process 0 will save "out-00000", process 1 will save "out-00001", and so on. To get the final Matrix Market file, run

cat <output_file_name>-* > <matrix_market_file_name>.mkt

For example, if "-o out" is used, the user should call,

cat out-* > out.mkt; rm out-*


The long runs can be split to multiple jobs by combining output writing option (-o) with initial guess option (-re). The following command saves the output to out-* after 10 outer iterations.

<launch command> ./ista5 -i input.npy -o out -max_outer 10
cat out-* > out.mkt; rm out-*

The following command restarts the process with the output as initial guess (= resuming starting from the 11th iteration). The option -outer_offset tells the program to start the outer iteration counter from 10, effectively giving the same round numbers as the non-restarting version.

<launch command> ./ista5 -i input.npy -re out.mkt -outer_offset 10


If you use any part of this code, please cite the following paper:

  • Penporn Koanantakool, Ariful Azad, Aydın Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 842–853, May 2016.

If you use HP-CONCORD, please also cite the HP-CONCORD and the original CONCORD-ISTA papers:

  • Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydın Buluç, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, and Sang-Yun Oh. Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:1376-1386, 2018.
  • Sang Oh, Onkar Dalal, Kshitij Khare, and Bala Rajaratnam. Optimization methods for sparse pseudo-likelihood graphical model selection. In NIPS 27, pages 667–675. 2014.


penpornk at eecs dot berkeley dot edu

"HP-CONCORD" Copyright (c) 2017, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit other to do so.