
SpDM^3 and HP-CONCORD version 0.1

SpDM^3: Sparse-Dense Matrix-Matrix Multiplication at scale
HP-CONCORD: High-performance inverse covariance matrix estimation using the CONCORD-ISTA algorithm.

Dependencies

  • A C++ compiler, MPI, and OpenMP
  • (Preferred) BLAS/MKL/ESSL library
  • (Provided in the repo) graph500-1.2 (R-MAT graph generator) and SpBLAS.

Installation

  • git clone <bitbucket link>
  • cd into the folder
  • cd configs and choose a configuration file:
    • Edison: cp config.mk.edison config.mk (MKL)
    • Ubuntu: cp config.mk.ubuntu config.mk (BLAS)
    • Mac: cp config.mk.mac config.mk (BLAS)
    • Mira: cp config.mk.mira config.mk (ESSL)

How to use SpDM^3

If you just want to use SpDM^3 for distributed sparse-dense matrix-matrix multiplication, see the code examples in the examples folder. Please also cite the SpDM^3 paper (see the Citation section for details).

How to run HP-CONCORD

Building HP-CONCORD

From the top directory, run cd bin and then make to build the executables ista1 and ista2.

Input format

A matrix X of n observations (rows) and p features (columns), stored in NumPy's .npy column-major format. Our code assumes n << p.
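
For illustration, a minimal sketch of writing such an input file with NumPy; the dimensions and the file name are placeholders:

import numpy as np

n, p = 200, 10000          # n observations, p features; the code assumes n << p
X = np.random.randn(n, p)  # placeholder data; substitute your real observations

# Save in column-major (Fortran) order so the .npy header records
# fortran_order = True, matching the expected input layout.
np.save("input.npy", np.asfortranarray(X))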

Which executable to run

If the expected number of nonzeros per row is less than n, use ista1. Otherwise, use ista2. In general, ista2 outperforms ista1, so if you're not sure which to use, pick ista2.
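
If you already have a rough estimate of Omega (e.g., from an earlier run), one way to check this rule of thumb is to count nonzeros per row. A sketch using SciPy; the file name omega.mkt and the value of n are placeholders:

import scipy.io
import scipy.sparse

n = 200  # number of observations in X
omega = scipy.sparse.csr_matrix(scipy.io.mmread("omega.mkt"))
avg_nnz_per_row = omega.nnz / omega.shape[0]
print("ista1" if avg_nnz_per_row < n else "ista2")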

Command-line parameters

  • -i: NumPy's .npy input file name.
    • Assumes column-major format where rows are observations and columns are features.
  • -o: output file name. (Matrix Market format.)
  • -re: initial guess for Omega (Matrix Market format; can be used for restarting).
  • -c: replication factor (default is 1).
    • If you don't know what this is, it's safe to just ignore this option.
    • c <= sqrt(number of MPI processes)
  • -l1: lambda1 (default is 0.3).
  • -l2: lambda2 (default is 0.2).
  • -L1: Lambda1 .npy input file name.
    • Takes L1 as an input matrix. Overrides -l1.
    • Row-major (fortran_order = false). See the sketch after this list for producing such a file.
  • -tau: starting tau (default is 1.0).
  • -eps: epsilon (default is 1.0e-5).
  • -max_inner: maximum inner iterations (default is 20).
  • -max_outer: maximum outer iterations (default is 100).
  • -stop_inner: the inner iteration at which to stop during the last outer iteration (default is -1, i.e., disabled). Useful for debugging.
  • -outer_offset: start the outer iteration count from this number. (Keeps the iteration numbering consistent when checkpoint-restarting.)
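
For illustration, a sketch of producing a Lambda1 file in the row-major layout that -L1 expects. The p x p shape, the uniform value, and the zero diagonal are assumptions for the example, not requirements of the code:

import numpy as np

p = 10000                  # number of features; assumes Lambda1 is p x p
L1 = np.full((p, p), 0.3)  # e.g., a uniform penalty value
np.fill_diagonal(L1, 0.0)  # a common choice: no penalty on the diagonal

# C (row-major) order makes the .npy header record fortran_order = False.
np.save("lambda1.npy", np.ascontiguousarray(L1))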

Examples

The following commands run HP-CONCORD (ista2) on 16 cores (16 MPI processes, 1 thread per process), with input file input.npy, output name out, and replication factor c = 4.

Edison

# To use one MKL thread per MPI process.
export OMP_NUM_THREADS=1

srun -n 16 -c 1 ./ista2 -i input.npy -o out -c 4

Ubuntu/Mac

mpirun -n 16 ./ista2 -i input.npy -o out -c 4

Collecting the output Matrix Market file

For better I/O efficiency, each process saves its own part of Omega separately. For an output file name "out", process 0 saves "out-00000", process 1 saves "out-00001", and so on. To get the final Matrix Market file, run

cat <output_file_name>-* > <matrix_market_file_name>.mkt

For example, if "-o out" is used, run:

cat out-* > out.mkt; rm out-*
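
Assuming the concatenated pieces form one well-formed Matrix Market file, the result can then be loaded as usual, e.g., with SciPy:

import scipy.io
import scipy.sparse

# Load the stitched-together estimate of Omega as a sparse matrix.
omega = scipy.sparse.csr_matrix(scipy.io.mmread("out.mkt"))
print(omega.shape, omega.nnz)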

Checkpoint-Restarting

Long runs can be split into multiple jobs by combining the output option (-o) with the initial-guess option (-re). The following commands save the output to out-* after 10 outer iterations.

<launch command> ./ista2 -i input.npy -o out -max_outer 10
cat out-* > out.mkt; rm out-*

The following command restarts the run with the saved output as the initial guess (i.e., resumes from the 11th outer iteration). The -outer_offset option tells the program to start the outer iteration counter at 10, so the iteration numbers match those of a non-restarted run.

<launch command> ./ista2 -i input.npy -re out.mkt -outer_offset 10
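
As an illustration only, a small Python driver that chains such chunks. The mpirun launcher is taken from the Ubuntu/Mac example, and the assumption that -max_outer is compared against the offset-adjusted counter may need adapting to your setup:

import glob
import os
import subprocess

CHUNK = 10   # outer iterations per job
TOTAL = 100  # total outer iterations

for offset in range(0, TOTAL, CHUNK):
    cmd = ["mpirun", "-n", "16", "./ista2", "-i", "input.npy", "-o", "out",
           "-max_outer", str(offset + CHUNK)]  # assumes counter starts at offset
    if offset > 0:  # resume from the previous chunk's output
        cmd += ["-re", "out.mkt", "-outer_offset", str(offset)]
    subprocess.run(cmd, check=True)

    # Stitch the per-process pieces into one Matrix Market file.
    with open("out.mkt", "w") as f:
        for piece in sorted(glob.glob("out-*")):
            with open(piece) as g:
                f.write(g.read())
            os.remove(piece)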

Citation

If you use any part of this code, please cite the following paper:

Penporn Koanantakool, Ariful Azad, Aydın Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Communication-Avoiding Parallel Sparse-Dense Matrix-Matrix Multiplication. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 842–853, May 2016.

If you use HP-CONCORD, please also cite the HP-CONCORD paper and the original CONCORD-ISTA paper:

Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydın Buluç, Dmitriy Morozov, Sang-Yun Oh, Leonid Oliker, and Katherine Yelick. Communication-Avoiding Optimization Methods for Massive-Scale Graphical Model Structure Learning. ArXiv e-prints, Oct. 2017. arXiv:1710.10769 [stat.ML].

Sang-Yun Oh, Onkar Dalal, Kshitij Khare, and Bala Rajaratnam. Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection. In Advances in Neural Information Processing Systems 27 (NIPS), pages 667–675, 2014.

Contact

penpornk at eecs dot berkeley dot edu

"HP-CONCORD" Copyright (c) 2017, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Innovation & Partnerships Office at IPO@lbl.gov.

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.