jpbMatrices

Note

Please do not use this in production code. Matrix-matrix multiplication is part of every BLAS implementation. Consider ATLAS, Math.Net, Accelerate.h, cuBLAS, LAPACK, Intel's MKL, or another open-source or commercial BLAS library instead.

Summary

This project explores how different data structures and code optimizations affect matrix-matrix multiplication. Both simple and complex multiplication methods are included. Comparing the various methods also tests compiler optimizations.

Performance Notes

MatrixFM's MultiplyBlockTransposeIndexerAccumulator method reaches at least 99% of the theoretical maximum non-SIMD GFLOPS in both its single- and multi-threaded implementations. Performance was measured with double-precision (64-bit) floating-point elements, with Turbo Boost and hyper-threading turned off.
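As a rough illustration of how a percent-of-peak figure like this can be derived: multiplying two n x n matrices takes about 2n^3 floating-point operations (n^3 multiplications and n^3 additions), so achieved GFLOPS is that count divided by the elapsed seconds and by 10^9. The sketch below is not taken from jpbMatrices. The function names are hypothetical, and the peak model (cores times clock rate times scalar flops per cycle) is an assumption whose `flopsPerCycle` value depends on the CPU.

```cpp
#include <cstddef>

// Approximate floating-point operation count for an n x n by n x n multiply:
// n^3 multiplications plus n^3 additions.
double multiplyFlops(std::size_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Achieved GFLOPS for a multiply of size n that took `seconds` to run.
double achievedGflops(std::size_t n, double seconds) {
    return multiplyFlops(n) / seconds / 1e9;
}

// Hypothetical peak model: cores * clock (GHz) * scalar flops per cycle.
// flopsPerCycle is an assumption; look it up for the CPU being measured.
double peakGflops(int cores, double clockGhz, double flopsPerCycle) {
    return cores * clockGhz * flopsPerCycle;
}

// Percentage of the modeled peak that was actually reached.
double percentOfPeak(double achieved, double peak) {
    return 100.0 * achieved / peak;
}
```

Turning off Turbo Boost matters for a comparison like this because the peak model assumes a fixed clock rate.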

Matrix Classes

The following classes are available:

  • Matrix1D: Uses a 1-dimensional array of doubles to store elements.
  • Matrix2D: Uses a 2-dimensional array of doubles to store elements.
  • MatrixAA: Uses an array of arrays of doubles to store elements.
  • MatrixMN: This is a wrapper around Math.Net's matrix.
  • MatrixFM: Uses an array of arrays of one-dimensional arrays of doubles to store elements. This class was built specifically for cache optimization. Additionally, the multiplication code has been heavily optimized. Please see jpbMatrices' README.TXT for what the multiplication method names mean.
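The storage layouts listed above can be sketched roughly as follows. This is an illustrative C++ sketch, not the classes from the repository: the struct names `Flat`, `Jagged`, and `Blocked` are hypothetical, and the blocked layout (a grid of square blocks, each stored as one flat array) is only one plausible reading of the MatrixFM description.

```cpp
#include <cstddef>
#include <vector>

// Flat storage in the spirit of Matrix1D: element (r, c) lives at r * cols + c.
struct Flat {
    std::size_t rows, cols;
    std::vector<double> data;
    Flat(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
};

// Array-of-arrays storage in the spirit of MatrixAA: each row is a separate array.
struct Jagged {
    std::size_t rows, cols;
    std::vector<std::vector<double>> data;
    Jagged(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r, std::vector<double>(c, 0.0)) {}
    double& at(std::size_t r, std::size_t c) { return data[r][c]; }
};

// Assumed blocked layout: blocks[blockRow][blockCol] is one flat bs x bs tile,
// so a whole tile sits contiguously in memory. Requires blockSize > 0.
// Edge tiles are padded with zeros when rows or cols is not a multiple of bs.
struct Blocked {
    std::size_t rows, cols, bs;
    std::vector<std::vector<std::vector<double>>> blocks;
    Blocked(std::size_t r, std::size_t c, std::size_t blockSize)
        : rows(r), cols(c), bs(blockSize),
          blocks((r + blockSize - 1) / blockSize,
                 std::vector<std::vector<double>>(
                     (c + blockSize - 1) / blockSize,
                     std::vector<double>(blockSize * blockSize, 0.0))) {}
    double& at(std::size_t r, std::size_t c) {
        return blocks[r / bs][c / bs][(r % bs) * bs + (c % bs)];
    }
};
```

The flat and blocked layouts keep the elements that are used together close in memory, which is the property a cache-oriented design is after. The jagged layout pays an extra pointer dereference per row.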

Multiplication Methods

Each of these methods is implemented with and without accumulators, with and without indexers, and in single- and multi-threaded form for Matrix1D, Matrix2D, and MatrixAA.

  • Basic: the standard 3-loop multiply.
  • Transpose: the second matrix is transposed into column-major order and a modified basic multiply is performed.
  • Block: the standard 6-loop block multiply. This algorithm was taken from http://www.netlib.org/utk/papers/autoblock/node2.html. My thanks to Jack Dongarra for making his paper available online.
  • Block Transpose: the second matrix is transposed into column-major order and a modified block multiply is performed.
  • Math.Net's Native C#: Available for MatrixMN. Used to validate other methods.
  • Math.Net's Intel MKL: Available for MatrixMN. Demonstrates what is possible with SIMD and optimization.
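The Basic, Transpose, and Block methods above can be sketched as follows for square n x n matrices stored in a flat, row-major array. This is a minimal single-threaded sketch and not the repository's code: the alias `Mat` and the function names are hypothetical, the block size `bs` is assumed to be greater than zero, and each version uses a local accumulator (`sum`) rather than writing to the result inside the innermost loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Mat = std::vector<double>;  // n x n matrix, row-major (assumed layout)

// Basic: the 3-loop multiply. `sum` is the accumulator for one result element.
Mat multiplyBasic(const Mat& a, const Mat& b, std::size_t n) {
    Mat c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}

// Transpose: copy b into bt so that the inner loop reads both operands
// with stride 1 instead of striding through b by n.
Mat multiplyTranspose(const Mat& a, const Mat& b, std::size_t n) {
    Mat bt(n * n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
            bt[j * n + k] = b[k * n + j];
    Mat c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}

// Block: the 6-loop multiply. The outer three loops walk bs x bs tiles,
// the inner three multiply one pair of tiles. std::min handles edge tiles.
Mat multiplyBlock(const Mat& a, const Mat& b, std::size_t n, std::size_t bs) {
    Mat c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t jj = 0; jj < n; jj += bs)
            for (std::size_t kk = 0; kk < n; kk += bs)
                for (std::size_t i = ii; i < std::min(ii + bs, n); ++i)
                    for (std::size_t j = jj; j < std::min(jj + bs, n); ++j) {
                        double sum = c[i * n + j];  // partial result so far
                        for (std::size_t k = kk; k < std::min(kk + bs, n); ++k)
                            sum += a[i * n + k] * b[k * n + j];
                        c[i * n + j] = sum;
                    }
    return c;
}
```

A Block Transpose variant would combine the last two ideas: transpose the second matrix first, then run the tiled loops with the inner loop reading `bt[j * n + k]`.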

Versions

C++

Written in C++11. Build with the included makefile using gmake. Tested under g++ 2.7. Includes template versions of some multiplication methods to show the performance impact.
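For readers unfamiliar with what a template version of a multiplication method looks like, here is a minimal sketch. It is not taken from the repository: the name `multiplyBasicT` is hypothetical, and the flat row-major layout for square n x n matrices is an assumption. The element type becomes a template parameter, so the same source can be instantiated for `double`, `float`, or an integer type and timed against a hand-written `double`-only version.

```cpp
#include <cstddef>
#include <vector>

// Generic 3-loop multiply over an arithmetic element type T.
// Matrices are n x n and stored row-major in a flat vector (assumed layout).
template <typename T>
std::vector<T> multiplyBasicT(const std::vector<T>& a,
                              const std::vector<T>& b,
                              std::size_t n) {
    std::vector<T> c(n * n, T());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            T sum = T();  // accumulator, value-initialized to zero
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    return c;
}
```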

CSharp

Written in Visual Studio 2012. Also loads, compiles, and runs under Xamarin Studio and Mono. Includes generic versions of some multiplication methods to show the performance impact.

Visual C++

Written in Visual Studio 2012. Includes template versions of some multiplication methods to show the performance impact.