CSE 6230, Fall 2013: Lab 10, Th Nov 7: Sparse Operations

Due date: Thursday, November 21, 4:30 PM
This page: http://j.mp/gtcse6230fa13lab10
Lecture notes: SIMD Part 1, SIMD Part 2, Memory Optimizations
Info on the Jinx cluster: http://support.cc.gatech.edu/facilities/instructional-labs/jinx-cluster

In this lab you will practice optimizing operations with indirect memory accesses. Your task will be to optimize iterations of PageRank algorithm. PageRank algorithm simulates a user who randomly clicks hyperlinks on web pages. All hyperlink on a web page are assumed to have the same probability of clicking on them. Initially PageRank assigns all web pages the same probability of visiting it. On each iteration this probability is updated by adding up the probabilities of all clicks through which a user can land on this page.

Mathematically the vector of page probabilities in the end of iteration is the product of the matrix of transition probabilities by the vector of page probabilities in the beginning of the iteration. The element (i, j) of transition matrix contains the probabilities to click on a hyperlink to web page i while browsing the web page j. Thus, the vector of probabilities is described by a Markov model of the 1st order: the vector of probabilities on the next iteration depends only on the vector of probabilities of previous iteration. Asymptotically (when the number of iteration is large) the vector of probabilities converges to the first eigenvector of the transition matrix (if this looks similar to computation of eigenvalues in Lab 9, it is). This eigenvector is the stationary distribution of the Markov chain model.

Most elements in the transition matrix are zeroes, so it is usually stored in sparce format for better efficiency. The code in Lab 9 uses Compressed Row Storage (CSR) format. In this format the matrix is represented by three arrays:

An array of non-zero elements (zero elements are not stored anywhere).
An array which specifies the column index for each non-zero element. It has the same length as the array of non-zero elements.
An array which specified the start of each row. It contains indices (into the array of non-zero elements) of the first elements in each row. It has the same length as the number of rows + 1 (the last element contains the number of non-zero elements, i.e. the end of the last row).

Pages which do not have outgoing links are treated in a special way: they are supposed to link every page on the internet. To enable this feature without actually adding these imputed hyperlinks to the transition matrix PageRank iteration function takes an array of indices of such web pages, and computes the probabilities as if they linked every page in the dataset.

You may if you wish work in teams of two. To simplify our grading of your assignments, each person should submit his/her own assignment; however, all team members may submit identical code. Be sure to indicate with whom you worked by creating a README file as part of your submission.

Part 0: Getting started

Execute the following command to setup your environment and get the recent gcc (4.8.1), clang (3.3), and valgrind (if you plan to use Intel compiler, do not type this command in the same terminal session):

source /nethome/mdukhan3/install/envvars.sh

Fork the starting code for this lab and clone to get a local copy of repository.

The starting code implements naive versions of the algorithms to optimize, unit, and performance tests. We also provide pagerank.pbs file to submit your program to the Jinx queue. To use it type the command

make queue

We also provide a Python script which report the top web pages computed by the algorithm. Use command

make report

to browse this information.

Part 1: Optimization of PageRank Iteration

In this part you will optimize computation of one PageRank iteration. Your task is to optimize the function page_rank_iteration_optimized in pagerank-turbo.cpp.

Optimization remarks

You are free to use any SIMD intrinsics, and compiler optimization options, but NOT multi-threading or CUDA.
You may use for optimization the fact that all arrays are aligned on 64 bytes.
The reference code may do redundant computations.
Use profiling to analyse performance problems in the algorithm.
Try to expose as much memory- and instruction-level parallelism as possible.

Grading

This assignment will be graded on performance.
If some part of your code fails the unit test, this part will be graded based on performance of the reference code.
If valgrind catches an error in some part of your code, this part will be graded based on performance of the reference code.
Performance of 6000 FPS or higher (as measured on Jinx compute node) guarantees A.

What to submit

Submit all changes to the code that you have made.
If you used non-standard (not g++) compiler or specified additional compiler flags, describe them in a README file. Otherwise your submission will be compiled with default parameters for grading.
Make sure your codes pass the unit tests and valgrind does not report errors in it.

Dataset

The dataset (transition matrix) for this assignment originates from UF Sparse Matrix Collection of Tim Davis. www.cise.ufl.edu/research/sparse/matrices/Gleich/wb-edu. This matrix was built by D. Gleich in 2001 by crawling .edu web sites. In the assignment you use only a small subset (16K web pages) of the oriniginal dataset.

Wiki

lab10 / Home