CSE 6230, Fall 2013: Lab 5, Th Sep 26: Profiling

In this lab you will practice profiling and compiler-assisted optimization on a real-world program: ImageMagick, a library for image transformation, conversion, and manipulation. The lab consists of two parts that will be done in class, and we will also post a home assignment.

You may work in teams of two if you wish. To simplify our grading of your assignments, each person should submit his/her own assignment; however, all team members may submit identical code. Be sure to indicate with whom you worked by creating a README file as part of your submission. (See below for details.)

Please try to use Jinx compute nodes for this assignment (use qsub -I -q class_long to get a node in interactive mode).

Part 0: Getting started

Different nodes of the Jinx cluster may have different processors, so it is important that you report which processor your Jinx node has. You can get this information from the file /proc/cpuinfo. Execute the command cat /proc/cpuinfo, look at the value of model name, and answer the question:

  • What is the name of the processor which you used for this assignment?
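If you prefer not to scroll through the full listing, the field can be extracted directly. The following is a minimal sketch that assumes a Linux node with a readable /proc/cpuinfo; the variable name cpu_model is only illustrative.

```shell
# Print the first "model name" entry from /proc/cpuinfo (Linux only).
# cpu_model is an illustrative variable name, not something the lab requires.
if [ -r /proc/cpuinfo ]; then
    cpu_model=$(grep -m1 'model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')
else
    cpu_model="unknown (no /proc/cpuinfo on this system)"
fi
echo "Processor: ${cpu_model}"
```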

Execute the following command to set up your environment and get recent versions of gcc (4.8.1), clang (3.3), and valgrind:

source /nethome/mdukhan3/install/envvars.sh

Create a directory ProfilingLab in your home directory on Jinx:

mkdir ProfilingLab

Navigate to the ProfilingLab directory:

cd ProfilingLab

Download sources for ImageMagick:

wget http://www.imagemagick.org/download/ImageMagick-6.8.7-0.tar.bz2

Unpack the source archive:

tar -xjf ImageMagick-6.8.7-0.tar.bz2

Navigate to the unpacked ImageMagick directory:

cd ImageMagick-6.8.7-0

Run the configure script which detects compilers and libraries available on Jinx and sets the compilation settings accordingly:

./configure --prefix=$HOME/ProfilingLab --disable-shared --disable-openmp --without-bzlib --without-dps --without-djvu --without-jbig --without-jp2 --without-lcms --without-lcms2 --without-lqr --without-lzma --without-openexr --without-pango --without-tiff --without-webp --without-xml

The --prefix=$HOME/ProfilingLab parameter specifies that the library (and accompanying tools) must be installed to the ProfilingLab folder in your home directory, --disable-openmp disables multithreading for this build, and the other parameters disable ImageMagick features which we don't need for this assignment. To get the list of all supported parameters, run ./configure --help.

After the script completes configuration, build the program with the command

make -j8

The -j8 parameter means that make should use 8 threads (jobs) for building.
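The job count does not have to be hard-coded. As a small sketch, assuming the nproc utility from GNU coreutils is available on your node (njobs is an illustrative variable name), you could match the number of jobs to the number of available cores:

```shell
# njobs is an illustrative variable; nproc (GNU coreutils) reports available cores.
# Fall back to 8, the value used in this lab, if nproc is missing.
njobs=$(nproc 2>/dev/null || echo 8)
echo "Building with ${njobs} jobs"

# Only invoke make when a Makefile (generated by ./configure) is present.
if [ -f Makefile ]; then
    make -j"${njobs}"
fi
```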

After the build finishes, execute make install to install the library and accompanying tools to $HOME/ProfilingLab.

Now navigate back to the ProfilingLab directory. We will use the convert utility, which is installed in $HOME/ProfilingLab/bin. Run bin/convert --version to make sure that it was built and installed correctly. You should see a banner like the one below:

Version: ImageMagick 6.8.7-0 2013-09-26 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2013 ImageMagick Studio LLC
Features: DPC
Delegates: fftw fontconfig freetype jng jpeg png png x zlib

Download the sample image which we will use for this assignment (a cat photo created by Stephan "Macphreak" Brunet and freely available on Wikimedia Commons):

cd $HOME/ProfilingLab    # Make sure you are in the ProfilingLab directory
wget http://upload.wikimedia.org/wikipedia/commons/1/10/Louis-%26-Chanel-taking-a-nap.jpg -O input.jpg

In this assignment you will use the convert utility to:

  • Load JPG image
  • Blur the image
  • Transform it to grayscale
  • Save as PNG

To do all of that, run the utility with the following parameters:

bin/convert -blur 15x15 -colorspace gray input.jpg output.png

You should get a blurred grayscale version of input.jpg in the file output.png.

Part 1: Profiling

Run the conversion under the perf stat utility (preferably several times, since measurements vary from run to run). Answer the questions:

  • What is the IPC of the utility in our use case? Is it good?
  • What is the fraction of mispredicted branches? Is it acceptable?
  • What is the rate of cache misses? How do you feel about this number?
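The three quantities above are simple ratios of hardware counters: IPC is instructions divided by cycles, and the two rates are misses divided by totals. The sketch below requests the relevant counters explicitly and then shows the arithmetic. The helper name ratio is illustrative, and the counter values are made up purely to demonstrate the calculation; substitute the numbers from your own run. Note that perf stat usually prints some of these ratios itself.

```shell
# ratio NUM DEN: print NUM/DEN with four decimals (helper name is illustrative).
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { if (d == 0) print "n/a"; else printf "%.4f\n", n / d }'
}

# Collect the counters only when perf and the lab's files are present.
if command -v perf >/dev/null 2>&1 && [ -x bin/convert ] && [ -f input.jpg ]; then
    perf stat -e cycles,instructions,branches,branch-misses,cache-references,cache-misses \
        bin/convert -blur 15x15 -colorspace gray input.jpg output.png
fi

# Made-up counter values, only to show the arithmetic; use your own measurements.
cycles=2000000;   instructions=3000000
branches=500000;  branch_misses=5000
cache_refs=80000; cache_misses=20000

echo "IPC:                 $(ratio "$instructions" "$cycles")"
echo "Branch mispredicts:  $(ratio "$branch_misses" "$branches")"
echo "Cache miss rate:     $(ratio "$cache_misses" "$cache_refs")"
```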

Now run the use case again under perf record. After it finishes, use perf report to browse the results. Answer the questions:

  • Which two functions take most of the execution time? What do they do?
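A possible way to run the two perf steps from the command line is sketched here. It assumes you are in the ProfilingLab directory; the wrapper name profile_convert is illustrative, and perf.data is simply perf's default output file name.

```shell
# profile_convert is an illustrative wrapper name, not part of the lab.
profile_convert() {
    # Sample the conversion, then list the hottest symbols as plain text.
    perf record -o perf.data bin/convert -blur 15x15 -colorspace gray input.jpg output.png &&
    perf report -i perf.data --sort symbol --stdio | head -n 30
}

# Run only where perf and the lab's files exist.
if command -v perf >/dev/null 2>&1 && [ -x bin/convert ] && [ -f input.jpg ]; then
    profile_convert
else
    echo "perf or bin/convert not found; run this from the ProfilingLab directory on a Jinx node"
fi
```

Running perf report without --stdio opens the interactive browser instead, which is more convenient for drilling into a function.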

Part 2: Compiler Optimizations

Measure the time it takes the convert utility to do the processing:

time bin/convert ...

Answer the question: What is the "User Time" for program execution before you start optimizing?
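Because a single timing can be noisy, you may want to repeat the measurement and keep the best result. The following bash-only sketch relies on the shell's time keyword and its TIMEFORMAT variable (%U is user CPU seconds). The function names are illustrative, and sleep 0 is a stand-in command so that the sketch runs anywhere; in the lab you would pass bin/convert with its arguments instead.

```shell
#!/usr/bin/env bash
# Make the bash `time` keyword print only user CPU seconds.
TIMEFORMAT='%U'

# user_time CMD...: print the user CPU seconds of one run of CMD.
# (Illustrative helper; the command's own output is discarded.)
user_time() {
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# best_user_time N CMD...: run CMD N times and print the smallest user time.
best_user_time() {
    local n=$1; shift
    local i
    for ((i = 0; i < n; i++)); do user_time "$@"; done | sort -n | head -n 1
}

# Stand-in command; replace with the bin/convert invocation from Part 0.
best_user_time 3 sleep 0
```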

In the rest of this lab you will have to apply profile-guided optimization to ImageMagick.

Navigate back to the ImageMagick source directory:

cd ImageMagick-6.8.7-0

First, you need to delete all build products and temporary files from the previous build. Run make clean to do that.

Next, you will have to reconfigure ImageMagick by running the ./configure script again, but with additional parameters. You may specify additional options for the C compiler via the CFLAGS variable, and additional options for the linker via the LDFLAGS variable. E.g. to compile with the -fprofile-generate option (which must be specified for both compiler and linker), run:

./configure ... CFLAGS="-fprofile-generate" LDFLAGS="-fprofile-generate"

(You may also ask configure to use a different C compiler via the CC variable, e.g. ./configure ... CC=icc will configure the build to use the Intel C compiler.)

Using profile-guided optimization with GCC involves three steps:

  1. Build the program with -fprofile-generate (both for C compiler and linker).
  2. Run the program on representative inputs.
  3. Build the program again with -fprofile-use (both for C compiler and linker).
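To see the three steps in isolation before applying them to ImageMagick, here is a sketch on a toy program. Everything in it (toy.c, the output file names, the temporary directory) is made up for illustration. It compiles and links separately so that the profile data file (toy.gcda) written in step 2 sits next to toy.o, where the compiler looks for it in step 3.

```shell
# Toy walk-through of the three PGO steps; all file names are illustrative.
if command -v gcc >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    (
        cd "$workdir" || exit 1
        cat > toy.c <<'EOF'
#include <stdio.h>
int main(void) {
    unsigned long i, sum = 0;
    for (i = 0; i < 1000000UL; i++) {
        if (i % 3 == 0) sum += i;   /* branch whose bias the profile records */
    }
    printf("%lu\n", sum);
    return 0;
}
EOF
        # Step 1: instrumented build (flag given to both compile and link steps).
        gcc -O2 -fprofile-generate -c toy.c -o toy.o
        gcc -fprofile-generate toy.o -o toy_instrumented
        # Step 2: run on a representative input; this writes toy.gcda.
        ./toy_instrumented > out_instrumented.txt
        # Step 3: rebuild using the collected profile.
        gcc -O2 -fprofile-use -c toy.c -o toy.o
        gcc -fprofile-use toy.o -o toy_optimized
        ./toy_optimized > out_optimized.txt
        # The optimized build must still compute the same result.
        cmp out_instrumented.txt out_optimized.txt && echo "outputs match"
    )
else
    echo "gcc not found; skipping the toy example"
fi
```

For ImageMagick the same flags go through CFLAGS and LDFLAGS at configure time, with a make clean between the instrumented and the final build so that every object file is recompiled.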

Apply these steps to ImageMagick to get a profile-optimized version of the program. Run the image conversion again under the time utility and answer the question:

  • What is the "User Time" for program execution after you have completed all three steps and run the program built with -fprofile-use?

Part 3: More Compiler Optimizations (out-of-class, due October 3rd 4:30 PM)

In Part 2 you tried one specific compiler optimization (Profile-Guided Optimization). In this part you may try any compiler optimizations except OpenMP (i.e. keep the configuration parameter --disable-openmp as is, and do NOT add -fopenmp or -openmp to CFLAGS) in order to achieve maximum performance.

Besides the compiler options from lecture, you may also consider the fine-grained optimization options described in the compilers' documentation.

What to submit

Add to your repository the following files:

  • Your compiled convert binary (you will find it in $HOME/ProfilingLab/ImageMagick-6.8.7-0/utilities/convert)
  • A readme file (in plain text or Markdown format) listing the CC, CFLAGS, and LDFLAGS parameters you used and explaining the compiler optimizations you applied and the performance you achieved.

Performance targets

To get an A for this assignment you need to reduce the execution time to 3.2 seconds or lower (as measured on Jinx-login).