CSE 6230, Fall 2013: Lab 5, Th Sep 26: Profiling
- This lab: http://j.mp/gtcse6230fa13lab5
- Info on the Jinx cluster: http://support.cc.gatech.edu/facilities/instructional-labs/jinx-cluster
In this lab you will practice profiling and compiler-assisted optimization on a real-world program. The lab consists of two parts that will be done in class, and we will also post a home assignment. You will profile and optimize ImageMagick, a library for image transformation, conversion, and manipulation.
You may if you wish work in teams of two. To simplify our grading of your assignments, each person should submit his/her own assignment; however, all team members may submit identical code. Be sure to indicate with whom you worked by creating a README file as part of your submission. (See below for details.)
Please try to use Jinx compute nodes for this assignment (use qsub -I -q class_long to get a node in interactive mode).
Part 0: Getting started
Different nodes of the Jinx cluster might have different processors, so it is important that you state which processor your Jinx node has. You can get information about the processor from the file /proc/cpuinfo. Execute the command cat /proc/cpuinfo, look at the value of model name, and answer the question:
- What is the name of the processor which you used for this assignment?
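To pull out just that field instead of reading the whole file, something like the following should work on a Linux node. This is only a sketch: the helper name cpu_model is our own, and the optional file argument exists only so the function can be tried on a saved copy of the file.

```shell
# cpu_model [FILE]: print the first "model name" value found in FILE
# (default: /proc/cpuinfo). The function name is hypothetical, not part of the lab.
cpu_model() {
    grep -m1 '^model name' "${1:-/proc/cpuinfo}" | cut -d: -f2- | sed 's/^[[:space:]]*//'
}

cpu_model
```

All cores of one node normally report the same model, which is why the sketch stops at the first match.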
Execute the following command to set up your environment and get recent versions of gcc (4.8.1), clang (3.3), and valgrind:
Create a directory ProfilingLab in your home directory on Jinx:
Navigate to the ProfilingLab directory:
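A minimal sketch of the two steps above, assuming a POSIX shell. The -p flag is an addition here so that the command also succeeds if the directory already exists.

```shell
# Create the lab directory in your home directory
# (-p: no error if it is already there), then enter it.
mkdir -p "$HOME/ProfilingLab"
cd "$HOME/ProfilingLab"
```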
Download sources for ImageMagick:
Unpack the source archive:
tar -xjf ImageMagick-6.8.7-0.tar.bz2
Navigate to the unpacked ImageMagick directory:
Run the configure script which detects compilers and libraries available on Jinx and sets the compilation settings accordingly:
./configure --prefix=$HOME/ProfilingLab --disable-shared --disable-openmp --without-bzlib --without-dps --without-djvu --without-jbig --without-jp2 --without-lcms --without-lcms2 --without-lqr --without-lzma --without-openexr --without-pango --without-tiff --without-webp --without-xml
The --prefix=$HOME/ProfilingLab parameter specifies that the library (and accompanying tools) must be installed to the ProfilingLab folder in your home directory, --disable-openmp disables multithreading for this build, and the other parameters disable ImageMagick features which we don't need for this assignment. To get the list of all supported parameters run
After the script completes configuration, build the program with the command
The -j8 parameter means that make should run up to 8 jobs in parallel when building.
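If you would rather match the job count to the node you were given than hard-code 8, one possible sketch (an addition here, not part of the lab instructions) queries the number of online cores with getconf and prints the resulting build command. The variable name njobs is our own.

```shell
# Number of processor cores currently online on this node.
# _NPROCESSORS_ONLN is supported by getconf on Linux (glibc).
njobs=$(getconf _NPROCESSORS_ONLN)

# Print the build command instead of running it, so it can be inspected first.
echo "make -j${njobs}"
```

Using more jobs than cores rarely helps, so the core count is a reasonable upper bound to start from.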
After the build finishes, execute
make install to install the library and accompanying tools to
Now navigate back to
ProfilingLab directory. We will use
convert utility which is installed to
bin/convert --version to make sure that it was built and installed correctly. You should see the banner as below:
Version: ImageMagick 6.8.7-0 2013-09-26 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2013 ImageMagick Studio LLC
Features: DPC
Delegates: fftw fontconfig freetype jng jpeg png png x zlib
Download the sample image (a cat photo created by Stephan "Macphreak" Brunet and freely available on Wikimedia Commons) which we will use for this assignment:
cd .. # Go up one directory
wget http://upload.wikimedia.org/wikipedia/commons/1/10/Louis-%26-Chanel-taking-a-nap.jpg -O input.jpg
In this assignment you will use the convert utility to:
- Load JPG image
- Blur the image
- Transform it to grayscale
- Save as PNG
To do all of that run the utility with the following parameters:
bin/convert -blur 15x15 -colorspace gray input.jpg output.png
You should get a blurred grayscale version of
input.jpg in the file
Part 1: Profiling
Run the conversion (preferably several times, since individual runs vary) under the perf stat utility. Answer the questions:
- What is the IPC of the utility in our use case? Is it good?
- What is the fraction of mispredicted branches? Is it acceptable?
- What is the rate of cache misses? How do you feel about this number?
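The three questions above each come down to a ratio of two counters. The sketch below is one way to collect and compute them. It assumes that the generic perf event aliases (cycles, instructions, branches, branch-misses, cache-references, cache-misses) are all available on your node, and the helper names stat_convert and ratio are our own.

```shell
# stat_convert: run the lab's conversion under perf stat with an explicit
# event list. -r 5 repeats the run five times and reports averaged counts.
stat_convert() {
    perf stat -r 5 \
        -e cycles,instructions,branches,branch-misses,cache-references,cache-misses \
        bin/convert -blur 15x15 -colorspace gray input.jpg output.png
}

# ratio NUM DEN: print NUM/DEN to four decimal places.
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.4f\n", n / d }'
}

# Usage, with counts read off the perf stat output (arguments are placeholders):
#   ratio <instructions>  <cycles>            # IPC
#   ratio <branch-misses> <branches>          # fraction of mispredicted branches
#   ratio <cache-misses>  <cache-references>  # cache miss rate
```

perf stat usually prints some of these ratios itself next to the raw counts; computing them by hand is mainly a check that you are reading the right columns.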
Now run the use case again under perf record. After it finishes, use perf report to browse the results. Answer the questions:
- Which two functions take most of the execution time? What do they do?
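As a rough sketch of that workflow, the function below records samples for the lab's conversion command and then prints a plain-text summary grouped by function. The function name profile_convert and the 30-line cutoff are our own choices. Run it from the ProfilingLab directory.

```shell
# profile_convert: sample the conversion, then list the hottest functions.
profile_convert() {
    # -o names the sample file explicitly (perf.data is also perf's default).
    perf record -o perf.data \
        bin/convert -blur 15x15 -colorspace gray input.jpg output.png || return
    # --stdio prints a non-interactive report; --sort symbol groups by function.
    perf report -i perf.data --stdio --sort symbol | head -n 30
}
```

Running perf report without --stdio opens the interactive browser, which is more convenient for drilling into a single function.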
Part 2: Compiler Optimizations
Measure the time it takes the convert utility to do the processing:
time bin/convert ...
Answer the question: What is the "User Time" for program execution before you start optimizing?
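If you want the user time as a single number (for example, to compare several runs), a small wrapper like the one below may help. It is a sketch that assumes bash, since it relies on bash's time keyword and its TIMEFORMAT variable; the function name user_time is our own.

```shell
# user_time CMD...: run CMD, discard its output, and print only the
# user CPU time in seconds (%U in bash's TIMEFORMAT).
user_time() {
    ( TIMEFORMAT=%U; time "$@" >/dev/null 2>&1 ) 2>&1
}

# Example (run from the ProfilingLab directory):
#   user_time bin/convert -blur 15x15 -colorspace gray input.jpg output.png
```

The subshell keeps the TIMEFORMAT change from leaking into your interactive session.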
In the rest of this lab you will have to apply profile-guided optimization to ImageMagick.
Navigate back to ImageMagick source directory:
First, delete all build products and temporary files from the previous build by running make clean.
Next, reconfigure ImageMagick by running the ./configure script again, but with additional parameters. You can pass additional options to the C compiler via the CFLAGS variable and to the linker via the LDFLAGS variable. For example, to compile with the -fprofile-generate option (which must be specified for both the compiler and the linker), run:
./configure ... CFLAGS="-fprofile-generate" LDFLAGS="-fprofile-generate"
(You may also ask configure to use a different C compiler via the CC variable, e.g. ./configure ... CC=icc will configure the build to use the Intel C compiler.)
Using profile-guided optimization with GCC involves three steps:
- Build the program with -fprofile-generate (for both the C compiler and the linker).
- Run the program on representative inputs.
- Build the program again with -fprofile-use (for both the C compiler and the linker).
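The three steps can be sketched as a pair of shell helpers. This is an illustration under stated assumptions, not the official procedure: the names run, pgo_phase, and DRY_RUN are our own, the configure options from Part 0 are expected as arguments, and the sketch assumes you are in the ImageMagick source directory. With DRY_RUN=1 the commands are only printed, which lets you check the sequence before committing to a full rebuild.

```shell
# run CMD...: execute CMD, or only print it when DRY_RUN=1
# (DRY_RUN is a convention of this sketch, not a make or configure feature).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pgo_phase FLAG [configure options...]: clean, reconfigure with FLAG passed
# to both compiler and linker, then rebuild and reinstall.
pgo_phase() {
    flag=$1
    shift
    run make clean &&
    run ./configure "$@" CFLAGS="$flag" LDFLAGS="$flag" &&
    run make -j8 &&
    run make install
}

# Intended sequence (replace <options> with the Part 0 configure options):
#   pgo_phase -fprofile-generate <options>   # step 1: instrumented build
#   run the conversion from ProfilingLab     # step 2: training run
#   pgo_phase -fprofile-use <options>        # step 3: optimized build
```

GCC writes the collected profile to .gcda files when the instrumented program exits, and -fprofile-use reads them back. Before starting the third step it is worth checking that those files still exist in the build tree after make clean.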
Apply these steps to ImageMagick to get a profile-optimized version of the program. Run the image conversion again under the time utility and answer the question:
- What is the "User Time" for program execution after you completed all three steps and run the program with
Part 3: More Compiler Optimizations (out-of-class, due October 3rd 4:30 PM)
In Part 2 you tried one specific compiler optimization (profile-guided optimization). In this part you can try any compiler optimizations except OpenMP (i.e. keep the configuration parameter
--disable-openmp as is, and do NOT add
CFLAGS) in order to achieve maximum performance.
Besides the compiler options from the lecture, you may also consider fine-grained optimization options from the compilers' documentation:
- GCC Optimization options (many of these flags are also accepted by clang).
- Use the command man icc for reference on Intel Compiler options.
What to submit
Add to your repository the following files:
- Your compiled
convert binary (you will find it in
- A readme file (in plain text or Markdown format) listing the CC, CFLAGS, and LDFLAGS parameters you used and explaining the compiler optimizations you applied and the performance you achieved.
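One way to start the readme is to generate a skeleton and fill it in. The layout below is only a suggestion: the file name README.md and all field names are our own, and every value in angle brackets is a placeholder for you to replace.

```shell
# Write a readme skeleton into the current directory.
# The quoted 'EOF' keeps the shell from expanding anything inside the template.
cat > README.md <<'EOF'
Name: <your name>
Worked with: <teammate name, or "nobody">
Processor (model name from /proc/cpuinfo): <model name>

CC: <compiler>
CFLAGS: <compiler flags>
LDFLAGS: <linker flags>

Optimizations used and why:
<your explanation>

User time before optimization: <seconds>
User time after optimization: <seconds>
EOF
```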
To get an A for this assignment, you need to reduce the execution time to 3.2 seconds or lower (as measured on Jinx-login).