CSE 6230, Fall 2013: Lab 8, Th Oct 24: SIMD for Image Processing and Computer Vision

In this lab you will practice profiling and compiler-assisted optimization on a real-world program. The lab consists of two parts, and there is no distinction between in-class and out-of-class parts. You may complete them in any order. In both parts you will optimize an image manipulation function using SIMD intrinsics.

You may work in teams of two if you wish. To simplify grading, each person should submit his/her own assignment; however, all team members may submit identical code. Be sure to indicate with whom you worked in a README file included with your submission.

Part 0: Getting started

Execute the following command to set up your environment and get recent versions of gcc (4.8.1), clang (3.3), and valgrind:

source /nethome/mdukhan3/install/

Fork the starting code for this lab and clone it to get a local copy of the repository.

The starting code contains naive versions of the algorithms to optimize, along with unit tests and performance tests.

Part 1: Optimization of Image Integration

In this part you will optimize the computation of an integral image. An integral image is a transformation of the source image in which the value of each pixel is the sum of the values of all source pixels to the left of and above its position. With an integral image, the sum of the pixels in any rectangular area can be computed using only four memory references, which is useful for computer vision algorithms.
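To make the definition concrete, here is a minimal scalar sketch of an integral image and the four-reference rectangle sum. The function names, the `std::vector` storage, the 8-bit input and 32-bit output types, and the inclusive convention (each output pixel includes the source pixel at its own position) are all assumptions made for illustration; the actual signature and conventions are those of `integrate_image_optimized` and the unit tests in the starting code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference implementation (not the lab's actual signature).
// out(x, y) = sum of in(i, j) over all i <= x and j <= y (inclusive, assumed).
std::vector<uint32_t> integrate_reference(const std::vector<uint8_t>& in,
                                          size_t width, size_t height) {
    assert(in.size() == width * height);
    std::vector<uint32_t> out(width * height);
    for (size_t y = 0; y < height; ++y) {
        uint32_t row_sum = 0;  // running sum of the current row up to column x
        for (size_t x = 0; x < width; ++x) {
            row_sum += in[y * width + x];
            // Everything above this row, up to column x, is already stored
            // in the output pixel directly above.
            const uint32_t above = (y > 0) ? out[(y - 1) * width + x] : 0;
            out[y * width + x] = row_sum + above;
        }
    }
    return out;
}

// Sum of source pixels in the rectangle [x0, x1] x [y0, y1] (inclusive),
// using at most four reads from the integral image.
uint32_t rect_sum(const std::vector<uint32_t>& integral, size_t width,
                  size_t x0, size_t y0, size_t x1, size_t y1) {
    assert(x0 <= x1 && y0 <= y1);
    const uint32_t d = integral[y1 * width + x1];
    const uint32_t b = (y0 > 0) ? integral[(y0 - 1) * width + x1] : 0;
    const uint32_t c = (x0 > 0) ? integral[y1 * width + (x0 - 1)] : 0;
    const uint32_t a = (x0 > 0 && y0 > 0)
                           ? integral[(y0 - 1) * width + (x0 - 1)] : 0;
    return d - b - c + a;
}
```

Note that the inner loop carries a dependency along each row (the running sum), while the dependency between rows is a plain element-wise addition; the latter is usually the easier of the two to express with SIMD.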

Your task is to optimize the function integrate_image_optimized in image_simd.cpp. You are free to use any SIMD intrinsics and compiler auto-vectorization options, but NOT multi-threading or CUDA.

Part 2: Optimization of Interleaved RGB to Grayscale Conversion

In this part we ask you to optimize the conversion of an image from interleaved RGB format to a grayscale representation.
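As a point of reference, a scalar conversion might look like the sketch below. The function name, the 8-bit channel type, the R, G, B byte order, and the fixed-point luma weights (77, 150, 29, which sum to 256) are assumptions chosen for illustration; the weights and rounding that your code must reproduce are the ones used by the naive version in the starting code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference (not the lab's actual signature).
// Input is interleaved: R0 G0 B0 R1 G1 B1 ... (byte order assumed).
std::vector<uint8_t> rgb_to_gray_reference(const std::vector<uint8_t>& rgb,
                                           size_t width, size_t height) {
    assert(rgb.size() == 3 * width * height);
    std::vector<uint8_t> gray(width * height);
    for (size_t i = 0; i < width * height; ++i) {
        const uint32_t r = rgb[3 * i + 0];
        const uint32_t g = rgb[3 * i + 1];
        const uint32_t b = rgb[3 * i + 2];
        // Assumed 8.8 fixed-point weights; +128 rounds to nearest before
        // the shift. The maximum value is 255, so the cast cannot overflow.
        gray[i] = static_cast<uint8_t>((77 * r + 150 * g + 29 * b + 128) >> 8);
    }
    return gray;
}
```

The interleaved layout is what makes this part interesting for SIMD: consecutive bytes belong to different channels, so a vectorized version typically has to deinterleave or shuffle the data before applying per-channel weights.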

Your task is to optimize the function convert_rgb_to_grayscale_optimized in image_simd.cpp. You are free to use any SIMD intrinsics and compiler auto-vectorization options, but NOT multi-threading or CUDA.

Performance target

A frame rate of 3000 FPS or higher (as measured on Jinx-login) guarantees an A.

You may find the following facts useful for optimization:

  • Image width is a multiple of 8
  • Image height is a multiple of 2
  • Image buffers are aligned on 64 bytes
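One way these guarantees help is that a loop handling 8 pixels per iteration needs no scalar remainder loop. The sketch below illustrates this with SSE2 on a deliberately simple task (summing all pixels), not on either lab function. The function name and the assumption of 8-bit pixels stored contiguously are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: sums all pixels of an 8-bit image, 8 pixels at a time.
// Because width is a multiple of 8, so is width * height, and the loop
// covers the whole buffer without a scalar tail.
uint64_t sum_pixels_sse2(const uint8_t* pixels, size_t width, size_t height) {
    assert(width % 8 == 0);
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = _mm_setzero_si128();
    const size_t count = width * height;
    for (size_t i = 0; i < count; i += 8) {
        // Load 8 bytes into the low half of the register (upper half is zero).
        // This load has no alignment requirement.
        const __m128i v =
            _mm_loadl_epi64(reinterpret_cast<const __m128i*>(pixels + i));
        // Sum of absolute differences against zero adds up each group of
        // 8 bytes into a 64-bit lane.
        acc = _mm_add_epi64(acc, _mm_sad_epu8(v, zero));
    }
    // Only the low 64-bit lane can be non-zero here.
    return static_cast<uint64_t>(_mm_cvtsi128_si64(acc));
}
```

The 64-byte buffer alignment additionally permits aligned 16-byte loads and stores (for example `_mm_load_si128`) at the start of the buffer; whether every row start is also 16-byte aligned depends on the pixel size and row stride, so check that before relying on it.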

What to submit

  • Submit all changes to the code that you have made.
  • If you used a non-standard (non-g++) compiler or specified additional compiler flags, describe them in a README file. Otherwise your submission will be compiled with the default parameters for grading.
  • Make sure your code passes the unit tests and does not access memory beyond array bounds (you can use valgrind to check this). If your optimized implementation fails a unit test or valgrind reports memory access errors, the default implementation will be used for grading.