
mcOpt: MATLAB Source-Transformation Framework

1. About mcOpt

mcOpt is a MATLAB optimization framework built on top of Natlab and McLab, which were designed and developed by the Sable Lab at McGill University. It is written in Java and extends the existing Natlab framework with the ability to perform source-level code transformations on existing MATLAB programs. It was put together as a research project for the McGill University class COMP 621: "Static Analysis and Transformations", offered in Winter 2012 by Professor Laurie Hendren.

2. Introduction and Motivation

MATLAB is a very popular programming language in the scientific research community thanks to its strong numerical computing and mathematical facilities. For that reason, it is used by a vast body of users with very different levels of programming experience. MATLAB provides its users with a number of built-in, highly optimized routines for computations on matrices and vectors. Users coming from a low-level programming background may be used to very different programming paradigms, for instance using loops to compute sums or nested loops to multiply arrays. Moreover, not all users are aware of every optimized routine available to them and may re-implement common MATLAB idioms in an inefficient manner. Take for instance this small code snippet:

i = 1;
for range = 0:0.01:1000
        pts(i) = sin(range);
        i = i + 1;
end

which can be written much more efficiently in MATLAB as follows:

range = 0:0.01:1000;
pts = sin(range);

This version yields the same result but is much more efficient, thanks to the underlying library calls that MATLAB can make.

4. Specific Problem Statement

In this section we state the problem formally:

Given an input MATLAB program P, a set of known sub-optimal patterns
S = {s_1, ..., s_n}, a set of optimal patterns O, and a mapping function M: S -> O
from sub-optimal patterns to optimal patterns, find every occurrence of each s_i in P
and replace it by its counterpart M(s_i) in O.

5. Solution Strategy and Implementation

Our strategy is fairly straightforward. First, we need a list of known optimization patterns. This list is nothing more than a collection of known MATLAB programming idioms. Such idioms vary widely in complexity and are often stated in the official MATLAB documentation or in articles about writing efficient MATLAB code.

mcOpt's solution strategy is as follows:

First, we feed the program to Natlab in order to get an abstract syntax tree (AST) to work on. Next, we run the following algorithm repeatedly until we reach a fixed point, that is, until a full pass introduces no new changes to the tree:

  1. for each node 'n' in the tree
  2.   for each pattern 'p'
  3.     if 'p' matches at 'n': replace the subtree rooted at 'n' with the optimized subtree and skip the remaining patterns

This algorithm draws heavily from peephole patterns used in assembly optimization.
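
To make the driver concrete, here is a minimal Java sketch of the fixed-point loop. The Node and Pattern interfaces below are simplified stand-ins rather than mcOpt's actual classes; only the control flow follows the algorithm above.

import java.util.ArrayList;
import java.util.List;

// Stand-in interfaces; mcOpt's real Pattern class and Natlab's AST
// nodes have richer APIs than this.
interface Node { List<Node> children(); void replaceWith(Node other); }
interface Pattern { Node match(Node n); } // returns a replacement tree, or null

public final class FixedPointDriver {
    public static void optimize(Node root, List<Pattern> patterns) {
        boolean changed = true;
        while (changed) {                          // repeat until a fixed point
            changed = false;
            for (Node n : preOrder(root)) {        // 1. for each node n
                for (Pattern p : patterns) {       // 2. for each pattern p
                    Node replacement = p.match(n); // 3. does p match at n?
                    if (replacement != null) {
                        n.replaceWith(replacement);
                        changed = true;
                        break;                     // skip the remaining patterns
                    }
                }
            }
        }
    }

    // Collect nodes in pre-order so each subtree is visited once per pass.
    private static List<Node> preOrder(Node root) {
        List<Node> out = new ArrayList<>();
        out.add(root);
        for (Node c : root.children()) out.addAll(preOrder(c));
        return out;
    }
}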

We first implement a sturdy framework that makes it easy to add and remove optimization patterns. We then implement a number of patterns as a proof of concept. Each pattern is individually tested on microbenchmarks that verify it works as intended.

The source code for the framework is licensed under the MIT license (see LICENSE for details) and is freely available.

The rest of this section describes how patterns are implemented. All patterns extend the abstract super class Pattern. This class provides a number of template methods designed to make the pattern programmer's job easier. The most important method is match, which matches a pattern against a tree. The matching algorithm is a modified recursive tree comparison that takes into consideration the specific nature of abstract syntax trees. The main challenge is that when looking at a tree's structure, we are not interested in which identifier names are being used, only in whether they are used consistently with our expected pattern. To this end we developed a system called placeholders. A placeholder is a variable name that mcOpt uses to know that anything is acceptable in that position. A placeholder is defined implicitly on its first occurrence while matching a pattern. Every subsequent occurrence of the placeholder is looked up in a symbol table and substituted by its value. The value in the actual tree must be the same as the value bound to the placeholder, otherwise the comparison fails.

We will use the following notation for placeholders:

[<type>:<name>]
<type> ::= id | int | str | float
<name> ::= [a-zA-Z0-9]+

Here is a simple example of a pattern matching successfully and one failing:

Pattern: [id:ident] = [int:lhs] + [int:lhs]

Match:
Code: six = 3 + 3
Symbols: [id:ident=six] [int:lhs=3]

No Match:
Code: eleven = 3 + 8
Symbols: [id:ident=eleven] [int:lhs=3]

In the second example, the pattern does not match because the second lookup of [int:lhs] compares 3 and 8, which are different numbers. Specific values can be required simply by writing their literal values in the pattern.

When designing patterns programmatically, placeholder types are explicit in the tree structure, so only the name is required. Furthermore, names are internally stripped of their ':' prefix. The only case where the ':' prefix is explicitly required is when defining an identifier placeholder. For consistency, float, int and string placeholders may also be defined with the ':' prefix.

The Natlab abstract syntax tree has been extended with the placeholder leaf nodes AnyInteger(String id), AnyFloat(String id) and AnyString(String id). These classes can be used wherever a Natlab AST expects an IntLiteralExpr, FPLiteralExpr or StringLiteralExpr, respectively.

A pattern contains two ASTs. The first, called the pattern tree, is a tree with placeholders that is matched against a valid AST generated by Natlab from a MATLAB/Natlab source file. The second, called the replacement tree, is a valid AST that may (optionally) reuse the placeholders from the pattern tree. It is important to note that using a placeholder that never occurred in the pattern tree yields an invalid replacement tree; if the pattern then matches, an error is thrown and the optimized code is not output.
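
To make this concrete, here is a hedged Java sketch of how the pattern tree for [id:ident] = [int:lhs] + [int:lhs] might be built. AnyInteger and the ':' convention for identifier placeholders come from the text above; the AssignStmt, PlusExpr, NameExpr and Name constructors are assumed to follow Natlab's usual AST conventions, and the step of handing the finished trees to the Pattern class is omitted since that API is not shown here.

import ast.*; // Natlab's generated AST classes (assumed package name)

class DoubledAddExample {
    // Pattern tree for:  [id:ident] = [int:lhs] + [int:lhs]
    static AssignStmt patternTree() {
        return new AssignStmt(
            new NameExpr(new Name(":ident")),      // ':' marks an identifier placeholder
            new PlusExpr(new AnyInteger("lhs"),    // first occurrence binds "lhs"
                         new AnyInteger("lhs")));  // second occurrence must match it
    }
    // A replacement tree is built the same way, optionally reusing the
    // placeholder names bound while matching the pattern tree.
}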

We now describe the tree comparison algorithm. It is mostly what anyone would expect from a tree comparison algorithm, with added symbol table lookups for placeholders:

compare(tree, pattern):
        If `pattern` is a placeholder:
                If the placeholder is unbound, bind it to `tree`'s value in the
                        symbol table and return TRUE.
                Otherwise, look up its bound value and return TRUE if it equals
                        `tree`'s value, FALSE otherwise.
        Check that `tree` and `pattern` have matching types `t`,
                otherwise return FALSE.
        Check that `tree` and `pattern` have the same number of children,
                otherwise return FALSE.
        If `t` is a leaf type (contains data):
                Check that `tree.data` and `pattern.data` are the same,
                        otherwise return FALSE.
        If the node type is isomorphic (children order doesn't matter),
                match children in any order. [Unimplemented]
        For each child pair (`t_i`, `p_i`) in (`tree`, `pattern`):
                If compare(`t_i`, `p_i`) is FALSE:
                        return FALSE.
        return TRUE.

The detailed implementation of this algorithm can be found in the mcOpt source repository.
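
For reference, below is a minimal Java sketch of this comparison over a simplified stand-in node type (a type tag, optional leaf data, children, and a placeholder flag). This is not Natlab's actual API; the placeholder branch follows the bind-then-look-up behaviour described earlier.

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for an AST node: a type tag, optional leaf data,
// children, and a flag marking placeholder leaves such as AnyInteger.
record SimpleNode(String type, String data, List<SimpleNode> children, boolean isPlaceholder) {}

class TreeComparer {
    private final Map<String, String> symbols = new HashMap<>(); // placeholder bindings

    boolean compare(SimpleNode tree, SimpleNode pattern) {
        if (pattern.isPlaceholder()) {
            // First occurrence binds the placeholder name to the tree's value;
            // later occurrences must look up and match that binding.
            String bound = symbols.putIfAbsent(pattern.data(), tree.data());
            return bound == null || bound.equals(tree.data());
        }
        if (!tree.type().equals(pattern.type())) return false;            // type mismatch
        if (tree.children().size() != pattern.children().size()) return false;
        if (tree.children().isEmpty()
                && !Objects.equals(tree.data(), pattern.data())) return false; // leaf data
        for (int i = 0; i < tree.children().size(); i++) {                // recurse in order
            if (!compare(tree.children().get(i), pattern.children().get(i)))
                return false;
        }
        return true;
    }
}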

6. Experimental Framework and Results

Our experimental framework combines the solution described above with a number of implemented patterns; those exercised below are FastClip, RangeFloat, RangeInt and VectorSum.

For each pattern, we wrote a small fragment of benchmark code that the pattern matches. These fragments serve two purposes. First, they demonstrate that the pattern matches and operates as designed. Second, they let us compare the run times of the unoptimized and optimized code, which indicates whether a pattern is useful and how much performance gain is achievable in practice.

For each pattern, we ran the unoptimized code 10 times and dropped the slowest and fastest results. We did the same for each optimized code fragment. The results are listed in the tables below. The code for each benchmark is not given in this section but can be found online.

Results for MATLAB:

Pattern Name    Unoptimized (s)    StdDev (s)    Optimized (s)    StdDev (s)
FastClip        0.8484             0.0090        1.1147           0.0377
RangeFloat      0.0094             2.6693e-4     0.0105           0.0023
RangeInt        0.1028             0.0018        0.1802           0.0027
VectorSum       1.0071e-4          8.5741e-5     0.0523           0.0018

Results for Octave:

Pattern Name    Unoptimized (s)    StdDev (s)    Optimized (s)    StdDev (s)
FastClip        25.559             0.767         27.377           0.662
RangeFloat      1.347              0.007         0.017            9.906e-4
RangeInt        14.027             0.7871        0.172            0.002
VectorSum       0.028              0.002         1.243            0.086

From these results we can conclude two things. First, vectorization does not appear to pay off in MATLAB; we conjecture that MATLAB's JIT compiler already optimizes the looping code heavily. Second, the Octave results do show speedups for the range vectorizations, but not for the other two transforms. This leads us to conclude that not all suggested MATLAB idioms necessarily imply performance gains. These results are quite interesting, as they imply that patterns need to be carefully designed and tested before being deployed in the real world if we are to get positive results.

7. Benchmark Programs

Lastly, we tested performance gains on real-life benchmarks. These benchmarks give us two important pieces of information: first, that there is a use for automating source-level transformations of real-life code, and second, that the transformations do not cause programs to malfunction "in the wild". In this section we describe the suite of benchmarks that was used; the machine architecture and specifications on which the benchmarks were run are detailed in the appendix. Again, we ran each benchmark 10 times with and without optimizations, dropping the fastest and slowest runs. Our results are displayed in the following table:

Benchmark    Unoptimized (s)    StdDev (s)    Optimized (s)    StdDev (s)
kmeans       5.3821             0.0577        5.3349           0.0248
deblur       47.3564            1.2678        48.9948          2.0708

As we can see, the impact of optimization was negligible: no pattern matched in either benchmark, so the transformed code was essentially unchanged. This hints that our current patterns are not broad enough to catch real-life optimization opportunities.

For a specific description of the testing environment, please refer to the appendix at the end of this document.

8. Conclusion and Future Work

We see that, in large part, the JIT compiler is extremely powerful and nullifies our attempts to optimize code by hand. It must be granted, however, that the number and complexity of the currently implemented patterns make them merely a showcase of what the framework could do. There is still a lot of work left to make the framework more flexible and, hopefully, able to generate meaningful results in the future.

One shortcoming of the framework at the moment is pattern coverage: there is a vast number of known vectorizations that could be performed, leaving a lot of room for additional pattern development.

The current framework also has a number of possible improvements. For instance, patterns currently must be typed directly in Java, compiled, and registered with the program. A goal could be to write a Domain Specific Language for specifying patterns in a friendly textual form, from which the Java pattern code would be generated. In line with this, patterns could be loaded dynamically at runtime.

Along with this increase in the number of patterns, flags for the optimization level, or a way to specify which patterns to look for, could be added to the framework.

Finally, the most interesting addition to the framework would be a hook system that allows users to add their own checks to the pattern matching. Such hooks would fire on the following events:

MAIN      - The main optimizer execution
ASSIGN    - Assigning a value to a placeholder
LOOKUP    - Looking up the value of a placeholder
REPLACE   - Replacing the value of a placeholder
NAME      - Visiting a Name node (variable, function)
INT       - Visiting an Integer literal node
FLOAT     - Visiting a Float literal node
STR       - Visiting a String literal node
PARAM     - Visiting a Parameterized access node
[...]

Each of these events would expose a "BEFORE" and an "AFTER" hook that can run checks of its own. This would allow much more complicated pattern matches and thus enable a lot more flexibility in terms of what optimizations can be made. In particular, it could enable the optimizer to run a number of analyses prior to attempting any optimizations, and then query the results of those analyses from within certain patterns.
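
As a sketch only, such a hook API might look like the following; every name here is hypothetical, since none of this exists in mcOpt yet.

// Hypothetical hook API; nothing below exists in mcOpt yet.
// A hook can veto a match by returning false from before() or after().
public interface MatchHook {
    enum Event { MAIN, ASSIGN, LOOKUP, REPLACE, NAME, INT, FLOAT, STR, PARAM }

    // Runs before the event is processed; return false to abort the match.
    boolean before(Event event, Object node, java.util.Map<String, String> symbols);

    // Runs after the event is processed; return false to abort the match.
    boolean after(Event event, Object node, java.util.Map<String, String> symbols);
}

A pattern could, for example, register a LOOKUP hook that consults the results of a prior analysis before accepting a placeholder binding.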

Lastly, an interesting addition would be standard wildcards for the number of occurrences. These could be used, for example, when attempting to simplify a straightforward loop of the form:

y = 0;
i = 1;
for x = 0:0.1:100
    y(i) = log(x) + ... + sin(x)
    i = i + 1;
end

Currently we support only one occurrence of the term on the right-hand side. This could be further extended to allow regular expressions for placeholders; there are unimplemented facilities in the code for this future extension.

9. Appendix

mcOpt is fairly straightforward to use and currently only supports optimizing a single file at a time. Typing:

./McOpt

in the command line will give a list of switches and general usage information about the program, given that it has been compiled.

Compilation can be achieved by typing:

ant

in the root folder of McOpt.

9.1. Benchmark Environment

This is the environment under which the benchmarks were run:

$ less /proc/cpuinfo
                                [...]
processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Core(TM)2 Duo CPU     E8135  @ 2.40GHz
stepping        : 6
cpu MHz         : 2400.000
cache size      : 6144 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
apicid          : 1
initial apicid  : 1
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : [...]
bogomips        : 4784.44
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
                                [...]

$ uname -a
Linux ubuntu 2.6.32-24-generic-pae #42-Ubuntu SMP [...] 2010 i686 GNU/Linux

$ free -m
             total       used       free     shared    buffers     cached
Mem:          1994       1865        129          0          9        395
-/+ buffers/cache:       1459        534
Swap:            0          0          0


$ matlab -nodisplay

             < M A T L A B (R) >
   Copyright 1984-2010 The MathWorks, Inc.
 Version 7.12.0.635 (R2011a) 32-bit (glnx86)
               March 18, 2011

$ octave
GNU Octave, version 3.4.3
Copyright (C) 2011 John W. Eaton and others.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE.  For details, type `warranty'.

Octave was configured for "i686-pc-linux-gnu".