
PL241-Compiler

The PL241-Compiler is an SSA-based optimizing compiler that supports copy propagation, common subexpression elimination, constant folding, register allocation and code generation for the DLX (pronounced ‘Deluxe’) RISC processor architecture.

This was developed as a graduate-level class project at UC Irvine, under the tutelage of Prof. Michael Franz. This implementation was made public only after the awarding of the final grade for the graduate course.

EBNF Grammar for PL241

letter = "a" | "b" | … | "z".
digit = "0" | "1" | … | "9".
relOp = "==" | "!=" | "<" | "<=" | ">" | ">=".

ident = letter {letter | digit}.
number = digit {digit}.

designator = ident { "[" expression "]" }.
factor = designator | number | "(" expression ")" | funcCall .
term = factor { ("*" | "/") factor }.
expression = term { ("+" | "-") term }.
relation = expression relOp expression .

assignment = "let" designator "<-" expression.
funcCall = "call" ident [ "(" [expression { "," expression } ] ")" ].
ifStatement = "if" relation "then" statSequence [ "else" statSequence ] "fi".
whileStatement = "while" relation "do" statSequence "od".
returnStatement = "return" [ expression ] .

statement = assignment | funcCall | ifStatement | whileStatement | returnStatement.
statSequence = statement { ";" statement }.

typeDecl = "var" | "array" "[" number "]" { "[" number "]" }.
varDecl = typeDecl ident { "," ident } ";" .
funcDecl = ("function" | "procedure") ident [formalParam] ";" funcBody ";" .
formalParam = "(" [ident { "," ident }] ")" .
funcBody = { varDecl } "{" [ statSequence ] "}".

computation = "main" { varDecl } { funcDecl } "{" statSequence "}" "." .
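
For reference, a small program that exercises most of the grammar (the predefined InputNum, OutputNum and OutputNewLine routines are assumed here; they correspond to the READ, WRITE and WLN IR instructions listed below and are not themselves part of the grammar):

    main
    var a, b;
    array[10] c;

    function square(x);
    { return x * x };

    {
        let a <- call InputNum();
        let b <- call square(a);
        let c[0] <- a + b;
        if a < b then
            call OutputNum(b)
        else
            call OutputNum(a)
        fi;
        call OutputNewLine()
    }.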

Available IR Instructions

NEG, // neg x unary minus  
ADD, // add x y addition  
SUB, // sub x y subtraction  
MUL, // mul x y multiplication  
DIV, // div x y division  
CMP, // cmp x y comparison      
ADDA, // adda x y add two addresses x and y (used only with arrays)  
LOAD, // load y load from memory address y  
STORE, // store y x store y to memory address x  
MOVE, // move y x assign x := y  
PHI, // phi x x1 x2 ... x := Phi(x1, x2, x3, ...) 
END, // end end of program 
BRA, // bra y branch to y  
BNE, // bne x y branch to y on x not equal  
BEQ, // beq x y branch to y on x equal  
BLE, // ble x y branch to y on x less or equal  
BLT, // blt x y branch to y on x less  
BGE, // bge x y branch to y on x greater or equal  
BGT, // bgt x y branch to y on x greater  
READ, // read read  
WRITE, // write write  
WLN, // wln writeLn 
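
As a rough sketch of how these instructions fit together (instruction results are referred to by number, as in the scheduling notes further down; FP, the per-array base address and the word size of 4 are assumptions), the statement let a[i] <- x + y could lower to:

    1: ADD x y               // compute the value to be stored
    2: MUL i #4              // scale the index by the word size
    3: ADD FP a-baseaddr     // address of a[0]
    4: ADDA (2) (3)          // address of a[i]
    5: STORE (1) (4)         // store the value (1) to the address (4)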

IR Generation:

letter = "a" | "b" | … | "z".
digit = "0" | "1" | … | "9".

These do not figure in the parser; they are consumed by the scanner.

relOp = "==" | "!=" | "<" | "<=" | ">" | ">=".

ident = letter {letter | digit}.
number = digit {digit}.

These translate to functions for symbol lookup.

designator = ident{ "[" expression "]" }. Handle for arrays.

factor = designator | number | "(" expression ")" | funcCall .

term = factor { ("*" | "/") factor }. Handle the case of divide by zero.

expression = term { ("+" | "-") term }. Handle the case of negative numbers in Result.

relation = expression relOp expression .

assignment = "let" designator "<-" expression.
// translates to 
MOVE expression designator    // move y x, i.e. designator := expression
funcCall = "call" ident [ "(" [expression { "," expression } ] ")" ].
// translates to
WRITE expression
{WRITE expression}

In order to simulate the result of the return, Register.RET() is returned.
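
For example, calls to the predefined I/O routines (InputNum, OutputNum and OutputNewLine are assumed names here) map directly onto the corresponding IR instructions:

    call OutputNum(x)        ->  WRITE x
    call OutputNewLine()     ->  WLN
    call InputNum()          ->  READ    (its result feeds the enclosing expression)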

ifStatement = "if" relation "then" statSequence [ "else" statSequence ] "fi".
whileStatement = "while" relation "do" statSequence "od".
returnStatement = "return" [ expression ] .
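
A sketch of the usual lowering of an if statement onto the branch instructions above (the block labels and the particular negated branch are illustrative, not necessarily what this compiler emits):

    if a < b then ... else ... fi

    1: CMP a b
    2: BGE (1) else        // branch on the negated relation: taken when a >= b
       ...                 // then-part
    3: BRA fi
    else: ...              // else-part
    fi:   ...              // join block; PHIs merge values assigned in both arms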

statement = assignment | funcCall | ifStatement | whileStatement | returnStatement. 
statSequence = statement { ";" statement }.

typeDecl = "var" | "array" "[" number "]" { "[" number "]" }.
varDecl = typeDecl ident { "," ident } ";" .
funcDecl = ("function" | "procedure") ident [formalParam] ";" funcBody ";" .
formalParam = "(" [ident { "," ident }] ")" .
funcBody = { varDecl } "{" [ statSequence ] "}".

computation = "main" { varDecl } { funcDecl } "{" statSequence "}" "." .

[TODO] Instruction Scheduling:

pipelined execution

i: F|D|IN|X...
j:   F|D|IN|X...

but if j requires the result of instruction i, we have a pipeline stall.

i: F|D|IN|X.......|
j:   F|...stall...|D|IN|X...

Thus, we assume the worst case: for example, unless we can prove that [Rx] and [Ry] are different addresses, the read below must stay after the write.

i: write [Rx]
j: read [Ry]
k: muli R2, #42

Reorder the instructions in the program such that the stalls are minimized.

i: write [Rx]
j: read [Ry]
k: muli R2, #42

Instruction scheduling: thus, we can move k above j, or even above i, because it does not depend on either of them.

  • Build a model of how the instructions are dependent on each other.
  • Reorder, while observing the constraints.

Constraints:

Kinds of dependencies...

a -> b; // this means that instruction a must be executed before instruction b.

  1. def-use dependence. ... a produces a result that is used by b.
  2. use-def (anti-)dependence. ... a uses a value that b then redefines.
  3. def-def (output) dependence. ... a: mov ... -> x; b: move ... -> x; (concrete examples of all three follow this list)
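
Concretely, in the same notation:

    def-use:  a: add y z -> x    b: mul x w -> v    // b reads the x that a defines
    anti:     a: mul x w -> v    b: add y z -> x    // a must read x before b overwrites it
    def-def:  a: mov y -> x      b: mov z -> x      // the final value of x must come from b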

expression: x + y - z + a[i] + b[i]

Latencies: add = 1 cycle, mul = 2 cycles, load = 3 cycles.

With these latencies, issuing the code in source order stalls behind the loads; the final add (instruction 11) does not complete until around cycle 14:

 1. add x y
 2. sub (1) x
 3. muli i #4
 4. add FP a-baseaddr
 5. adda (3) (4)
 6. load (5)
 7. add (2) (6)
 8. add FP b-baseaddr
 9. adda (3) (8)
10. load (9)
11. add (7) (10)

Build a DAG:

Roots: instructions that depend only on things outside of the current basic block.

1. add x y
2. sub (1) x
3. muli i #4
4. add FP a-baseaddr
5. adda (3) (4)
6. load (5)
7. add (2) (6)
8. add FP b-baseaddr
9. adda (3) (8)
10. load (9)
11. add (7) (10)
1 -> 2
4 -> {5, 6}
3 -> {5, 6}
2 -> 7
{5, 6} -> 7
3 -> {9, 10}
8 -> {9, 10}
{9, 10} -> 11
7 -> 11
  • use topological sorting.
  • every correct topological sort ordering is a valid schedule.
  • however, which of them is the best one? ... finding the best schedule among all topological orderings is NP-hard.
  • so use heuristics.

"Path Length" Heuristic

choose the instruction with the longest remaining path; we may be able to "tuck in" other instructions during stalls along this path.

insn#: {remaining path length in cycles}
1: {4}
2: {3}
3: {7}
4: {6}
5,6: {}
7: {2}
8: {5}
9,10: {4}
11: {1}

Algorithm: simulate the processor.

for every instruction: mark the time when it becomes ready. (roots are always ready)
    if (some instructions are ready)
        we schedule the one with the longest remaining path.
    else
        we advance the simulated time to the earliest start time of any of our instructions.
time    instruction:        cycles:
0:      3: muli i #4        |_|_|   
1:      4: add FP a-baseaddr    |_|_|
2:      {5,6}                     |_|_|_|
3:      8                           |_|
4:      {9,10}                        |_|_|_|
5:      1                               |_|
6:      2                         
7:      7
8:      11

note: we schedule {5,6} over 8, because they use more registers.

Path length heuristic: on modern processors, loads are sloooow. As a result, all loads are moved up, and hence register pressure kills us.
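
A sketch of the simulation loop described above, written in the same style as the BUILDINTERVALS pseudocode further down (the latencies and the tie-breaking rule are assumptions, not necessarily what this compiler implements):

    SCHEDULEBLOCK
        ready = roots of the dependence DAG        // roots are always ready
        time = 0
        while some instruction is still unscheduled do
            if some instruction in ready has all of its operands available at time then
                emit the one among these with the longest remaining path
                mark its result as available at (time + its latency)
                add to ready every instruction whose predecessors have all been emitted
                time = time + 1
            else
                // stall: jump ahead to the first cycle at which something can issue
                time = earliest cycle at which an instruction in ready becomes executable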

Overscheduling:

Instead of scheduling from the top of the topological sort, i.e. from the roots (leaves), do it from the bottom.

Scheduling from the bottom places a load only once the latency between the load and its use has been covered; this should push load instructions "further down" in the schedule and relieve register pressure.

Scheduling instructions via the control structure tree:
  • innermost structures first
  • handle a whole control structure like a single instruction taking N cycles
  • Question: how many cycles does a loop or an if take?

Trace Scheduling
Loop Unrolling

Data and Control Speculation


[Done] Building the Interference Graph.

BUILDINTERVALS
    for each block b in reverse order do
        live = union of successor.liveIn for each successor of b

        for each phi function phi of successors of b do
            live.add(phi.inputOf(b))

        for each opd in live do
            // intervals[opd].addRange(b.from, b.to)

        for each operation op of b in reverse order do
            for each output operand opd of op do
                // intervals[opd].setFrom(op.id)
                live.remove(opd)
            for each input operand opd of op do
                // intervals[opd].addRange(b.from, op.id)
                live.add(opd)

        for each phi function phi of b do
            live.remove(phi.output)

        if b is loop header then
            loopEnd = last block of the loop starting at b
            for each opd in live do
                // intervals[opd].addRange(b.from, loopEnd.to)

        b.liveIn = live
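
The interval operations referenced above are assumed to behave as in the usual linear-scan formulation (the details may differ from the code in this repository):

    intervals[opd].addRange(from, to)   // record that opd is live over [from, to];
                                        // overlapping or adjacent ranges are merged
    intervals[opd].setFrom(id)          // the definition of opd has been reached, so the
                                        // most recently added range is shortened to start at id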

COALESCING during register allocation.

Coalescing in an interference graph is generally done to get rid of MOVE instructions. While this might not seem like an issue for this compiler, since copy propagation already gets rid of MOVE instructions, coalescing is still required due to the presence of PHI instructions. PHI instructions ultimately translate to MOVE instructions, and hence their arguments must be coalesced with the result of the PHI itself.
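
For example, at the join block of an if (names are illustrative):

    then-block:  x1 := ...
    else-block:  x2 := ...
    join-block:  PHI x3 x1 x2           // x3 := Phi(x1, x2)

When leaving SSA form, the PHI becomes a move at the end of each predecessor:

    then-block:  ... MOVE x1 x3         // x3 := x1
    else-block:  ... MOVE x2 x3         // x3 := x2

If x1, x2 and x3 are coalesced into the same register, both MOVEs disappear.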


Questions/Doubts:

  • Why is removing emptyBasicBlocks important?
  • Why can't we just let them be there, while going about generating code?

Procedure Calls

The major activities associated with a procedure call include (a sketch of a possible instruction sequence follows the list):

  1. Ensuring that the procedure has been defined, that this definition agrees with the kind of the callee (proc/func), and that the number and types of the arguments agree with the formal parameter specification in the procedure definition.
  2. Loading the arguments.
  3. Loading the return address.
  4. Transferring control to the procedure body.
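
A possible calling sequence for steps 2-4, shown with illustrative DLX-style mnemonics (these are a sketch, not necessarily the exact instructions this compiler emits):

    PSH  R1 SP -4        // push an argument onto the runtime stack (one push per argument)
    JSR  funcAddr        // transfer to the function body; the return address is saved
    ...                  // on return, the callee's result is read from the RET register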

Return statements and Procedure Termination.

The main semantic activities associated with procedure termination, whether by an explicit return statement or by "falling off the end" of a procedure/function body, are as follows (not necessarily in this order):

  1. If the procedure is a func, emit instructions to store the value to be returned in RET, i.e. Register 31.
  2. Emit an instruction to make an unconditional jump to the return address of the procedure.
  3. Emit the required instruction, i.e. ADD SP .. , to "deallocate" the activation record from the runtime stack.