Wiki

Clone wiki

Bindead / Introduction to RREIL

RREIL is an intermediate language similar to RTLs used in compiler backends as well as in binary analysis tools Cifuentes et al. We use RREIL as intermediate representation to model native code instructions of various architectures. One of the main obstacles in the definition of an intermediate language suitable for modeling complex instruction sets (and complex instructions themselves) are implicit side effects of native machine code instructions. For example, on the Intel x86 platform, a simple addition instruction does not only compute the sum of its two arguments, but also modifies the processor's internal state, in this case the contents of the EFLAG register. Furthermore, the complexity of the instruction set itself makes it very tedious to write analyses directly for native instructions. Instead the native instructions are translated to a simple enough intermediate representation suitable for architecture independent code analyses. RREIL is a descendant of REIL augmented with relational operations and "type-safe" binary operations.

RREIL Instructions

RREIL instructions are conveniently written as 3-address-code. We distinguish between input and output parameters using lvals and rvals respectively. Offsets into register variables are written id:sz/o, where o is the offset in bits counted from the lowest bit in id. The size sz at which register id is accessed at offset o is given after the colon. For example a 32bit addition could be written as follows:

#!text
    add x:32/0, y:32/0, z:32/0     ; x = y + z mod 2^32
     |                  | |  |
     |                  | |  `---- offset
     |                  | `------- size
     |                  `--------- memory region identifier
     `---------------------------- operation mnemonic
One difference of RREIL to other intermediate representations is, that registers are just treated as plain memory regions. There is no distinct notation for registers and memory.

Formal Definition of RREIL

We commence by presenting the RREIL instructions that concern the read and write access of a memory cell, a conditional branch, a function call and return instruction (which are jumps semantically). In all statements, the destination is the left-most parameter, following the Intel syntax.

#!text
    rreil ::=
      | base-stmt
      | load lval, lval
      | store lval, rval
      | brc condition, rval
      | call rval
      | return rval
      | nop
      | halt
The right-hand sides of each instruction use the definition of a writable location lval and an rval, i.e. an expression or interval where Z∞ = Z∪{−∞, +∞}. We use the term memory region for id ∈ M which encompasses the set of registers and temporary variables that are generated during translations. Each memory region carries the size of the access which must be the same across an instruction. One of two exceptions to this rule are flags that are always one bit wide. The other being the dereference instructions.
#!text
    condition ::= flag | c           c ∈ N
         lval ::= var
         rval ::= var | [l, u]:sz    l,u ∈ Z∞
          var ::= id:sz(/sz)?        id ∈ M
         flag ::= id:1               id ∈ M
           sz ::= n                  n ∈ N
These definitions are also used in the rules for arithmetic instructions and comparisons which are summarized in the base-stmt production, presented next. The base-stmt production encompasses binary operations, comparisons, sign extension and the copying of values. The binary instructions must be applied to arguments of equal size. Only the arguments of sign-extend feature different sizes. The comparison statements test for equality, for “less or equal” on unsigned and signed values and for “less than” on unsigned and signed values, respectively. Both arguments have the same size. "Greater than" comparisons are translated by swapping the arguments.
#!text
    base-stmt ::=
      | binop lval, rval, rval
      | cmpop flag, rval, rval
      | sign-extend lval, rval
      | convert lval, rval
      | mov lval, rval
    binop ::= add | sub | mul | divu | divs | shl | shru | shrs | mod | and | or | xor
    cmpop ::= cmpeq | cmpleu | cmples | cmpltu | cmplts
While the ~25 RREIL instructions make for a concise language, the previous REIL language was even simpler, featuring only 17 instructions. However, the added instructions in RREIL are crucial to improve the analysis precision.

Translation of Comparisons

One of the benefits of our RREIL language is that flag computations can be translated into arithmetic instructions if possible. For instance, the x86-32 comparison cmp eax, ebx followed by jl target which jumps if eax is signed less than ebx translates into RREIL as follows:

    00: sub t0:32, eax:32, ebx:32
    01: cmpltu CF:1, eax:32, ebx:32
    02: cmpleu CForZF:1, eax:32, ebx:32
    03: cmplts SFxorOF:1, eax:32, ebx:32
    04: cmples SFxorOForZF:1, eax:32, ebx:32
    05: cmpeq ZF:1, eax:32, ebx:32
    06: cmplts SF:1, t0:32, 0:32
    07: xor OF:1, SFxorOF:1, SF:1
    08: brc SFxorOF:1, target:32
The comparisons in lines 1 to 6 test for <, ≤ of unsigned values, <, ≤ for signed values, equality and if the result is negative, respectively. The translation creates what we call virtual flags, that is, flags that are not present in the processor but which express relational information that can be conveyed to numeric domains. For instance, SFxorOF is set iff eax < ebx when eax and ebx are interpreted as signed integers. The benefit of this approach is that each arithmetic test is associated with one flag and once the conditional branch is reached, the test matching the flag can be applied to the numeric abstract domain. This means that an automatic forward substitution of the condition is performed. Observe that the overflow flag OF carries no meaningful arithmetic information but is still calculated to allow bit-level analyses such as SAT solving.

Note that the unsatisfactory translation of relational tests (as bit manipulations on values) in REIL is only one reason for our new intermediate language: RREIL. Two other differences are that the arguments of an instruction, e.g. sub, cmpltu must have the same size and that they can be accessed at an offset. This implies that our analysis cannot analyze REIL code without first translating from REIL to RREIL. Thus, RREIL is not an extension of REIL; rather, it is a new IL whose features are carefully chosen to allow for a precise numeric analysis of machine code.

Updated