Wiki
Clone wikiBindead / Introduction to RREIL
RREIL is an intermediate language similar to RTLs used in compiler
backends as well as in binary analysis tools Cifuentes et al. We use RREIL
as intermediate representation to model native code instructions of various
architectures. One of the main obstacles in the definition of an intermediate
language suitable for modeling complex instruction sets (and complex
instructions themselves) are implicit side effects of native machine code
instructions. For example, on the Intel x86 platform, a simple addition instruction
does not only compute the sum of its two arguments, but also modifies the
processor's internal state, in this case the contents of the EFLAG
register.
Furthermore, the complexity of the instruction set itself makes it very tedious
to write analyses directly for native instructions. Instead the native instructions are
translated to a simple enough intermediate representation suitable for
architecture independent code analyses. RREIL is a descendant of REIL
augmented with relational operations and "type-safe" binary operations.
RREIL Instructions
RREIL instructions are conveniently written as 3-address-code. We distinguish
between input and output parameters using lvals and rvals
respectively. Offsets into register variables are written id:sz/o
, where o
is the offset in bits counted from the lowest bit in id. The size
sz at which register id is accessed at offset o is given after the
colon. For example a 32bit addition could be written as follows:
#!text add x:32/0, y:32/0, z:32/0 ; x = y + z mod 2^32 | | | | | | | `---- offset | | `------- size | `--------- memory region identifier `---------------------------- operation mnemonic
Formal Definition of RREIL
We commence by presenting the RREIL instructions that concern the read and write access of a memory cell, a conditional branch, a function call and return instruction (which are jumps semantically). In all statements, the destination is the left-most parameter, following the Intel syntax.
#!text rreil ::= | base-stmt | load lval, lval | store lval, rval | brc condition, rval | call rval | return rval | nop | halt
#!text condition ::= flag | c c ∈ N lval ::= var rval ::= var | [l, u]:sz l,u ∈ Z∞ var ::= id:sz(/sz)? id ∈ M flag ::= id:1 id ∈ M sz ::= n n ∈ N
base-stmt
production, presented next. The base-stmt
production encompasses binary operations, comparisons, sign extension
and the copying of values. The binary instructions must be applied to
arguments of equal size. Only the arguments of sign-extend feature different
sizes. The comparison statements test for equality, for “less or equal” on
unsigned and signed values and for “less than” on unsigned and signed values,
respectively. Both arguments have the same size. "Greater than" comparisons
are translated by swapping the arguments.
#!text base-stmt ::= | binop lval, rval, rval | cmpop flag, rval, rval | sign-extend lval, rval | convert lval, rval | mov lval, rval binop ::= add | sub | mul | divu | divs | shl | shru | shrs | mod | and | or | xor cmpop ::= cmpeq | cmpleu | cmples | cmpltu | cmplts
Translation of Comparisons
One of the benefits of our RREIL language is that flag computations
can be translated into arithmetic instructions if possible. For instance, the
x86-32 comparison cmp eax, ebx
followed by jl target
which jumps if
eax
is signed less than ebx
translates into RREIL as follows:
00: sub t0:32, eax:32, ebx:32 01: cmpltu CF:1, eax:32, ebx:32 02: cmpleu CForZF:1, eax:32, ebx:32 03: cmplts SFxorOF:1, eax:32, ebx:32 04: cmples SFxorOForZF:1, eax:32, ebx:32 05: cmpeq ZF:1, eax:32, ebx:32 06: cmplts SF:1, t0:32, 0:32 07: xor OF:1, SFxorOF:1, SF:1 08: brc SFxorOF:1, target:32
Note that the unsatisfactory translation of relational tests (as bit manipulations on values) in REIL is
only one reason for our new intermediate language: RREIL. Two other differences are that
the arguments of an instruction, e.g. sub
, cmpltu
must have the
same size and that they can be accessed at an offset. This implies that our analysis cannot
analyze REIL code without first translating from REIL to RREIL.
Thus, RREIL is not an extension of REIL; rather, it
is a new IL whose features are carefully chosen to allow for a precise numeric
analysis of machine code.
Updated