Bindead - a static analysis tool for binaries.

Bindead is an analyzer for executable machine code. It features a disassembler that translates machine code bits into an assembler like language (RREIL) that in turn is then analyzed by the static analysis component using abstract interpretation. As Bindead operates on the machine code level, it can be used without having the source code of the program to be analyzed. In fact, the purpose of Bindead is to help with the reverse engineering of executable code or binaries. The analyzer enables one to reason about all the possible runtime behavior of a program and find potential bugs. To this end we perform a collection of (numeric and symbolic) analyses on the program and are able to infer memory access bounds and various other numeric properties statically, i.e. without running the program.

Components of the analyzer:

Disassembler

The disassembler part is designed around the idea of a modular front-end that produces RREIL code from binary code blobs. It uses pluggable front-ends for the input data (file parsing) and the disassemblers/translators.

Binary file formats: We are able to parse and extract code sections from Linux (ELF) and Windows (PE) executables and our own RREIL assembler text files. The input file format for binary code chunks though is built around a simple interface and plugins can easily be added for any other file format. Basically any collection of chunks of bytes can be fed to the analyzer if wrapped in our simplified binary file format.

Disassembler front-ends: To analyze the code we currently use a hand-written disassembler that translates the binary representation of machine code (chunks of bytes) into the intermediate language RREIL that is then passed on to the analyzer. We can thus use different disassembler front-ends for different architectures with the same code analyzer. Currently we have disassemblers for Intel x86-32 and x86-64 and for the AVR-8 architecture.
As writing front-ends by hand proved to be tedious and error prone and after discussing the issue with people from the reverse engineering community the consensus was that there should be a simpler way to build and share descriptions for machine instructions. Hence, the Generic Decoder Specification Language Toolkit GDSL was started. The project allows to specify the syntax and semantics of machine instructions in a domain specific language that was designed for exactly this purpose. GDSL compiles the instructions specification using one of the available language binding back-ends to a disassembler library that can be plugged into other projects. Currently GDSL comes with disassembler specifications for the Intel x86-32 and x86-64 and the AVR-8 architecture, thus replacing the hand written front-ends of the Bindead analyzer.

Disassembling process: The static code analyzer together with the disassembler front-ends comprise a recursive descent disassembler. It reconstructs the control flow graph (CFG) of the code while disassembling and analyzing the binary. Contrary to most common disassemblers (e.g. objdump) that perform a linear sweep over the code, because of the recursive disassembly method and our static analysis we are able to follow the control flow of the code (e.g. indirect jumps and calls). Thus our disassembler usually produces less disassembly errors (wrongly decoded instructions), discovers more code (code vs. data problem) and is more robust when dealing with obfuscated code. However, an in depth comparison with existing disassemblers given all these problem cases has not been performed. The above statement is thus about the theoretical implications of our disassembler architecture and it depends on the performance and precision of the static analyzer component.

Intermediate Language (RREIL)

The static analyzer uses an intermediate language that was designed to be minimal, concise and suitable for numeric analyses. The language is inspired by the ideas that went into Zynamics' REIL adding revisions for some shortcomings of REIL. For example by exposing comparisons explicitly the memory bounds analysis can be greatly improved. Hence the name RREIL for Relational REIL where REIL is an acronym for Reverse Engineering Intermediate Language. More information on the features of the language and its semantics can be found on the RREIL introduction page and in the PhD Thesis about the analyzer or the WCRE 11 paper.

In addition to the disassembler front-ends that translate binary code to RREIL we also have an assembler to RREIL. This allows us to write test examples for the analyzer directly in RREIL without the necessity to first compile code and disassemble it.

Abstract Domains

The main part of the analyzer is the static analysis part that employs abstract interpretation with a stack of so-called abstract domains (numeric and symbolic) to infer program properties and variable values for each program point. The diagram below gives an overview of the analyzer with an emphasis on the stack of abstract domains. The main point is that the domains operate on an input language and the language becomes simpler for the bottom of the domains stack, e.g. pointers and variable sizes do not exist at the lowest level anymore. The modular design makes it fairly easy to try out new analyses due to the simplification of concerns mentioned above. As the interfaces between the domains are only defined by the input language it is easy to plug in new domains that solve a special and constrained problem.

A short description of the abstract domains that we use can be found on this page. It details the motivation and use case of each domain. A more detailed description of the domains and how they interoperate can be found in the PhD Thesis about the analyzer. The thesis and the conference publications (in the publications section at the bottom) additionally show the problems we faced when analyzing machine code and how we solved these problems using novel abstract domains.

Dynamic + Static Analysis

As an extension to the static analyzer we built a component that runs a program and outputs a trace of the run that can then be fed to the analyzer and the abstract domains. The idea behind it is to have a dynamically started static analysis, i.e. bootstrap a static analysis at a certain point in the program with the dynamic state of the program. Additionally, the user can choose which parts of the dynamic program state should not be copied over to the static analyzer. The tracer tool BinTrace is described in more detail on its project page.

Building and Installation

Please see the README file in the source directory.

Usage

After installation of the standalone package the analyzer can be used from the command line. Use --help to see possible commands or read a more detailed description on this separate CLI section.

Invoked from the command line Bindead currently prints the disassembled code in Intel syntax and in a similar format to objdump plus some analysis summary information, e.g.:

#!text
Analyzed native code (25 instructions):

f:
080483ec:  8b 44 24 04             mov    eax, DWORD PTR [esp+0x4]
080483f0:  c3                      ret    

a:
080483f1:  83 ec 04                sub    esp, 0x4
080483f4:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
080483f8:  89 04 24                mov    DWORD PTR [esp], eax
080483fb:  e8 ec ff ff ff          call   0x80483ec <f>
08048400:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
08048404:  83 c4 04                add    esp, 0x4
08048407:  c3                      ret    

b:
08048408:  83 ec 04                sub    esp, 0x4
0804840b:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
0804840f:  89 04 24                mov    DWORD PTR [esp], eax
08048412:  e8 d5 ff ff ff          call   0x80483ec <f>
08048417:  8b 44 24 08             mov    eax, DWORD PTR [esp+0x8]
0804841b:  83 c4 04                add    esp, 0x4
0804841e:  c3                      ret    

main:
0804841f:  83 ec 14                sub    esp, 0x14
08048422:  c7 44 24 10 00 00 00 00 mov    DWORD PTR [esp+0x10], 0x0
0804842a:  c7 04 24 00 00 00 00    mov    DWORD PTR [esp], 0x0
08048431:  e8 bb ff ff ff          call   0x80483f1 <a>
08048436:  c7 04 24 01 00 00 00    mov    DWORD PTR [esp], 0x1
0804843d:  e8 c6 ff ff ff          call   0x8048408 <b>
08048442:  8b 44 24 10             mov    eax, DWORD PTR [esp+0x10]
08048446:  83 c4 14                add    esp, 0x14
08048449:  c3                      ret    

Analysis steps: 143
Warnings: 0

The same example showing the complete and more verbose analysis trace output (i.e. with the RREIL code and showing register and memory values) can be seen separately on this page.

As can be seen on the above page the trace of the analysis is very verbose, huge and difficult to understand for bigger binaries. Thus, to more conveniently view the inferred values for each program point we built a graphical user interface (GUI) for the analyzer, named p9. It is able to visualize and navigate the CFG of an analyzed executable and show the values inferred by the analyzer at each program point (see the picture below).

Further details about the GUI, its features and installation instructions are on its project page.

Publications

Conference publications/talks regarding Bindead or affiliated projects:

PhD Thesis 15 - Adaptable Static Analysis of Executables for proving the Absence of Vulnerabilities
HES 15 - Sendmail Crackaddr Talk at "Hackito Ergo Sum"
NFM 14 - Synthesizing Predicates from Abstract Domain Losses - (slides of talk)
NFM 13 - Widening as Abstract Domain - (slides of talk)
APLAS 13 - The Undefined Domain: Precise Relational Information for Entities that Do Not Exist - (slides of talk)
APLAS 13 - GDSL: A Universal Toolkit for Giving Semantics to Machine Language
TAPAS 12 - GDSL: A Generic Decoder Specification Language for Interpreting Machine Language
WCRE 11 - Precise Static Analysis of Binaries by Extracting Relational Information - (slides of talk)

Resources

Material mentioned in the text and some additional material:

Wiki

Bindead / Home