Bindead - a static analysis tool for binaries.
Bindead is an analyzer for executable machine code. It features a disassembler that translates machine code bits into an assembler like language (RREIL) that in turn is then analyzed by the static analysis component using abstract interpretation. As Bindead operates on the machine code level, it can be used without having the source code of the program to be analyzed. In fact, the purpose of Bindead is to help with the reverse engineering of executable code or binaries. The analyzer enables one to reason about all the possible runtime behavior of a program and find potential bugs. To this end we perform a collection of (numeric and symbolic) analyses on the program and are able to infer memory access bounds and various other numeric properties statically, i.e. without running the program.
Components of the analyzer:
The disassembler part is designed around the idea of a modular front-end that produces RREIL code from binary code blobs. It uses pluggable front-ends for the input data (file parsing) and the disassemblers/translators.
Binary file formats:
We are able to parse and extract code sections from Linux (ELF) and Windows (PE) executables and our own
RREIL assembler text files. The input file format for binary code chunks though is built around a simple interface and plugins can easily be added for any other file format. Basically any collection of chunks of bytes can be fed to the analyzer if wrapped in our simplified binary file format.
To analyze the code we currently use a hand-written disassembler that translates the binary representation of machine code (chunks of bytes) into the intermediate language RREIL that is then passed on to the analyzer. We can thus use different disassembler front-ends for different architectures with the same code analyzer. Currently we have disassemblers for Intel x86-32 and x86-64 and for the AVR-8 architecture.
As writing front-ends by hand proved to be tedious and error prone and after discussing the issue with people from the reverse engineering community the consensus was that there should be a simpler way to build and share descriptions for machine instructions. Hence, the Generic Decoder Specification Language Toolkit GDSL was started. The project
allows to specify the syntax and semantics of machine instructions in a domain specific language that was
designed for exactly this purpose. GDSL compiles the instructions specification using one of the available language binding back-ends to a disassembler library that can be plugged into other projects. Currently GDSL comes with disassembler specifications for the Intel x86-32 and x86-64 and the AVR-8 architecture, thus replacing the hand written front-ends of the Bindead analyzer.
The static code analyzer together with the disassembler front-ends comprise a recursive descent disassembler. It reconstructs the control flow graph (CFG) of the code while disassembling and analyzing the binary. Contrary to most common disassemblers (e.g.
objdump) that perform a linear sweep over the code, because of the recursive disassembly method and our static analysis we are able to follow the control flow of the code (e.g. indirect jumps and calls). Thus our disassembler usually produces less disassembly errors (wrongly decoded instructions), discovers more code (code vs. data problem) and is more robust when dealing with obfuscated code. However, an in depth comparison with existing disassemblers given all these problem cases has not been performed. The above statement is thus about the theoretical implications of our disassembler architecture and it depends on the performance and precision of the static analyzer component.
Intermediate Language (RREIL)
The static analyzer uses an intermediate language that was designed to be minimal, concise and
suitable for numeric analyses. The language is inspired by the ideas that went into Zynamics' REIL adding revisions for some shortcomings of REIL. For example by exposing
comparisons explicitly the memory bounds analysis can be greatly improved. Hence the name RREIL for Relational REIL where REIL is an acronym for Reverse Engineering Intermediate Language.
More information on the features of the language and its semantics can be found on the RREIL introduction page and in the PhD Thesis about the analyzer or the WCRE 11 paper.
In addition to the disassembler front-ends that translate binary code to RREIL we also have an assembler to RREIL. This allows us to write test examples for the analyzer directly in RREIL without the necessity to first compile code and disassemble it.
The main part of the analyzer is the static analysis part that employs abstract interpretation with a stack of so-called abstract domains (numeric and symbolic) to infer program properties and variable values for each program point. The diagram below gives an overview of the analyzer with an emphasis on the stack of abstract domains. The main point is that the domains operate on an input language and the language becomes simpler for the bottom of the domains stack, e.g. pointers and variable sizes do not exist at the lowest level anymore. The modular design makes it fairly easy to try out new analyses due to the simplification of concerns mentioned above. As the interfaces between the domains are only defined by the input language it is easy to plug in new domains that solve a special and constrained problem.
A short description of the abstract domains that we use can be found on this page. It details the
motivation and use case of each domain. A more detailed description of the domains and how they interoperate can be found in
the PhD Thesis about the analyzer. The thesis and the conference publications
(in the publications section at the bottom) additionally show the problems we faced when analyzing machine code and how
we solved these problems using novel abstract domains.
Dynamic + Static Analysis
As an extension to the static analyzer we built a component that runs a program and outputs a trace of the run
that can then be fed to the analyzer and the abstract domains. The idea behind it is to have a dynamically started static analysis, i.e. bootstrap a static analysis at a certain point in the program with the dynamic state of the program. Additionally, the user can choose which parts of the dynamic program state should not be copied over to the static analyzer. The tracer tool BinTrace is described in more detail on its project page.
Building and Installation
Please see the README file in the source directory.
After installation of the standalone package the analyzer can be used from the command line. Use
--help to see possible commands or read a more detailed description on this separate CLI section.
Invoked from the command line Bindead currently prints the disassembled code in Intel syntax and in a similar format to
objdump plus some analysis summary information, e.g.:
Analyzed native code (25 instructions): f: 080483ec: 8b 44 24 04 mov eax, DWORD PTR [esp+0x4] 080483f0: c3 ret a: 080483f1: 83 ec 04 sub esp, 0x4 080483f4: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 080483f8: 89 04 24 mov DWORD PTR [esp], eax 080483fb: e8 ec ff ff ff call 0x80483ec <f> 08048400: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 08048404: 83 c4 04 add esp, 0x4 08048407: c3 ret b: 08048408: 83 ec 04 sub esp, 0x4 0804840b: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 0804840f: 89 04 24 mov DWORD PTR [esp], eax 08048412: e8 d5 ff ff ff call 0x80483ec <f> 08048417: 8b 44 24 08 mov eax, DWORD PTR [esp+0x8] 0804841b: 83 c4 04 add esp, 0x4 0804841e: c3 ret main: 0804841f: 83 ec 14 sub esp, 0x14 08048422: c7 44 24 10 00 00 00 00 mov DWORD PTR [esp+0x10], 0x0 0804842a: c7 04 24 00 00 00 00 mov DWORD PTR [esp], 0x0 08048431: e8 bb ff ff ff call 0x80483f1 <a> 08048436: c7 04 24 01 00 00 00 mov DWORD PTR [esp], 0x1 0804843d: e8 c6 ff ff ff call 0x8048408 <b> 08048442: 8b 44 24 10 mov eax, DWORD PTR [esp+0x10] 08048446: 83 c4 14 add esp, 0x14 08048449: c3 ret Analysis steps: 143 Warnings: 0
The same example showing the complete and more verbose analysis trace output (i.e. with the RREIL code and showing register and memory values) can be seen separately on this page.
As can be seen on the above page the trace of the analysis is very verbose, huge and difficult to understand for bigger binaries. Thus, to more conveniently view the inferred values for each program point we built a graphical user interface (GUI) for the analyzer, named p9. It is able to visualize and navigate the CFG of an analyzed executable and show the values inferred by the analyzer at each program point (see the picture below).
Further details about the GUI, its features and installation instructions are on its project page.
Conference publications/talks regarding Bindead or affiliated projects:
- PhD Thesis 15 - Adaptable Static Analysis of Executables for proving the Absence of Vulnerabilities
- HES 15 - Sendmail Crackaddr Talk at "Hackito Ergo Sum"
- NFM 14 - Synthesizing Predicates from Abstract Domain Losses - (slides of talk)
- NFM 13 - Widening as Abstract Domain - (slides of talk)
- APLAS 13 - The Undefined Domain: Precise Relational Information for Entities that Do Not Exist - (slides of talk)
- APLAS 13 - GDSL: A Universal Toolkit for Giving Semantics to Machine Language
- TAPAS 12 - GDSL: A Generic Decoder Specification Language for Interpreting Machine Language
- WCRE 11 - Precise Static Analysis of Binaries by Extracting Relational Information - (slides of talk)
Material mentioned in the text and some additional material: