inf225 / 2017 / Object Files and Disassembly
Experiments with the C compiler, object files, linking and disassembling
This is a small exercise / tutorial that will show you:
* the basics of the command line tools for compiling and linking programs
* how to inspect the code generated by the C and Java compilers, see the disassembled machine instructions, and see how information about programs is stored
* how the compiler will translate simple things
* the effect of code optimisation
Contents
- Experiments with the C compiler, object files, linking and disassembling
- Compiling and Inspecting Code
- Particular things to try
- Things to try on your own
- Java Object Files
- macOS output
- Output for ARM and AVR architectures
More resources
- Ian Lance Taylor (author of the Gold linker) on linkers (in 20 parts): Part 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
- PLT and GOT - the key to code sharing and dynamic libraries
You need
(If you want to try things for yourself, rather than just reading the text)
- A computer running Linux or Mac OS, or possibly Windows. (On Windows 10, you can also just install all the standard Linux tools from the Windows App store, e.g. Ubuntu on Windows)
- A C compiler and binutils (packages `gcc` and `binutils` in Ubuntu; Xcode from the App Store on Mac). If you get an "invalid active developer path" error on Mac, you need to install Xcode and run `xcode-select --install` from the command line (say yes to installing the command line tools).
- An editor, and an open command line / terminal window.
Compiling and Inspecting Code
hello.c
You'll need to create this file on your computer, using your favourite editor.
```c
#include <stdio.h>

int main(int argc, char** argv) {
    puts("Hello, world!\n");
    int x = 42;
    int y = x * 17;

    double a = 42.0;
    double b = a * 17.0;

    unsigned char m = 42;
    unsigned short n = m * 17;

    return argc;
}
```
To compile (on Linux)
Compile to object file
This produces an object file: the compiled code for whatever was in `hello.c`, but without the other code needed to make it into a complete, runnable program:
```sh
gcc -g -c hello.c -o hello.o
# or, equivalently:
cc -g -c hello.c -o hello.o
```
The `-g` option means "include debug information", and `-c` means "just compile this to an object file, don't link it to make a runnable program".
* These options are the same for all Linux/Unix/Mac C and C++ compilers (and typically also for compilers for other languages).
* Typically any Unix will let you use `cc` as an alias for the C compiler (and `c++` or `CC` for the C++ compiler). On Linux, the standard C compiler will be GCC; on macOS, it's Clang nowadays (and if you're on a fancy supercomputer, the vendor may have a proprietary compiler). The basic options and behaviour (like the things we're doing here) will typically be the same, but advanced options differ widely.
* In Java, there's no `-c` option to the `javac` compiler. Why? However, `javac` does have a `-g` option for including all debug information (by default, it only includes line numbers and source file names, not information about local variables).
Compile and link to a runnable program
If you omit `-c`, you'll get a runnable program as output. You can combine any number of `.c` and `.o` files to produce a program, and they will be linked together. If you don't specify the output file, the default name will typically be `a.out` (for historic reasons):
```sh
# compile directly
cc hello.c -o hello
# link object file
cc hello.o -o hello
# run it
./hello
# default name
cc hello.c
./a.out
```
Normally, programs will be dynamically linked against the standard libraries. This avoids duplication and makes it possible to deploy bugfixes and updates for libraries without recompiling everything. You can also ask for static linking with the `-static` compiler option. This will make the executable file much larger, but it also lets you see all the code that contributes to running your program. Statically linked executables may also be more portable in the long term, since the standard library versions may change over time (particularly on Linux, where it's assumed that you can just recompile if necessary).
To cross-compile (on Linux)
You can also compile things for other platforms and machine architectures, if you have a cross-compiler. Standard Linux distributions include a bunch of these, capable of generating code for many different platforms, including ARM, AVR, Win32/Win64, IBM mainframes, etc. (You can see all variants of GCC in Debian here; this also includes compilers for a lot of different languages. The C cross-compilers have names starting with `gcc-`.) If you check the page for the `gcc` package, you'll see at the bottom that it's available for a list of architectures (you'll normally be running on `amd64`, even if you have an Intel processor). If you just run `gcc`, you'll get object files compiled for the platform you're currently running on.
Typically, you'll use a cross-compiler when you're developing for a machine that's too small or too inconvenient to use for your development environment – for example, phones or tablets (typically ARM architecture), or embedded systems (e.g., Atmel AVR, Freescale, PIC and many others). For example, the Arduino IDE includes cross compilers for the boards you're developing for – not only wouldn't the compiler fit in the ROM and RAM of a small Arduino, the Harvard-style architecture of the chip won't let you execute data (compiler output) as code.
To look at the output of a cross-compiler, you first need to have one (or several) installed. Package names below refer to what you'd need to install on an Ubuntu/Debian or similar Linux system.
Compiling for ARM (packages `gcc-arm-none-eabi` and `binutils-arm-none-eabi` in Ubuntu/Debian):

```sh
arm-none-eabi-gcc -g -c hello.c -o hello-arm.o
```
Compiling for AVR (8-bit microcontrollers used in Arduino; you need the packages `gcc-avr` and `binutils-avr` in Ubuntu/Debian):

```sh
avr-gcc -g -c hello.c -o hello-avr.o
```
To inspect compiled code (on Linux)
To see disassembled code, run `objdump -d hello.o`. To inspect code compiled for another architecture, use `arm-none-eabi-objdump`, `avr-objdump` or, in general, `<architecture-name>-objdump` instead.
Particular other options you may be interested in for `objdump` include:
- `-t` (or `--syms`): Display the contents of the symbol table(s). This gives you a list of all the names that are referred to in the object file. For example, it tells you that the function `main` is in the `.text` section of the file (this is where program code goes). It also tells you that `main` is at address 0 and is 105 bytes long (0x69 in hex). Finally, it shows whether `main` is local to the file (`l`) or global (`g`, so that it can be linked to from other files), along with a number of other properties. See objdump(1) for more info. There's a similar option, `-T`, for dumping the dynamic symbols of a shared library.
- `-r`: Display the relocation entries in the file. This shows you a list of addresses in the object file that must be relocated before the code can be used. Relocation is needed because you normally won't know exactly what address a piece of code will end up at in memory. That depends on what other pieces of code will be loaded (libraries we're using, for example). Typically, this means that relative addresses will be replaced by absolute addresses. For example, the function `foo` is at address `0x20` in this file, so a call to `foo` will be something like `call 0x20`. If this file gets loaded at `0x8200`, `foo` will be at `0x8220`, and all calls to `foo` must be updated to reflect that. Nowadays, code will be moved around on purpose (address space layout randomisation, ASLR) even if you might have been able to determine the final memory location at compile time, in order to make security break-ins harder.
- `-g` (or `--debugging`): Display debug information in the object file. This shows you any debugging information embedded in the file; the compiler adds debug information if you give it the `-g` option. This is basically everything you'd need to run the code in a debugger, step through the code and display the contents of variables and so on. (You'll probably need specialised knowledge to interpret this.)
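To make the relocation idea from the `-r` description concrete, here is a toy C sketch using the hypothetical `foo` at offset 0x20 and load address 0x8200. The `struct toy_reloc` record and the function names are made up for this example. Real ELF relocation entries (`Elf64_Rela`) also carry a relocation type, a symbol index and an addend, and the relocations in `hello.o` are PC-relative rather than absolute.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record (not the real ELF format):
   "the 8 bytes at `offset` must hold the absolute address of a symbol
   that sits `sym_offset` bytes into the same section". */
struct toy_reloc {
    size_t offset;
    uint64_t sym_offset;
};

/* Patch every relocated slot once we know where the section is loaded. */
void toy_apply_relocs(uint8_t *section, const struct toy_reloc *relocs,
                      size_t count, uint64_t load_base) {
    for (size_t i = 0; i < count; i++) {
        uint64_t absolute = load_base + relocs[i].sym_offset;
        /* memcpy avoids unaligned stores; the value is written in host byte order */
        memcpy(section + relocs[i].offset, &absolute, sizeof absolute);
    }
}

/* Read back an 8-byte slot, to inspect the result. */
uint64_t toy_read_slot(const uint8_t *section, size_t offset) {
    uint64_t value;
    memcpy(&value, section + offset, sizeof value);
    return value;
}
```

Suppose you call `toy_apply_relocs` from your own `main` with a single record `{8, 0x20}` and a `load_base` of 0x8200. The slot at offset 8 then ends up holding 0x8220.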
Particular things to try
See what dynamic libraries a program uses
On Linux:
```
$ ldd hello
    linux-vdso.so.1 =>  (0x00007ffd1f387000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe0ac91a000)
    /lib64/ld-linux-x86-64.so.2 (0x000055665097f000)
$ ldd /usr/bin/xterm
    linux-vdso.so.1 =>  (0x00007fffc49fb000)
    libXinerama.so.1 => /usr/lib/x86_64-linux-gnu/libXinerama.so.1 (0x00007fe002cf9000)
    libXft.so.2 => /usr/lib/x86_64-linux-gnu/libXft.so.2 (0x00007fe002ae4000)
    libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007fe0028a1000)
    libXaw.so.7 => /usr/lib/x86_64-linux-gnu/libXaw.so.7 (0x00007fe00262d000)
    libXmu.so.6 => /usr/lib/x86_64-linux-gnu/libXmu.so.6 (0x00007fe002414000)
    libXt.so.6 => /usr/lib/x86_64-linux-gnu/libXt.so.6 (0x00007fe0021a9000)
    libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fe001e70000)
    ...
```
Note that in some circumstances `ldd` can be tricked into running the program it's inspecting, so don't use it on things you don't trust. You can get some of the same information with `objdump -p file | grep NEEDED`.
Inspect symbol table
Try `objdump -t`:
```
$ objdump -t hello.o

hello.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*            0000000000000000 hello.c
0000000000000000 l    d  .text            0000000000000000 .text
0000000000000000 l    d  .data            0000000000000000 .data
0000000000000000 l    d  .bss             0000000000000000 .bss
0000000000000000 l    d  .rodata          0000000000000000 .rodata
0000000000000000 l    d  .debug_info      0000000000000000 .debug_info
0000000000000000 l    d  .debug_abbrev    0000000000000000 .debug_abbrev
0000000000000000 l    d  .debug_aranges   0000000000000000 .debug_aranges
0000000000000000 l    d  .debug_line      0000000000000000 .debug_line
0000000000000000 l    d  .debug_str       0000000000000000 .debug_str
0000000000000000 l    d  .note.GNU-stack  0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame        0000000000000000 .eh_frame
0000000000000000 l    d  .comment         0000000000000000 .comment
0000000000000000 g     F .text            0000000000000069 main
0000000000000000         *UND*            0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000         *UND*            0000000000000000 puts
```
- Here the first number is the symbol's value (sometimes referred to as its address). The next field is actually a set of characters and spaces indicating the flag bits that are set on the symbol (see objdump(1) for what each character means). Next is the section with which the symbol is associated. This is `*ABS*` if the section is absolute (i.e. not connected with any section), or `*UND*` if the section is referenced in the file being dumped but not defined there.
- After the section name comes another field, a number, which for common symbols is the alignment and for other symbols is the size. Finally, the symbol's name is displayed.
Notice the following things:
- There are four main sections (or segments) in an object file:
  - `text`, which contains program code;
  - `data` and `rodata`, which contain associated program data (such as initialised globals, constants, strings, etc.);
  - `bss`, which contains uninitialised data (such as space for global variables that are not constants, or that are initialised by a constructor call).

  Above, there are also sections for various debug information.
- The text section contains one interesting symbol: `0000000000000000 g F .text 0000000000000069 main`. This is the `main` function of the C program. Reading the fields from left to right:
  - The first number is its address within the section (this will be relocated to an actual memory address when it's linked or loaded into memory).
  - The `g` indicates that it's a global symbol (it can be accessed from other objects).
  - The `F` indicates that it's a function (so it's sensible to call it).
  - `.text` is the section name.
  - `69` is the length (105 bytes).
- There are also two interesting symbols marked `*UND*` (undefined): `0000000000000000 *UND* 0000000000000000 _GLOBAL_OFFSET_TABLE_` and `0000000000000000 *UND* 0000000000000000 puts`.
  This is why `hello.o` can't be run as a program: some pieces of code aren't there yet. `puts` is a function from the standard C library, which is needed for printing "Hello, world!". It'll end up being provided by the library `libc.so.6`, which we saw listed in the output of `ldd`. If you look at the output of `objdump -t hello`, you'll see that it's different in the full program: `0000000000000000 F *UND* 0000000000000000 puts@@GLIBC_2.2.5`. `puts` is still undefined, but it's now known to be found in version `GLIBC_2.2.5` of `libc.so.6` (there'll be information about shared libraries in the `.dynamic` section of the file).
- The `_GLOBAL_OFFSET_TABLE_` is used by Linux to deal with shared libraries, and to share the same library code across multiple processes. You can read about it here. It'll be provided by the linker when you produce the executable program: `0000000000200fb0 l O .got 0000000000000000 _GLOBAL_OFFSET_TABLE_`
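To see the different kinds of symbols side by side, you could compile a small file like the one below. It's a hypothetical `sections.c`, not part of the original exercise. Compile it with `cc -g -c sections.c -o sections.o` and run `objdump -t sections.o`. At `-O0` you should roughly expect the following:
- `helper` shows up as a local (`l`) function.
- `visible` and `say_hi` are global (`g`) functions in `.text`.
- `counter` is in `.data` and `greeting` is in `.rodata`.
- `zeroed` is in `.bss` (older GCC versions put it in `*COM*` instead).
- `puts` is `*UND*`.

The exact output depends on your compiler and its version.

```c
#include <stdio.h>

int counter = 5;               /* initialised global variable: .data */
int zeroed;                    /* uninitialised global: .bss (or *COM* on older GCC) */
const char greeting[] = "hi";  /* read-only data: .rodata */

/* static: local to this file, so objdump -t marks it 'l' */
static int helper(int x) {
    return x * 17;
}

/* not static: global ('g'), a function ('F') in .text */
int visible(int x) {
    return helper(x) + counter;
}

/* puts is only referenced here, so it stays *UND* until the program is linked */
void say_hi(void) {
    puts(greeting);
}
```

With optimisation turned on, `helper` may be inlined into `visible` and disappear from the symbol table altogether.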
Disassembling code
This will show you the assembly language instructions the compiler has chosen when it compiled your C code.
Use `objdump -d <filename>` to disassemble all code in a file. Even better, use `objdump -d -S <filename>` to see the C source together with the assembler code, or `objdump -d -l <filename>` to see the line numbers from the original C file (if you compiled with the `-g` option).
Note that `objdump` by default outputs x86/x86_64 assembly code in AT&T syntax rather than Intel syntax, which may confuse you if you're used to the Intel syntax. For example, AT&T syntax has the target operand on the right-hand side and Intel has it on the left-hand side, so `mov %rsp,%rbp` (AT&T syntax) corresponds to `mov rbp,rsp` (Intel syntax). Select Intel syntax by adding the option `-M intel` (Linux) or `-x86-asm-syntax=intel` (macOS).
Here's the full disassembly of `hello.o`, so you can study it for yourself. See below for a guided tour.
```objdump
$ objdump -d -l hello.o

hello.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <main>:
main():
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:3
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 30             sub    $0x30,%rsp
   8:   89 7d dc                mov    %edi,-0x24(%rbp)
   b:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:4
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:5
  1b:   c7 45 e4 2a 00 00 00    movl   $0x2a,-0x1c(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:6
  22:   8b 55 e4                mov    -0x1c(%rbp),%edx
  25:   89 d0                   mov    %edx,%eax
  27:   c1 e0 04                shl    $0x4,%eax
  2a:   01 d0                   add    %edx,%eax
  2c:   89 45 e8                mov    %eax,-0x18(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:8
  2f:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 37 <main+0x37>
  36:   00
  37:   f2 0f 11 45 f0          movsd  %xmm0,-0x10(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:9
  3c:   f2 0f 10 4d f0          movsd  -0x10(%rbp),%xmm1
  41:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 49 <main+0x49>
  48:   00
  49:   f2 0f 59 c1             mulsd  %xmm1,%xmm0
  4d:   f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:11
  52:   c6 45 e3 2a             movb   $0x2a,-0x1d(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:12
  56:   0f b6 55 e3             movzbl -0x1d(%rbp),%edx
  5a:   89 d0                   mov    %edx,%eax
  5c:   c1 e0 04                shl    $0x4,%eax
  5f:   01 d0                   add    %edx,%eax
  61:   89 45 ec                mov    %eax,-0x14(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:14
  64:   8b 45 dc                mov    -0x24(%rbp),%eax
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:15
  67:   c9                      leaveq
  68:   c3                      retq
```
Guided tour of hello.o
Function preamble
This code sets up the stack frame and stores arguments in local variables and so on. The exact code will depend on the calling conventions for the platform, which is specified in the Application Binary Interface (ABI). This code uses the System V AMD64 ABI, which is used by Linux, macOS, FreeBSD and other Unix-like systems for 64-bit PCs. Windows uses a different scheme, and the conventions will be different on different architectures (such as ARM, or 32-bit x86).
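```c
int main(int argc, char** argv) {
```

```objdump
   0:   55                      push   %rbp          # save the caller's frame pointer
   1:   48 89 e5                mov    %rsp,%rbp     # our frame pointer = current stack pointer
   4:   48 83 ec 30             sub    $0x30,%rsp    # reserve 0x30 bytes for local variables
```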
The first argument arrives in `%edi` (the lower 32 bits of `%rdi`), since it's an `int`. We store it at offset -0x24 in the stack frame; this is the variable `int argc`.

```objdump
   8:   89 7d dc                mov    %edi,-0x24(%rbp)
```
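The second argument arrives in `%rsi` (all 64 bits, since it's a pointer). We store it at offset -0x30; this is the variable `char** argv`.

```objdump
   b:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
```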
Stack frame layout
With a bit of effort, you can figure out where the variables are in the stack frames (or even in registers) from the debugging info included in the object file (though it's likely easier to just guess):
- Run `objdump -g hello.o` (not sure how to do this on Mac).
- At the bottom, in the `.eh_frame` section, there's information on the stack frame offset (DW_CFA_def...). This can vary for different parts of the program, for example before and after we do `mov %rsp,%rbp` (the points where it changes are given with DW_CFA_advance_loc). In our case, the CFA (Canonical Frame Address) for most of the `main` function is RBP+16. This means the variable offsets in the debug info must be adjusted by 16 to find the offsets used when accessing the stack frame through RBP.
- Close to the top, you'll find:

  ```
  <2><324>: Abbrev Number: 19 (DW_TAG_formal_parameter)
     <325>   DW_AT_name        : (indirect string, offset: 0xf6): argc
     <329>   DW_AT_decl_file   : 1
     <32a>   DW_AT_decl_line   : 3
     <32b>   DW_AT_type        : <0x62>
     <32f>   DW_AT_location    : 2 byte block: 91 4c   (DW_OP_fbreg: -52)
  ```

  This says that the variable `argc` is declared at line 3 and has type 0x62, which is `int` (there's an entry for this as well). Its location is at offset -52 from the frame base.
- So we take -52 and adjust it by 16 to get -36, which is -0x24 in hex. The variable `argc` can therefore be found at `-0x24(%rbp)`.
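The arithmetic above can be sketched in C. The bytes `91 4c` in the location block are the DWARF opcode `DW_OP_fbreg` (0x91), followed by the offset -52 encoded as a signed LEB128 number. 0x4c is 76, and it has its sign bit 0x40 set, so it means 76 - 128 = -52. The function names are made up for this example. The sketch assumes, as in this example, that the frame base is the CFA and that the CFA is at `%rbp`+16.

```c
#include <stddef.h>
#include <stdint.h>

#define DW_OP_fbreg 0x91   /* DWARF opcode: "offset from the frame base" */

/* Decode a DWARF signed LEB128 number from at most len bytes.
   *used receives the number of bytes consumed (0 if the input is truncated). */
int64_t decode_sleb128(const uint8_t *p, size_t len, size_t *used) {
    uint64_t result = 0;
    unsigned shift = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = p[i];
        if (shift < 64)
            result |= (uint64_t)(byte & 0x7f) << shift;  /* 7 payload bits per byte */
        shift += 7;
        if (!(byte & 0x80)) {                             /* high bit clear: last byte */
            if (shift < 64 && (byte & 0x40))              /* sign bit set: sign-extend */
                result |= ~(uint64_t)0 << shift;
            *used = i + 1;
            return (int64_t)result;
        }
    }
    *used = 0;
    return 0;
}

/* Convert a location block "DW_OP_fbreg <sleb128>" into an offset from %rbp,
   assuming the frame base is the CFA and the CFA is %rbp + cfa_from_rbp.
   Returns 0 on success, -1 for any other kind of location expression. */
int fbreg_to_rbp_offset(const uint8_t *block, size_t len,
                        int64_t cfa_from_rbp, int64_t *rbp_offset) {
    size_t used;
    if (len < 2 || block[0] != DW_OP_fbreg)
        return -1;
    int64_t fbreg = decode_sleb128(block + 1, len - 1, &used);
    if (used == 0)
        return -1;
    *rbp_offset = fbreg + cfa_from_rbp;
    return 0;
}
```

Calling `fbreg_to_rbp_offset` on the bytes `{0x91, 0x4c}` with `cfa_from_rbp` set to 16 gives -36, i.e. `-0x24(%rbp)`.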
Addresses, relocation and function calls
```c
puts("Hello, world!\n");
```

```objdump
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
```
This loads the address of the string into `%rdi` (the first argument register, overwriting `argc` in the lower half of `%rdi`), and then calls the `puts` function.
We can't see this, however, without more information. The addresses used in both instructions are 0 (`00 00 00 00` in the binary code). These are interpreted relative to the program counter (`%rip`). The string address would therefore be 0x16 (the first byte after the `lea` instruction), and the function address would be 0x1b (the first byte after the call instruction), which is nonsense.
Both instructions refer to things that are defined in different sections. In the case of `puts`, it's even in a completely different file that we might not see until the program is executed. The linker will fill in the correct values at a later point: it looks at the relocation table, finds the correct location of the relevant symbol, and writes that back into the instruction.
We can see this information by adding the `-r` option to `objdump`, which gives us:
```objdump
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
                        12: R_X86_64_PC32       .rodata-0x4
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
                        17: R_X86_64_PLT32      puts-0x4
```
The relocation for the `lea` instruction asks for `.rodata`-4. The instruction will end up pointing at the first byte of our read-only data area, where the string is. The compiler has subtracted 4 because the address is computed relative to the program counter (`R_X86_64_PC32` is a PC-relative 32-bit address). The relocation is applied at address 0x12, but the CPU will be at address 0x16 when it does the computation, hence -4.
If we look at `.rodata` (`objdump -s -j .rodata hello.o`), it looks like this:
```
Contents of section .rodata:
 0000 48656c6c 6f2c2077 6f726c64 21000000  Hello, world!...
 0010 00000000 00004540 00000000 00003140  ......E@......1@
```
The `callq` instruction also uses PC-relative addressing (basically, it does something like `push %rip; add $FUN_ADDRESS, %rip`), so the same principle applies: the linker will fill in `puts`-4.
Figuring out relocated addresses
We can check what the relocation table looks like using `objdump -r` (`-j .text` gives us only the part we're interested in):
```
$ objdump -r -j .text hello.o

hello.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE
0000000000000012 R_X86_64_PC32     .rodata-0x0000000000000004
0000000000000017 R_X86_64_PLT32    puts-0x0000000000000004
0000000000000033 R_X86_64_PC32     .rodata+0x000000000000000c
0000000000000045 R_X86_64_PC32     .rodata+0x0000000000000014
```
For the first record, the linker computes the value to store at `.text+0x12` as:

- `address = (.rodata-0x04) - (.text+0x12)`

The CPU will then calculate the effective address as the start of the next instruction (= `.text+0x16`) + address. That is, .text+0x16 + (.rodata-0x04) - (.text+0x12) = 0x04 + .rodata - 0x04 = .rodata, which is where the string is stored.
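The same calculation can be written as a small C sketch. For an `R_X86_64_PC32` relocation, the linker stores S + A - P, where:
- S is the symbol's final address;
- A is the addend;
- P is the address of the 4 bytes being patched.

The CPU then adds that value to the address of the next instruction. The function names and the example addresses (`.text` at 0x1000, `.rodata` at 0x2000) are hypothetical.

```c
#include <stdint.h>

/* Value the linker writes for an R_X86_64_PC32 relocation: S + A - P.
   The result has to fit in a signed 32-bit displacement. */
int32_t pc32_value(uint64_t S, int64_t A, uint64_t P) {
    return (int32_t)((int64_t)S + A - (int64_t)P);
}

/* What the CPU does at run time: displacement + address of the next instruction. */
uint64_t pc32_target(uint64_t next_insn, int32_t disp) {
    return next_insn + (uint64_t)(int64_t)disp;
}
```

With `.text` at 0x1000 and `.rodata` at 0x2000, `pc32_value(0x2000, -4, 0x1012)` is 0xfea. `pc32_target(0x1016, 0xfea)` then brings us back to 0x2000, the start of `.rodata`.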
You can see this in action if you look at the `hello` program, which has been relocated and linked:

```objdump
 6af:   48 8d 3d e2 00 00 00    lea    0xe2(%rip),%rdi
```
(The `main` function starts at 0x6a0. Before `main`, there are a bunch of other functions added by the C runtime, including `_start`, which is the actual entry point of the program.) The `.rodata` section of `hello` looks like this:
```
$ objdump -s -j .rodata hello

hello:     file format elf64-x86-64

Contents of section .rodata:
 0790 01000200 00000000 48656c6c 6f2c2077  ........Hello, w
 07a0 6f726c64 21000000 00000000 00004540  orld!.........E@
 07b0 00000000 00003140                    ......1@
```
When executing the `lea` instruction, the CPU will:
- Start at 0x6af and read the seven bytes of the instruction (including the 4 bytes of the address offset), leaving the program counter (`%rip`) at 0x6b6.
- Compute the address as 0xe2 + `%rip` = 0xe2 + 0x6b6 = 0x798, which is exactly where the string starts in `.rodata`.
- Store the address in `%rdi`, where `puts` will expect to find its string argument.
- Continue executing at 0x6b6, which is the call to `puts`.
Library calls
The call to `puts` could be handled in the same way, and it would have been if `puts` came as part of our program or a statically linked library. In this case, however, `puts` is found in the GNU C Library, which is a dynamically linked shared library, so we won't know its address before the program is loaded.

The loader also doesn't patch the code in place. Linux automatically shares the same program code between several running processes. It also avoids swapping executable code out to disk when memory is tight; it's much better to just discard the code from memory and read it again from the program file if necessary. For both of these to work, the code isn't subjected to relocation when it's loaded: it's loaded unchanged from disk, without substituting addresses.

This means addresses must be decided in advance, and anything we can't decide in advance must be looked up in a table at run time. This is the global offset table (GOT) mentioned earlier, and it works with the help of a Procedure Linkage Table (PLT).
Let's look at the call instruction again, after it's been linked. Instead of

```objdump
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
```

the linked `hello` executable file contains:

```objdump
 6b6:   e8 a5 fe ff ff          callq  560 <.plt.got>
```
(The displacement `a5 fe ff ff`, read as a little-endian 32-bit number, is 0xfffffea5, which is -0x15b, and 0x6b6 + 5 - 0x15b = 0x560.)
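Decoding these displacements by hand is easy to get wrong. Here is a small C sketch of what a disassembler does for RIP-relative operands such as the `lea` and `call` above. It reads the 4 displacement bytes as a little-endian signed number and adds them to the address of the next instruction. The function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Read 4 bytes as a little-endian, two's complement 32-bit number. */
int32_t read_le32(const uint8_t *b) {
    uint32_t u = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
                 (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    /* portable unsigned-to-signed conversion (no implementation-defined cast) */
    if (u <= INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}

/* Target of a RIP-relative operand: displacement + address of the next instruction. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len,
                             const uint8_t *disp_bytes) {
    int64_t disp = read_le32(disp_bytes);
    return insn_addr + insn_len + (uint64_t)disp;   /* wraps correctly for negative disp */
}
```

For the linked `hello`, the 5-byte call `e8 a5 fe ff ff` at 0x6b6 decodes to -0x15b and a target of 0x560. The 7-byte `lea` `48 8d 3d e2 00 00 00` at 0x6af (displacement at byte 3) gives 0x798.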
If we ask for the full disassembly with `objdump -d hello`, we'll also find the address 0x560; it's in the `.plt.got` section:
```objdump
0000000000000560 <.plt.got>:
 560:   ff 25 6a 0a 20 00       jmpq   *0x200a6a(%rip)        # 200fd0 <puts@GLIBC_2.2.5>
 566:   66 90                   xchg   %ax,%ax
 568:   ff 25 8a 0a 20 00       jmpq   *0x200a8a(%rip)        # 200ff8 <__cxa_finalize@GLIBC_2.2.5>
 56e:   66 90                   xchg   %ax,%ax
```
(The `*` means pointer dereferencing: load the target address from `0x200a6a(%rip)`, rather than jumping to `0x200a6a(%rip)`. In Intel-style assembler, this is written as `QWORD PTR [rip+0x200a6a]`.) You can ignore the `xchg` instruction; it's just a filler to make sure the table entries are neatly aligned to 8 bytes.
0x200fd0 is an address in the `.got` section:

```
$ objdump -s -j .got hello

hello:     file format elf64-x86-64

Contents of section .got:
 200fb0 f00d2000 00000000 00000000 00000000  .. .............
 200fc0 00000000 00000000 00000000 00000000  ................
 200fd0 00000000 00000000 00000000 00000000  ................
 200fe0 00000000 00000000 00000000 00000000  ................
 200ff0 00000000 00000000 00000000 00000000  ................
```
In the file, the GOT entries we're interested in are still zero. The dynamic relocation records (shown by `objdump -R`) tell the dynamic linker what to fill in when the program is loaded:

```
$ objdump -R hello

hello:     file format elf64-x86-64

DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE
0000000000200dd8 R_X86_64_RELATIVE  *ABS*+0x0000000000000670
0000000000200de0 R_X86_64_RELATIVE  *ABS*+0x0000000000000630
0000000000201008 R_X86_64_RELATIVE  *ABS*+0x0000000000201008
0000000000200fc8 R_X86_64_GLOB_DAT  _ITM_deregisterTMCloneTable
0000000000200fd0 R_X86_64_GLOB_DAT  puts@GLIBC_2.2.5
0000000000200fd8 R_X86_64_GLOB_DAT  __libc_start_main@GLIBC_2.2.5
0000000000200fe0 R_X86_64_GLOB_DAT  __gmon_start__
0000000000200fe8 R_X86_64_GLOB_DAT  _Jv_RegisterClasses
0000000000200ff0 R_X86_64_GLOB_DAT  _ITM_registerTMCloneTable
0000000000200ff8 R_X86_64_GLOB_DAT  __cxa_finalize@GLIBC_2.2.5
```
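The `R_X86_64_GLOB_DAT` record at 0x200fd0 tells the dynamic linker to store the address of `puts@GLIBC_2.2.5` in that GOT entry when the program is loaded. The `jmpq *0x200a6a(%rip)` in `.plt.got` (0x566 + 0x200a6a = 0x200fd0) therefore ends up jumping to the actual `puts` function.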
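The GOT/PLT indirection can be mimicked in plain C with a table of function pointers. This is only an analogy under simplifying assumptions:
- The names `got`, `plt_puts` and `toy_loader_resolve` are hypothetical.
- The real PLT stub is a machine-code `jmpq *...(%rip)`.
- The real GOT entry is filled in by the dynamic linker (`ld-linux-x86-64.so.2`), not by a function in our program.

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*puts_fn)(const char *);

enum { GOT_PUTS = 0, GOT_SIZE = 1 };
static puts_fn got[GOT_SIZE];   /* toy "GOT": NULL until resolved */

/* Toy "PLT stub": never changes, it only jumps through the GOT slot,
   like jmpq *0x200a6a(%rip) in .plt.got. */
int plt_puts(const char *s) {
    if (got[GOT_PUTS] == NULL)
        return -1;   /* not resolved yet; in the real program the slot is filled before main runs */
    return got[GOT_PUTS](s);
}

/* Toy "dynamic loader": writes the resolved address into the slot,
   like processing the R_X86_64_GLOB_DAT relocation for puts. */
void toy_loader_resolve(puts_fn impl) {
    got[GOT_PUTS] = impl;
}
```

The point of the design is that `plt_puts` (the "code") is identical in every process, and only the writable table differs.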
Arithmetic operations
The rest of the `main` function is much less complicated.
Integers
Variables are strictly speaking not declared in the compiled code; we just have an area set aside for them on the stack. But C variable declarations with an initialiser will end up as code assigning the initial value to the variable:
```c
int x = 42;
```

```objdump
  1b:   c7 45 e4 2a 00 00 00    movl   $0x2a,-0x1c(%rbp)    # x = 42
```
So `x` is located at offset -0x1c in the stack frame (consulting the debug info should let us verify this).
The code for integer multiplication is somewhat interesting:
```c
int y = x * 17;
```

```objdump
  22:   8b 55 e4                mov    -0x1c(%rbp),%edx     # %edx = x
  25:   89 d0                   mov    %edx,%eax            # %eax = %edx
  27:   c1 e0 04                shl    $0x4,%eax            # %eax = %eax << 4
  2a:   01 d0                   add    %edx,%eax            # %eax = %eax + %edx
  2c:   89 45 e8                mov    %eax,-0x18(%rbp)     # y = %eax
```
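The trick is that x * 17 = x * 16 + x = (x << 4) + x. A one-line C version of the same sequence could look like this. It uses `unsigned` so that the shift is well defined in C for every input, and `times17_shift_add` is just an illustrative name:

```c
/* x * 17 computed the way GCC did it at -O0: (x << 4) + x, i.e. x*16 + x.
   unsigned arithmetic wraps modulo 2^32, just like the 32-bit register %eax. */
unsigned times17_shift_add(unsigned x) {
    return (x << 4) + x;
}
```

For example, `times17_shift_add(42)` gives 714, the same as `42 * 17`.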
Doubles
```c
double a = 42.0;
```

```objdump
  2f:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 37 <main+0x37>
  36:   00
  37:   f2 0f 11 45 f0          movsd  %xmm0,-0x10(%rbp)
```
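The relocation for this instruction (`.rodata+0xc` at offset 0x33, plus the usual 4 for PC-relative addressing) tells us where to look. If you look at `.rodata`, you'll find the double value at offset 0x10: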
```
 0000 48656c6c 6f2c2077 6f726c64 21000000  Hello, world!...
 0010 00000000 00004540 00000000 00003140  ......E@......1@
```
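The bytes `00000000 00004540` are the little-endian hex representation of the IEEE 754 64-bit double-precision floating point number 42 (there's a handy converter that lets you find the hex representation of floats). The next eight bytes, `00000000 00003140`, are 17.0, which is used for the multiplication below.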
(The value is most likely moved via the `%xmm0` register because x86 has no general instruction for copying directly from one memory location to another.)
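You can check the encoding from C by copying the bytes of a `double` into a 64-bit integer (`memcpy` is the well-defined way to do this). The sketch assumes, as on x86-64, that `double` is IEEE 754 binary64 with the same byte order as `uint64_t`. The helper names are made up for this example:

```c
#include <stdint.h>
#include <string.h>

/* The raw IEEE 754 bits of a double (memcpy avoids undefined type punning). */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Split a binary64 value into sign (1 bit), biased exponent (11 bits)
   and fraction (52 bits). */
void double_fields(double d, unsigned *sign, unsigned *exponent, uint64_t *fraction) {
    uint64_t bits = double_bits(d);
    *sign = (unsigned)(bits >> 63);
    *exponent = (unsigned)((bits >> 52) & 0x7ff);
    *fraction = bits & ((UINT64_C(1) << 52) - 1);
}
```

`double_bits(42.0)` is 0x4045000000000000: sign 0, biased exponent 1028 = 1023 + 5, since 42 = 1.3125 × 2^5. Stored little-endian, this is the byte sequence `00 00 00 00 00 00 45 40`. Likewise, 17.0 gives 0x4031000000000000.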
The multiplication itself is very easy, and uses a multiply instruction rather than shifts and adds:
```objdump
  3c:   f2 0f 10 4d f0          movsd  -0x10(%rbp),%xmm1     # read a
  41:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0       # read 17.0
  48:   00
  49:   f2 0f 59 c1             mulsd  %xmm1,%xmm0           # a*17.0
  4d:   f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)      # b = a*17.0
```
8/16-bit values
Finally, we have the same computation on 8-bit `unsigned char`s, with the result stored in a 16-bit `unsigned short`:
```objdump
unsigned char m = 42;
  52:   c6 45 e5 2a             movb   $0x2a,-0x1b(%rbp)
unsigned short n = m * 17;
  56:   0f b6 55 e5             movzbl -0x1b(%rbp),%edx
  5a:   89 d0                   mov    %edx,%eax
  5c:   c1 e0 04                shl    $0x4,%eax
  5f:   01 d0                   add    %edx,%eax
  61:   66 89 45 e6             mov    %ax,-0x1a(%rbp)
```
The loads and stores use the sizes of the variables (`movb` for a byte, `mov %ax` for a 16-bit word; `movzbl` zero-extends the byte to 32 bits when loading it). But notice that the actual computation happens in the `%eax` register, which is 32 bits.

Part of the reason is C itself: `unsigned char` and `unsigned short` operands are promoted to `int` before any arithmetic. It is most likely also faster. x86-64 does have 8- and 16-bit versions of most instructions, but working with partial registers such as `%al` or `%ax` can be slower on modern processors than using full 32-bit registers. The extra bits are simply discarded when the result is stored.
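The promotion rule is easy to observe from C. The product is computed as `int` (32 bits on x86-64) and only truncated when it's converted to the smaller type. A small sketch (the function names are illustrative):

```c
/* m is promoted to int before multiplying, so nothing is lost at 8 bits;
   converting to unsigned short keeps the low 16 bits of the int result. */
unsigned short mul17_promoted(unsigned char m) {
    return (unsigned short)(m * 17);
}

/* Storing the same product into 8 bits keeps only the low byte (result mod 256). */
unsigned char mul17_low_byte(unsigned char m) {
    return (unsigned char)(m * 17);
}
```

`mul17_promoted(42)` gives 714. `mul17_low_byte(42)` keeps only the low 8 bits of 714 and gives 202.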
Returning from a function
Finally, we have the code for cleaning up and returning to the caller. The ABI states that `int` return values should be stored in the `%eax` register:
```objdump
return argc;
  65:   8b 45 dc                mov    -0x24(%rbp),%eax
```
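Then we need to restore the stack pointer and frame pointer to the caller's values. The `leave` instruction (the `q` in `leaveq` means it works on 64-bit pointers) will do this: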
```objdump
  68:   c9                      leaveq
```
```gas
mov %rbp,%rsp    # drop our locals: the stack pointer goes back to where our frame pointer points
pop %rbp         # pop the caller's frame pointer from the stack
```
(There's also an `enter` instruction that replicates the functionality of the first `push`, `mov` and `sub` from the start of the function; apparently, GCC doesn't use it.)
If we had used other non-scratch registers, we would now also have to restore them to their previous values from the stack.
Finally, we return to the caller:
```objdump
  69:   c3                      retq
```
This pops the return address off the stack and continues executing right after the `call` that called `main`.
Things to try on your own
- Change the C code in `hello.c`, recompile and see what the difference is. For example:
  - Try different types (`long`, `long long`, `short`, `float`, ...) and see how the instructions and stack frame addresses change.
  - Try different arithmetic operations; also try multiplying by a non-constant factor.
- Try turning on the optimiser (`cc -O -c -g hello.c -o hello.o`). GCC accepts various optimisation levels: `-O1`, `-O2`, `-O3`, `-Os` (optimise for size), etc.
  - Quite likely, all your code will disappear; that's because the optimiser sees that you're not actually using the values you're computing. You can try to work around this by printing the values before you return (`printf("%d %f %d\n", y, b, n);`). That doesn't help either, because the compiler will just precompute the constant expressions.
  - To see optimised code, you'll have to trick the compiler so it can't know the values in advance, e.g. with a different file, `calc.c`:

```c
#include <stdio.h>

int calc(int x, int f, double a, unsigned char m) {
    puts("Hello, world!\n");
    int y = x * f;
    double b = a * f;
    unsigned short n = m * (unsigned short)f;
    printf("%d %f %d\n", y, b, n);
    return y;
}
```
You can try a few variants, including using `17` instead of `f`. The compiler might then switch back to using shift+add, except when optimising for size (`-Os`), where it might try to use fewer instructions even if they're slower.
Java Object Files
The Java `.class` files are also object files, containing bytecode for the Java Virtual Machine (JVM). The file format is quite different from the ELF files used for native code on Unix-like systems.
You can examine Java class files with the `javap` command. Here's an example that's very similar to our running example:
```java
// Hello.java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, world!%n");
        int x = 42;
        int y = x * 17;

        double a = 42.0;
        double b = a * 17.0;

        byte m = 42;
        short n = (short)(m * 17);
    }
}
```
Compile it with `javac -g Hello.java`. The `-g` option adds debug information about local variables. (No `-c` is necessary with the Java compiler: it always produces stand-alone classes, and any "linking" happens when the class is loaded into the JVM. The closest equivalent to linking a bunch of `.o` files into an executable program is bundling up a bunch of classes in a `.jar` file with the `jar` command.)
Look at the JVM code using `javap`:
```
$ javap -constants -l -s -c Hello.class
Compiled from "Hello.java"
public class Hello {
  public Hello();
    descriptor: ()V
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
    LineNumberTable:
      line 2: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       5     0  this   LHello;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String Hello, world!%n
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: bipush        42
      10: istore_1
      11: iload_1
      12: bipush        17
      14: imul
      15: istore_2
      16: ldc2_w        #5                  // double 42.0d
      19: dstore_3
      20: dload_3
      21: ldc2_w        #7                  // double 17.0d
      24: dmul
      25: dstore        5
      27: bipush        42
      29: istore        7
      31: iload         7
      33: bipush        17
      35: imul
      36: i2s
      37: istore        8
      39: return
    LineNumberTable:
      line 4: 0
      line 5: 8
      line 6: 11
      line 8: 16
      line 9: 20
      line 11: 27
      line 12: 31
      line 13: 39
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      40     0  args   [Ljava/lang/String;
         11      29     1     x   I
         16      24     2     y   I
         20      20     3     a   D
         27      13     5     b   D
         31       9     7     m   B
         39       1     8     n   S
}
```
* Types are encoded as strings: the type descriptor of `main` is `([Ljava/lang/String;)V`, a method that takes an array of strings and returns `void`. Languages such as C++ use a similar scheme, called name mangling, to encode type and overloading information into symbol names in a way that is compatible with C linkers.
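* Local variables are stored in numbered slots, and computation happens on an operand stack. So `dstore_3` will pop a double from the stack and store it in `a` (slot 3), and `dload_3` will push the contents of `a` onto the stack. (A `double` takes up two slots, which is why `b` is in slot 5.)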
* You'll notice some unnecessary code. For instance, `istore_1` is directly followed by `iload_1`: the value is stored in `x` (slot 1) and immediately loaded back, and `x` is never used again. This is pretty much the same as in the x86 code above, e.g., `movl $0x2a,-0x1c(%rbp)` followed by `mov -0x1c(%rbp),%edx`.
* The C compiler will remove unnecessary local variables if you tell it to optimise (e.g., with `-O2`); the standard Java compiler, `javac`, does essentially no optimisation of the bytecode.
* A JVM with JIT support (such as the standard HotSpot virtual machine) will however do such optimisations on the fly.
* Optimisation typically interferes with debugging – both because the code that gets run can be radically different from the source code the programmer is looking at, and because important local variables may have been optimised away.
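The split between local variable slots and the operand stack described in the notes above can be made concrete with a toy interpreter. The C sketch below handles four of the integer instructions seen in the listing: `bipush`, `istore`, `iload` and `imul`. It is a hypothetical illustration, not how HotSpot works. The opcodes are plain enum values rather than real bytecode numbers, and there is no verification beyond a few `assert`s.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 16

/* Toy opcodes: names borrowed from the JVM, values are NOT real bytecodes. */
enum op { BIPUSH, ISTORE, ILOAD, IMUL, HALT };

struct insn {
    enum op op;
    int arg;   /* constant for BIPUSH, slot number for ISTORE/ILOAD, else unused */
};

/* Execute code until HALT. locals[] models the frame's local variable slots;
 * stack[] is the separate operand stack that the instructions work on. */
void run(const struct insn *code, int *locals, size_t nlocals)
{
    int stack[STACK_MAX];
    size_t sp = 0;                        /* number of values on the operand stack */

    for (const struct insn *pc = code; pc->op != HALT; pc++) {
        switch (pc->op) {
        case BIPUSH:                      /* push a small constant */
            assert(sp < STACK_MAX);
            stack[sp++] = pc->arg;
            break;
        case ISTORE:                      /* pop the top value into a slot */
            assert(sp > 0 && (size_t)pc->arg < nlocals);
            locals[pc->arg] = stack[--sp];
            break;
        case ILOAD:                       /* push a copy of a slot */
            assert(sp < STACK_MAX && (size_t)pc->arg < nlocals);
            stack[sp++] = locals[pc->arg];
            break;
        case IMUL: {                      /* pop two ints, push their product */
            assert(sp >= 2);
            int rhs = stack[--sp];
            int lhs = stack[--sp];
            stack[sp++] = lhs * rhs;
            break;
        }
        case HALT:                        /* not reached: the loop stops first */
            break;
        }
    }
}

/* int x = 42; int y = x * 17;  (bytecode offsets 8 to 15 in the javap output) */
const struct insn x_times_17[] = {
    {BIPUSH, 42}, {ISTORE, 1},
    {ILOAD, 1}, {BIPUSH, 17}, {IMUL, 0}, {ISTORE, 2},
    {HALT, 0},
};
```

Running `x_times_17` with a four-slot `locals` array leaves 42 in slot 1 (`x`) and 714 in slot 2 (`y`). The store-then-reload pair (`ISTORE 1`, `ILOAD 1`) is kept exactly as `javac` emitted it.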
macOS output
Disassembly
Notice that:

- The ABI is the same as on Linux: the frame pointer is `%rbp`, the arguments arrive in `%edi` and `%rsi`, and the return value is placed in `%eax`.
- The code is generated by Clang, which is also available on Linux, and it chooses different instructions from GCC:
    - `addq` and `popq` to restore the stack before returning, instead of `leaveq`; the effect is the same.
    - `imull` to do 32-bit signed integer multiplication, instead of shift/add.
- The instruction order is also a bit different. The floating-point move instructions are placed before the integer multiplication, possibly so that the relatively slow memory accesses can proceed while the processor is busy computing.
- Clang on Linux will produce similar output.
- This listing appears to come from a version of `hello.c` that calls `printf` rather than `puts`. The symbol table lists `_printf` as undefined, and `movb $0, %al` before the call is how the x86-64 calling convention tells a variadic function that no vector registers hold arguments.
```objdump
$ objdump -line-numbers -d hello.o

hello.o:  file format Mach-O 64-bit x86-64

Disassembly of section __TEXT,__text:
_main:
       0:  55                       pushq   %rbp
       1:  48 89 e5                 movq    %rsp, %rbp
       4:  48 83 ec 40              subq    $64, %rsp
       8:  48 8d 05 71 00 00 00     leaq    113(%rip), %rax
       f:  c7 45 fc 00 00 00 00     movl    $0, -4(%rbp)
      16:  89 7d f8                 movl    %edi, -8(%rbp)
      19:  48 89 75 f0              movq    %rsi, -16(%rbp)
      1d:  48 89 c7                 movq    %rax, %rdi
      20:  b0 00                    movb    $0, %al
      22:  e8 00 00 00 00           callq   0 <_main+0x27>
      27:  f2 0f 10 05 41 00 00 00  movsd   65(%rip), %xmm0
      2f:  f2 0f 10 0d 41 00 00 00  movsd   65(%rip), %xmm1
      37:  c7 45 ec 2a 00 00 00     movl    $42, -20(%rbp)
      3e:  6b 4d ec 11              imull   $17, -20(%rbp), %ecx
      42:  89 4d e8                 movl    %ecx, -24(%rbp)
      45:  f2 0f 11 4d e0           movsd   %xmm1, -32(%rbp)
      4a:  f2 0f 59 45 e0           mulsd   -32(%rbp), %xmm0
      4f:  f2 0f 11 45 d8           movsd   %xmm0, -40(%rbp)
      54:  c6 45 d7 2a              movb    $42, -41(%rbp)
      58:  0f b6 4d d7              movzbl  -41(%rbp), %ecx
      5c:  6b c9 11                 imull   $17, %ecx, %ecx
      5f:  89 4d d0                 movl    %ecx, -48(%rbp)
      62:  8b 4d f8                 movl    -8(%rbp), %ecx
      65:  89 45 cc                 movl    %eax, -52(%rbp)
      68:  89 c8                    movl    %ecx, %eax
      6a:  48 83 c4 40              addq    $64, %rsp
      6e:  5d                       popq    %rbp
      6f:  c3                       retq

hello.o:  file format Mach-O 64-bit x86-64

SYMBOL TABLE:
0000000000000000 g     F __TEXT,__text  _main
0000000000000000         *UND*  _printf
```
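The `subq $64, %rsp` at the start and the matching `addq $64, %rsp` at the end reserve and release the stack frame. The lowest local in the listing is the 4-byte slot at `-52(%rbp)`. The x86-64 System V ABI also requires `%rsp` to be 16-byte aligned at call instructions. A plausible reading, which is our interpretation and not something the listing states, is that Clang rounds the 52 bytes it needs up to the next multiple of 16. The arithmetic, as a small C sketch with a hypothetical helper `round_up`:

```c
#include <assert.h>

/* Round n up to the next multiple of align; align must be a power of two,
 * so align - 1 is a mask of the low bits to clear. */
unsigned long round_up(unsigned long n, unsigned long align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

`round_up(52, 16)` gives 64, the frame size in the listing. After `pushq %rbp`, `%rsp` is already 16-byte aligned, so subtracting a multiple of 16 keeps it aligned for the `callq`.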
Inspecting dynamic libraries on macOS
```
$ objdump -macho -dylibs-used ./hello
hello:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

$ objdump -macho -dylibs-used /bin/ls
/bin/ls:
    /usr/lib/libutil.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
```
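Before objdump can list the libraries recorded in a binary, it has to recognise the file format: `Mach-O 64-bit x86-64` on macOS, `elf32-littlearm` and `elf32-avr` in the next section. Recognition starts with the magic number at the beginning of the file. The sketch below covers just that first step, using a hypothetical function `classify_object`; real tools go on to parse the full header, load commands and sections.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Guess an object-file format from the first bytes of a file. */
const char *classify_object(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    /* ELF: 0x7f 'E' 'L' 'F', then e_ident[EI_CLASS] (1 = 32-bit, 2 = 64-bit)
     * and e_ident[EI_DATA] (1 = little-endian, 2 = big-endian).
     * The literal is split so "\x7fE" is not read as a single hex escape. */
    if (n >= 6 && memcmp(buf, "\x7f" "ELF", 4) == 0) {
        int is64 = buf[4] == 2;           /* other values treated as 32-bit here */
        int big = buf[5] == 2;
        if (is64)
            return big ? "ELF 64-bit big-endian" : "ELF 64-bit little-endian";
        return big ? "ELF 32-bit big-endian" : "ELF 32-bit little-endian";
    }
    if (n >= 4) {
        /* Mach-O 64-bit magic 0xfeedfacf is stored in the file's byte order,
         * so a little-endian (x86-64, arm64) file starts with cf fa ed fe. */
        static const unsigned char macho64_le[4] = {0xcf, 0xfa, 0xed, 0xfe};
        /* 0xCAFEBABE starts Java class files, but also macOS universal
         * ("fat") binaries, so these four bytes alone are ambiguous. */
        static const unsigned char cafebabe[4] = {0xca, 0xfe, 0xba, 0xbe};
        if (memcmp(buf, macho64_le, 4) == 0)
            return "Mach-O 64-bit little-endian";
        if (memcmp(buf, cafebabe, 4) == 0)
            return "Java class file or Mach-O universal binary";
    }
    return "unknown";
}
```

Reading the first few bytes of an actual file (with `fread`, say) and passing them in is left to the caller.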
Output for ARM and AVR architectures
ARM disassembly of `hello.o`:
Notice the following:
- All the instructions are the same length (4 bytes / 32 bits), unlike x86/amd64 code which can vary widely in size.
- The ARM code also uses shift and add to do integer multiplication by a constant (x << 4 multiplies by 2^4 = 16; add x again to multiply by 17). This may be faster than generic multiplication.
- The ARM code below calls a library function (`__aeabi_dmul`) to do floating-point multiplication, and each double is held in a pair of ordinary 32-bit registers. If we had compiled for a more advanced ARM processor with a floating-point unit, the compiler would probably have used hardware floating-point instructions instead.
- All the instructions that process data use registers, and only the load and store instructions access memory: you have to move values from memory/stack into a register before computing with them. This is typical of a RISC (load/store) architecture.
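The identity behind the `lsl r3, r3, #4` / `add r3, r3, r2` pair is `x * 17 == (x << 4) + x`. The C code below is a small check of it. We chose `uint32_t` because left-shifting negative signed values is undefined behaviour in C, while the hardware just operates on 32-bit bit patterns. Two's-complement wraparound gives the same low 32 bits either way.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 17 the way the ARM code does: shift left by 4 (x * 16), then add x.
 * uint32_t models a 32-bit register; overflow wraps modulo 2^32. */
uint32_t times17_shift_add(uint32_t x)
{
    return (x << 4) + x;
}

/* Compare the shift-and-add version with ordinary multiplication
 * for every x in [0, limit). */
void check_times17(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        assert(times17_shift_add(x) == x * 17u);
}
```

The wraparound also matches at the top of the range: `times17_shift_add(0xFFFFFFFFu)` and `0xFFFFFFFFu * 17u` both give `0xFFFFFFEF`.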
```objdump
$ arm-none-eabi-objdump -dSr hello-arm.o

hello-arm.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <main>:
#include <stdio.h>

int main(int argc, char** argv) {
   0:   e92d4810        push    {r4, fp, lr}
   4:   e28db008        add     fp, sp, #8
   8:   e24dd02c        sub     sp, sp, #44     ; 0x2c
   c:   e50b0030        str     r0, [fp, #-48]  ; 0xffffffd0
  10:   e50b1034        str     r1, [fp, #-52]  ; 0xffffffcc
    puts("Hello, world!\n");
  14:   e59f0088        ldr     r0, [pc, #136]  ; a4 <main+0xa4>
  18:   ebfffffe        bl      0 <puts>
                        18: R_ARM_CALL  puts
    int x = 42;
  1c:   e3a0302a        mov     r3, #42 ; 0x2a
  20:   e50b3010        str     r3, [fp, #-16]
    int y = x * 17;
  24:   e51b2010        ldr     r2, [fp, #-16]
  28:   e1a03002        mov     r3, r2
  2c:   e1a03203        lsl     r3, r3, #4
  30:   e0833002        add     r3, r3, r2
  34:   e50b3014        str     r3, [fp, #-20]  ; 0xffffffec
    double a = 42.0;
  38:   e3a03000        mov     r3, #0
  3c:   e59f4064        ldr     r4, [pc, #100]  ; a8 <main+0xa8>
  40:   e50b301c        str     r3, [fp, #-28]  ; 0xffffffe4
  44:   e50b4018        str     r4, [fp, #-24]  ; 0xffffffe8
    double b = a * 17.0;
  48:   e3a02000        mov     r2, #0
  4c:   e59f3058        ldr     r3, [pc, #88]   ; ac <main+0xac>
  50:   e24b101c        sub     r1, fp, #28
  54:   e8910003        ldm     r1, {r0, r1}
  58:   ebfffffe        bl      0 <__aeabi_dmul>
                        58: R_ARM_CALL  __aeabi_dmul
  5c:   e1a03000        mov     r3, r0
  60:   e1a04001        mov     r4, r1
  64:   e50b3024        str     r3, [fp, #-36]  ; 0xffffffdc
  68:   e50b4020        str     r4, [fp, #-32]  ; 0xffffffe0
    unsigned char m = 42;
  6c:   e3a0302a        mov     r3, #42 ; 0x2a
  70:   e54b3025        strb    r3, [fp, #-37]  ; 0xffffffdb
    unsigned short n = m * 17;
  74:   e55b3025        ldrb    r3, [fp, #-37]  ; 0xffffffdb
  78:   e1a03803        lsl     r3, r3, #16
  7c:   e1a03823        lsr     r3, r3, #16
  80:   e1a02003        mov     r2, r3
  84:   e1a02202        lsl     r2, r2, #4
  88:   e0823003        add     r3, r2, r3
  8c:   e14b32b8        strh    r3, [fp, #-40]  ; 0xffffffd8
    return argc;
  90:   e51b3030        ldr     r3, [fp, #-48]  ; 0xffffffd0
}
  94:   e1a00003        mov     r0, r3
  98:   e24bd008        sub     sp, fp, #8
  9c:   e8bd4810        pop     {r4, fp, lr}
  a0:   e12fff1e        bx      lr
                        a0: R_ARM_V4BX  *ABS*
  a4:   00000000        .word   0x00000000
                        a4: R_ARM_ABS32 .rodata
  a8:   40450000        .word   0x40450000
  ac:   40310000        .word   0x40310000
```
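The constants at offsets `a8` and `ac` are the upper 32-bit halves of 42.0 and 17.0 as IEEE-754 doubles. The lower halves are zero, which is why the code simply uses `mov r3, #0` and `mov r2, #0` for them. Since this is a little-endian target, the low word is stored at the lower address (`[fp, #-28]`) and the high word above it (`[fp, #-24]`). The C sketch below reproduces the split. It assumes the host also uses IEEE-754 binary64 for `double`, as all mainstream platforms do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw IEEE-754 bit pattern of a double; memcpy reinterprets the bytes
 * without violating C's aliasing rules. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);      /* expects a 64-bit double */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Upper 32 bits: sign, exponent and top of the mantissa
 * (the word the ARM code loads from the literal pool). */
uint32_t high_word(double d) { return (uint32_t)(double_bits(d) >> 32); }

/* Lower 32 bits: the rest of the mantissa. */
uint32_t low_word(double d) { return (uint32_t)(double_bits(d) & 0xFFFFFFFFu); }
```

`high_word(42.0)` is `0x40450000` and `high_word(17.0)` is `0x40310000`, matching the two `.word` entries, while `low_word` is 0 for both.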
(You'll also need the relevant ARM libraries installed in order to link the executable program and inspect it.)
AVR disassembly of `hello.o`:
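Note:

- All the instructions in this listing are two bytes long. (AVR also has a few four-byte instructions, such as `call` and `jmp`, but none appear here.)
- As in the ARM code, there are a lot of loads and stores (`ldd`, `std`). Some instructions, such as `ldi` (load immediate), take small constants directly as arguments.
- Multiple registers are needed to hold values. This is an 8-bit processor, so the registers are only 8 bits wide, while the `int` data type is 16 bits. For example, the integer 42 is loaded by putting 0x2a into r24 and 0x00 into r25, and the double 42.0 needs four registers. With the avr-gcc used here, `double` is only 32 bits, the same as `float` (the traditional avr-gcc default). That is why the multiplication calls the single-precision routine `__mulsf3`, and why the constants are 0x42280000 for 42.0 and 0x41880000 for 17.0.
- Integer multiplication is quite complicated, since the processor can only deal with 8 bits at a time. Here `x * 17` is computed as `(x << 4) + x`. The 16-bit shift by four takes a sequence of `swap`, `andi` and `eor` instructions, and the 16-bit addition needs `add` followed by `adc` (add with carry).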
- If you compile to an executable program (i.e., something that might be uploaded to the flash memory on an Arduino), you'll notice that everything is statically linked, so the code will also include the `puts` function and the floating-point multiplication function.

```objdump
$ avr-objdump -dSr hello-avr.o

hello-avr.o:     file format elf32-avr


Disassembly of section .text:

00000000 <main>:
#include <stdio.h>

int main(int argc, char** argv) {
   0:   cf 93           push    r28
   2:   df 93           push    r29
   4:   cd b7           in      r28, 0x3d       ; 61
   6:   de b7           in      r29, 0x3e       ; 62
   8:   63 97           sbiw    r28, 0x13       ; 19
   a:   0f b6           in      r0, 0x3f        ; 63
   c:   f8 94           cli
   e:   de bf           out     0x3e, r29       ; 62
  10:   0f be           out     0x3f, r0        ; 63
  12:   cd bf           out     0x3d, r28       ; 61
  14:   99 8b           std     Y+17, r25       ; 0x11
  16:   88 8b           std     Y+16, r24       ; 0x10
  18:   7b 8b           std     Y+19, r23       ; 0x13
  1a:   6a 8b           std     Y+18, r22       ; 0x12
    puts("Hello, world!\n");
  1c:   80 e0           ldi     r24, 0x00       ; 0
                        1c: R_AVR_LO8_LDI       .rodata
  1e:   90 e0           ldi     r25, 0x00       ; 0
                        1e: R_AVR_HI8_LDI       .rodata
  20:   00 d0           rcall   .+0             ; 0x22 <main+0x22>
                        20: R_AVR_13_PCREL      puts
    int x = 42;
  22:   8a e2           ldi     r24, 0x2A       ; 42
  24:   90 e0           ldi     r25, 0x00       ; 0
  26:   9a 83           std     Y+2, r25        ; 0x02
  28:   89 83           std     Y+1, r24        ; 0x01
    int y = x * 17;
  2a:   29 81           ldd     r18, Y+1        ; 0x01
  2c:   3a 81           ldd     r19, Y+2        ; 0x02
  2e:   82 2f           mov     r24, r18
  30:   93 2f           mov     r25, r19
  32:   82 95           swap    r24
  34:   92 95           swap    r25
  36:   90 7f           andi    r25, 0xF0       ; 240
  38:   98 27           eor     r25, r24
  3a:   80 7f           andi    r24, 0xF0       ; 240
  3c:   98 27           eor     r25, r24
  3e:   82 0f           add     r24, r18
  40:   93 1f           adc     r25, r19
  42:   9c 83           std     Y+4, r25        ; 0x04
  44:   8b 83           std     Y+3, r24        ; 0x03
    double a = 42.0;
  46:   80 e0           ldi     r24, 0x00       ; 0
  48:   90 e0           ldi     r25, 0x00       ; 0
  4a:   a8 e2           ldi     r26, 0x28       ; 40
  4c:   b2 e4           ldi     r27, 0x42       ; 66
  4e:   8d 83           std     Y+5, r24        ; 0x05
  50:   9e 83           std     Y+6, r25        ; 0x06
  52:   af 83           std     Y+7, r26        ; 0x07
  54:   b8 87           std     Y+8, r27        ; 0x08
    double b = a * 17.0;
  56:   20 e0           ldi     r18, 0x00       ; 0
  58:   30 e0           ldi     r19, 0x00       ; 0
  5a:   48 e8           ldi     r20, 0x88       ; 136
  5c:   51 e4           ldi     r21, 0x41       ; 65
  5e:   6d 81           ldd     r22, Y+5        ; 0x05
  60:   7e 81           ldd     r23, Y+6        ; 0x06
  62:   8f 81           ldd     r24, Y+7        ; 0x07
  64:   98 85           ldd     r25, Y+8        ; 0x08
  66:   00 d0           rcall   .+0             ; 0x68 <main+0x68>
                        66: R_AVR_13_PCREL      __mulsf3
  68:   b9 2f           mov     r27, r25
  6a:   a8 2f           mov     r26, r24
  6c:   97 2f           mov     r25, r23
  6e:   86 2f           mov     r24, r22
  70:   89 87           std     Y+9, r24        ; 0x09
  72:   9a 87           std     Y+10, r25       ; 0x0a
  74:   ab 87           std     Y+11, r26       ; 0x0b
  76:   bc 87           std     Y+12, r27       ; 0x0c
    unsigned char m = 42;
  78:   8a e2           ldi     r24, 0x2A       ; 42
  7a:   8d 87           std     Y+13, r24       ; 0x0d
    unsigned short n = m * 17;
  7c:   8d 85           ldd     r24, Y+13       ; 0x0d
  7e:   28 2f           mov     r18, r24
  80:   30 e0           ldi     r19, 0x00       ; 0
  82:   82 2f           mov     r24, r18
  84:   93 2f           mov     r25, r19
  86:   82 95           swap    r24
  88:   92 95           swap    r25
  8a:   90 7f           andi    r25, 0xF0       ; 240
  8c:   98 27           eor     r25, r24
  8e:   80 7f           andi    r24, 0xF0       ; 240
  90:   98 27           eor     r25, r24
  92:   82 0f           add     r24, r18
  94:   93 1f           adc     r25, r19
  96:   9f 87           std     Y+15, r25       ; 0x0f
  98:   8e 87           std     Y+14, r24       ; 0x0e
    return argc;
  9a:   88 89           ldd     r24, Y+16       ; 0x10
  9c:   99 89           ldd     r25, Y+17       ; 0x11
}
  9e:   63 96           adiw    r28, 0x13       ; 19
  a0:   0f b6           in      r0, 0x3f        ; 63
  a2:   f8 94           cli
  a4:   de bf           out     0x3e, r29       ; 62
  a6:   0f be           out     0x3f, r0        ; 63
  a8:   cd bf           out     0x3d, r28       ; 61
  aa:   df 91           pop     r29
  ac:   cf 91           pop     r28
  ae:   08 95           ret
```
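To see why the `swap`/`andi`/`eor` sequence computes a 16-bit shift by four, here is a C model of the `int y = x * 17` code, with one statement per instruction. It is an illustrative sketch (the function names are hypothetical) and ignores the status flags except for the carry that `adc` consumes.

```c
#include <assert.h>
#include <stdint.h>

/* AVR "swap": exchange the high and low nibbles of a byte. */
static uint8_t swap_nibbles(uint8_t r)
{
    return (uint8_t)((r << 4) | (r >> 4));
}

/* Model of the listing's "int y = x * 17" sequence, with the 16-bit int
 * split over two 8-bit registers (r25 = high byte, r24 = low byte). */
uint16_t avr_times17(uint16_t x)
{
    uint8_t r18 = (uint8_t)x;              /* ldd r18, Y+1: low byte of x  */
    uint8_t r19 = (uint8_t)(x >> 8);       /* ldd r19, Y+2: high byte of x */
    uint8_t r24 = r18, r25 = r19;          /* mov r24, r18 / mov r25, r19  */

    r24 = swap_nibbles(r24);               /* swap r24       */
    r25 = swap_nibbles(r25);               /* swap r25       */
    r25 &= 0xF0;                           /* andi r25, 0xF0 */
    r25 ^= r24;                            /* eor r25, r24   */
    r24 &= 0xF0;                           /* andi r24, 0xF0 */
    r25 ^= r24;                            /* eor r25, r24: r25:r24 == x << 4 */

    unsigned sum = (unsigned)r24 + r18;    /* add r24, r18 (may set carry) */
    r24 = (uint8_t)sum;
    r25 = (uint8_t)(r25 + r19 + (sum >> 8)); /* adc r25, r19: add with carry */
    return (uint16_t)((r25 << 8) | r24);
}

/* Compare with ordinary 16-bit multiplication for every possible input. */
void check_avr_times17(void)
{
    for (uint32_t x = 0; x <= 0xFFFF; x++)
        assert(avr_times17((uint16_t)x) == (uint16_t)(x * 17u));
}
```

The `unsigned short n = m * 17` part of the listing uses the same sequence after zero-extending `m` into r19:r18 with `ldi r19, 0x00`.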