Experiments with the C compiler, object files, linking and disassembling

This is a small exercise / tutorial that will show you:

  • the basics of the command line tools for compiling and linking programs
  • how to inspect the code generated by the C and Java compilers, see the disassembled machine instructions, and see how information about programs is stored
  • how the compiler translates simple things
  • the effect of code optimisation

You need

(If you want to try things for yourself, rather than just reading the text)

  • A computer running Linux or Mac OS, or possibly Windows. (On Windows 10, you can also just install all the standard Linux tools from the Windows App store, e.g. Ubuntu on Windows)
  • A C compiler and binutils (packages gcc and binutils in Ubuntu; Xcode from the app store on Mac – if you get invalid active developer path on Mac, you need to install Xcode and run xcode-select --install from the command line (say yes to installing command line tools))
  • An editor, and an open command line / terminal window.

Compiling and Inspecting Code

hello.c

You'll need to create this file on your computer, using your favourite editor.

:::c
#include <stdio.h>

int main(int argc, char** argv) {
    puts("Hello, world!\n");
    int x = 42;
    int y = x * 17;

    double a = 42.0;
    double b = a * 17.0;

    unsigned char m = 42;
    unsigned short n = m * 17;

    return argc;
}

To compile (on Linux)

Compile to object file

This produces an object file – the compiled code for whatever was in hello.c, but without the other code needed to make it into a complete, runnable program:

gcc -g -c hello.c -o hello.o
or
cc -g -c hello.c -o hello.o

  • The -g option means "include debug information", and -c means "just compile this to an object file, don't link it to make a runnable program".
  • These options are the same on all Linux/Unix/Mac C and C++ compilers (typically also for other languages).
  • Typically any Unix will let you use cc as an alias for the C compiler (and c++ or CC for the C++ compiler); on Linux, the standard C compiler will be GCC, on Mac OS, it's Clang nowadays (and if you're on a fancy supercomputer, the vendor may have a proprietary compiler). Typically the basic options and behaviour (like the things we're doing here) will be the same, but advanced options will be widely different.
  • In Java, there's no -c option to the javac compiler. Why? However, javac does have a -g option for including all debug information (apparently it normally doesn't include information about local variables).

If you omit -c, you'll get a runnable program as output. You can combine any number of .c files and .o files to produce a program; they will all be linked together. If you don't specify the output file, the default name will typically be a.out (for historic reasons):

:::sh
# compile directly
cc hello.c -o hello
# link object file
cc hello.o -o hello
# run it
./hello
# default name
cc hello.c
./a.out

Normally, programs will be dynamically linked against the standard libraries, to avoid duplication and to make it possible to deploy bugfixes and updates for libraries without recompiling everything. It's also possible to specify static linking with the -static compiler option; this will make the executable file much larger; but it also allows you to see all the code that contributes to running your program. Statically-linked executables may also be more long-term portable, since the standard library versions may change over time (particularly on Linux, where it's assumed that you can just recompile if necessary).

To cross-compile (on Linux)

You can also compile things for other platforms and machine architectures, if you have a cross-compiler. There are a bunch of these available in standard Linux distributions, capable of generating code for many different platforms, including ARM, AVR, Win32/Win64, IBM mainframes, etc. (You can see all variants of GCC in Debian here – this also includes compilers for a lot of different languages. The C cross-compilers will have names starting with gcc-. If you check the page for the gcc package you'll see at the bottom that it's available for a list of architectures; you'll normally be running on amd64, even if you have an Intel processor.) If you just run gcc, you'll get object files compiled for the platform you're currently running on.

Typically, you'll use a cross-compiler when you're developing for a machine that's too small or too inconvenient to use for your development environment – for example, phones or tablets (typically ARM architecture), or embedded systems (e.g., Atmel AVR, Freescale, PIC and many others). For example, the Arduino IDE includes cross compilers for the boards you're developing for – not only wouldn't the compiler fit in the ROM and RAM of a small Arduino, the Harvard-style architecture of the chip won't let you execute data (compiler output) as code.

To look at the output of a cross-compiler, you first need to have one (or several) installed. The package names below refer to what you'd need to install on an Ubuntu/Debian or similar Linux system.

Compiling for ARM

(gcc-arm-none-eabi and binutils-arm-none-eabi in Ubuntu/Debian):

arm-none-eabi-gcc -g -c hello.c -o hello-arm.o

Compiling for AVR

(8-bit microcontrollers used in Arduino; you need the packages gcc-avr and binutils-avr in Ubuntu/Debian):

avr-gcc -g -c hello.c -o hello-avr.o
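
One way to convince yourself that a cross-compiler really targets another machine is to compile a file whose result depends on the target. The sketch below is not part of the original exercise: the function names are made up for illustration, and it relies on the architecture macros that GCC and Clang predefine (`__x86_64__`, `__arm__`, `__AVR__` and so on).

```c
#include <stddef.h>

/* Report which architecture this file was compiled for, based on
 * macros that GCC and Clang predefine for the target.
 * (target_name and pointer_size are illustrative names.) */
const char *target_name(void) {
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "x86 (32-bit)";
#elif defined(__aarch64__)
    return "ARM (64-bit)";
#elif defined(__arm__)
    return "ARM (32-bit)";
#elif defined(__AVR__)
    return "AVR";
#else
    return "unknown";
#endif
}

/* Pointer width is a property of the target, not of the machine
 * the compiler itself runs on. */
size_t pointer_size(void) {
    return sizeof(void *);
}
```

Compiling this with gcc, arm-none-eabi-gcc and avr-gcc in turn and disassembling each object file should show a different string constant and a different pointer size being returned.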

To inspect compiled code (on Linux)

To see disassembled code, run objdump -d hello.o (or arm-none-eabi-objdump, avr-objdump, or in general <architecture-name>-objdump to inspect code compiled for another architecture).

Particular other options you may be interested in for objdump include:

  • -t (or --syms) Display the contents of the symbol table(s) – this will give you a list of all the names that are referred to in the object file. This will tell you, for example, that the function main is in the .text segment of the file (this is where program code will be), it's at address 0, and it's 105 bytes long (0x69 in hex), and also whether it's local to the file (l) or global (g), so that it can be linked to from other files, as well as a number of other properties. See objdump(1) for more info. There's a similar option -T for dumping dynamic symbols of a shared library.

  • -r Display the relocation entries in the file – this shows you a list of addresses in the object file that must be relocated before the code can be used. Relocation is used because you normally won't know exactly what address a piece of code will end up at in memory – that depends on what other pieces of code will be loaded (libraries we're using, for example). Typically, this means that relative addresses (e.g., the function foo is at address 0x20 in this file, so a call to foo will be something like call 0x20) will be replaced by absolute addresses (e.g., this file gets loaded at 0x8200, so foo will be at 0x8220 and I need to update all calls to foo to reflect that). Nowadays, code will be moved around on purpose (address space layout randomisation, ASLR) even if you might have been able to determine the final memory location at compile time, in order to make security break-ins harder.

  • -g (or --debugging) Display debug information in object file – shows you any debugging information embedded in the file. The compiler adds debug information if you give it the -g option. This is basically everything you'd need to run the code in a debugger, and let it go step-by-step through the code and display the contents of variables and so on. (You'll probably need specialised knowledge to interpret this.)

Particular things to try

See what dynamic libraries a program uses

On Linux:

$ ldd hello
    linux-vdso.so.1 =>  (0x00007ffd1f387000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fe0ac91a000)
    /lib64/ld-linux-x86-64.so.2 (0x000055665097f000)
$ ldd /usr/bin/xterm 
    linux-vdso.so.1 =>  (0x00007fffc49fb000)
    libXinerama.so.1 => /usr/lib/x86_64-linux-gnu/libXinerama.so.1 (0x00007fe002cf9000)
    libXft.so.2 => /usr/lib/x86_64-linux-gnu/libXft.so.2 (0x00007fe002ae4000)
    libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007fe0028a1000)
    libXaw.so.7 => /usr/lib/x86_64-linux-gnu/libXaw.so.7 (0x00007fe00262d000)
    libXmu.so.6 => /usr/lib/x86_64-linux-gnu/libXmu.so.6 (0x00007fe002414000)
    libXt.so.6 => /usr/lib/x86_64-linux-gnu/libXt.so.6 (0x00007fe0021a9000)
    libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fe001e70000)
        ...
(As seen in the manual, it's possible to trick ldd into running the program it's inspecting, so don't use it on things you don't trust. You can get some of the same information with objdump -p file | grep NEEDED.)

Inspect symbol table

Try objdump -t:

$ objdump -t hello.o

hello.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 hello.c
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 l    d  .data  0000000000000000 .data
0000000000000000 l    d  .bss   0000000000000000 .bss
0000000000000000 l    d  .rodata    0000000000000000 .rodata
0000000000000000 l    d  .debug_info    0000000000000000 .debug_info
0000000000000000 l    d  .debug_abbrev  0000000000000000 .debug_abbrev
0000000000000000 l    d  .debug_aranges 0000000000000000 .debug_aranges
0000000000000000 l    d  .debug_line    0000000000000000 .debug_line
0000000000000000 l    d  .debug_str 0000000000000000 .debug_str
0000000000000000 l    d  .note.GNU-stack    0000000000000000 .note.GNU-stack
0000000000000000 l    d  .eh_frame  0000000000000000 .eh_frame
0000000000000000 l    d  .comment   0000000000000000 .comment
0000000000000000 g     F .text  0000000000000069 main
0000000000000000         *UND*  0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000         *UND*  0000000000000000 puts
The table format is as follows (from objdump(1)):

  • Here the first number is the symbol's value (sometimes refered to as its address). The next field is actually a set of characters and spaces indicating the flag bits that are set on the symbol. These characters are described below. Next is the section with which the symbol is associated or ABS if the section is absolute (ie not connected with any section), or UND if the section is referenced in the file being dumped, but not defined there.

  • After the section name comes another field, a number, which for common symbols is the alignment and for other symbol is the size. Finally the symbol's name is displayed.

Notice the following things:

  • There are four main sections (or segments) in an object file: text, which contains program code; data and rodata, which contain associated program data (such as initialised globals, constants, strings, etc); and bss, which contains uninitialised data (such as space for global variables that are not constants, or that are initialised by a constructor call). Above, there are also sections for various debug information.
  • The text section contains one interesting symbol:
    0000000000000000 g     F .text  0000000000000069 main
    
  • This is the main function of the C program. The first number is its address within the section (this will be relocated to an actual memory address when it's linked or loaded into memory), the g indicates that it's a global symbol (can be accessed from other objects), the F indicates that it's a function (so it's sensible to call it), .text is the section name, and 0x69 is the length (105 bytes).
  • There are also two interesting symbols marked *UND* (undefined):
    0000000000000000         *UND*  0000000000000000 _GLOBAL_OFFSET_TABLE_
    0000000000000000         *UND*  0000000000000000 puts
    
  • This is why hello.o can't be run as a program – some pieces of code aren't there yet. puts is a function from the standard C library, which is needed for printing "Hello, world!". It'll end up being provided by the library libc.so.6 which we saw listed in the output of ldd. If you look at the output of objdump -t hello, you'll see that it's different in the full program:
    0000000000000000       F *UND*  0000000000000000              puts@@GLIBC_2.2.5
    
  • puts is still undefined, but it now knows that it's found in version GLIBC_2.2.5 of libc.so.6 (there'll be information about shared libraries in the .dynamic section of the file).
  • The _GLOBAL_OFFSET_TABLE_ is used by Linux to deal with shared libraries, and sharing the same library code across multiple processes. You can read about it here. It'll be provided by the linker when you produce the executable program:
    0000000000200fb0 l     O .got   0000000000000000              _GLOBAL_OFFSET_TABLE_
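
To see more of the symbol table flags and sections in action, you can compile a file with a few different kinds of definitions. The file below is a made-up example (all names are hypothetical), and the comments describe where I would expect GCC on Linux to place each symbol when compiling without optimisation; check with objdump -t rather than taking them on trust.

```c
#include <stdio.h>

int counter = 1;                      /* initialised global: expect .data, flag g  */
int scratch[64];                      /* zero-initialised global: expect .bss
                                         (older compilers may emit a common symbol) */
static const char greeting[] = "hi";  /* read-only data: expect .rodata, flag l    */

/* static function: expect a local (l) F symbol in .text,
 * not visible to other object files */
static int twice(int v) {
    return 2 * v;
}

/* non-static function: expect a global (g) F symbol in .text */
int bump(void) {
    scratch[0] = twice(counter);
    counter += 1;
    puts(greeting);                   /* puts should be listed as *UND* */
    return scratch[0];
}
```

Since there is no main here, this only makes sense with -c; the resulting object file could be linked with another file that calls bump.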
    

Disassembling code

This will show you the assembly language instructions the compiler has chosen when it compiled your C code.

Use objdump -d <filename> to disassemble all code in a file – or, even better, objdump -d -S <filename> to see the C source together with the assembler code, or objdump -d -l <filename> to see the line numbers from the original C file (if you compiled with the -g option).

Note that objdump by default outputs x86/x86_64 assembly code in AT&T syntax, rather than Intel syntax, which may confuse you if you're used to the Intel syntax. For example, AT&T syntax has the target operand on the right-hand side and Intel has it on the left-hand side, so mov %rsp,%rbp (AT&T syntax) corresponds to mov rbp,rsp (Intel syntax). Select Intel syntax by adding the option -M intel (Linux) or -x86-asm-syntax=intel (MacOS).

Here's the full disassembly of hello.o, so you can study it for yourself. See below for a guided tour.

:::objdump
$ objdump -d -l hello.o

hello.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <main>:
main():
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:3
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 30             sub    $0x30,%rsp
   8:   89 7d dc                mov    %edi,-0x24(%rbp)
   b:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:4
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:5
  1b:   c7 45 e4 2a 00 00 00    movl   $0x2a,-0x1c(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:6
  22:   8b 55 e4                mov    -0x1c(%rbp),%edx
  25:   89 d0                   mov    %edx,%eax
  27:   c1 e0 04                shl    $0x4,%eax
  2a:   01 d0                   add    %edx,%eax
  2c:   89 45 e8                mov    %eax,-0x18(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:8
  2f:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 37 <main+0x37>
  36:   00 
  37:   f2 0f 11 45 f0          movsd  %xmm0,-0x10(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:9
  3c:   f2 0f 10 4d f0          movsd  -0x10(%rbp),%xmm1
  41:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 49 <main+0x49>
  48:   00 
  49:   f2 0f 59 c1             mulsd  %xmm1,%xmm0
  4d:   f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:11
  52:   c6 45 e3 2a             movb   $0x2a,-0x1d(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:12
  56:   0f b6 55 e3             movzbl -0x1d(%rbp),%edx
  5a:   89 d0                   mov    %edx,%eax
  5c:   c1 e0 04                shl    $0x4,%eax
  5f:   01 d0                   add    %edx,%eax
  61:   89 45 ec                mov    %eax,-0x14(%rbp)
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:14
  64:   8b 45 dc                mov    -0x24(%rbp),%eax
/home/anya/git/inf225/inf225.h17/src/object-files/hello.c:15
  67:   c9                      leaveq 
  68:   c3                      retq   

Guided tour of hello.o

Function preamble

This code sets up the stack frame and stores arguments in local variables and so on. The exact code will depend on the calling conventions for the platform, which is specified in the Application Binary Interface (ABI). This code uses the System V AMD64 ABI, which is used by Linux, macOS, FreeBSD and other Unix-like systems for 64-bit PCs. Windows uses a different scheme, and the conventions will be different on different architectures (such as ARM, or 32-bit x86).

:::c
int main(int argc, char** argv) {
First, save RBP on the stack, so we can restore it later:
:::objdump
   0:   55                      push   %rbp             
Save stack pointer (RSP) in RBP, this will be our frame pointer – we'll find local variables by indexing off RBP:
:::objdump
   1:   48 89 e5                mov    %rsp,%rbp
Set aside space on the stack for the stack frame. The stack will now look like: ...stuff belonging to whoever called us, RBP: local variables (0x30 bytes), RSP: space for pushing temporaries onto the stack.
:::objdump
   4:   48 83 ec 30             sub    $0x30,%rsp
Our first argument comes in the EDI register – this is the 32-bit version of RDI, and appropriate for storing a normal int. We store it at offset -0x24 in the stack frame – this is the variable int argc.
:::objdump
   8:   89 7d dc                mov    %edi,-0x24(%rbp)
Our second argument comes in the RSI register – this is the 64-bit version of ESI, and appropriate for storing a pointer (to an array of strings, in this case). We store it at offset -0x30 in the stack frame – this is the variable char** argv.
:::objdump
   b:   48 89 75 d0             mov    %rsi,-0x30(%rbp)
(In general, integer and pointer arguments come in the registers RDI, RSI, RDX, RCX, R8, R9 (or the 8/16/32-bit versions of them), and floating point arguments in XMM0–7. Any additional arguments that don't fit in registers can be found on the stack.)
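
A small experiment makes the register assignment visible. The two functions below are hypothetical examples, not part of hello.c; the comments state where the System V AMD64 convention described above would place each argument. Compiling them with cc -g -c and disassembling should show the preamble storing those registers into the stack frame.

```c
/* Seven integer arguments: a..f would arrive in the 32-bit halves of
 * RDI, RSI, RDX, RCX, R8 and R9; g does not fit in a register and
 * would be passed on the stack. */
long sum7(int a, int b, int c, int d, int e, int f, int g) {
    return (long)a + b + c + d + e + f + g;
}

/* Floating point arguments use their own register sequence, so here
 * n would arrive in EDI, x in XMM0 and y in XMM1. */
double scale(int n, double x, double y) {
    return n * (x + y);
}
```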

Stack frame layout

With a bit of effort, you can figure out where the variables are in the stack frames (or even in registers) from the debugging info included in the object file (though it's likely easier to just guess):

  • Run objdump -g hello.o (not sure how to do this on Mac)
  • At the bottom, in the .eh_frame section, there's information on the stack frame offset (DW_CFA_def...); this can vary for different parts of the program, for example before we do mov %rsp,%rbp (this is set up with DW_CFA_advance_loc). In our case, the CFA (Canonical Frame Address) for most of the main function is RBP+16, which means that the variable offsets in the debug info should be adjusted by 16 to find the offsets used when accessing the stack frame through RBP.
  • Close to the top, you'll find:
     <2><324>: Abbrev Number: 19 (DW_TAG_formal_parameter)
        <325>   DW_AT_name        : (indirect string, offset: 0xf6): argc
        <329>   DW_AT_decl_file   : 1
        <32a>   DW_AT_decl_line   : 3
        <32b>   DW_AT_type        : <0x62>
        <32f>   DW_AT_location    : 2 byte block: 91 4c     (DW_OP_fbreg: -52)
    
  • This says that the variable argc is defined at line 3, has type 0x62 (which is int, there's an entry for this as well), and its location is at offset -52 from the stack frame register.
  • So, we take -52, adjust it by 16, so we get -36, which is -0x24 in hex – so the variable argc can be found at -0x24(%rbp).
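
The "2 byte block: 91 4c" in the dump is the raw location expression: 0x91 is the DW_OP_fbreg opcode, and 0x4c is its operand, stored in DWARF's signed LEB128 encoding. As a rough sketch (function names are my own, and it assumes well-formed input of at most ten bytes), the decoding and the adjustment by 16 can be written as:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a signed LEB128 number. Each byte carries 7 bits of payload,
 * lowest bits first; the high bit says whether more bytes follow.
 * Stores the number of bytes consumed in *used. */
int64_t decode_sleb128(const uint8_t *p, size_t *used) {
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;
    do {
        byte = p[i++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    /* Bit 6 of the last byte is the sign bit: sign-extend if it is set. */
    if (shift < 64 && (byte & 0x40))
        result |= ~(uint64_t)0 << shift;
    *used = i;
    return (int64_t)result;
}

/* Offset relative to RBP, given the DW_OP_fbreg operand and the
 * distance from RBP to the CFA (16 for most of main in this example). */
int64_t rbp_offset(int64_t fbreg, int64_t cfa_from_rbp) {
    return fbreg + cfa_from_rbp;
}
```

Decoding the single byte 0x4c gives -52, and rbp_offset(-52, 16) gives -36, i.e. -0x24, matching the mov %edi,-0x24(%rbp) in the preamble.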

Addresses, relocation and function calls

:::c
puts("Hello, world!\n");
:::objdump
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
These two instructions print a string, by first loading the address of the string into RDI (the register used for the first argument, just like we received argc in the lower-half of RDI), and then calling the puts function.

We can't see this, however, without more information. The addresses used in both instructions are 0 (00 00 00 00 in the binary code) – these are interpreted relative to the program counter (RIP), so the string address is 0x16 (the first byte after the lea instruction), and the function address is 0x1b (first byte after the call instruction) – which is nonsense.

Both instructions refer to things that are defined in different sections (and, in the case of puts, in a completely different file that we might not see until the program is executed). The linker will fill in the correct values at a later point, by looking at the relocation table, finding the correct location of the relevant symbol, and then writing that back into the instruction.

We can see this information by adding the -r option to objdump, this gives us:

:::objdump
   f:   48 8d 3d 00 00 00 00    lea    0x0(%rip),%rdi        # 16 <main+0x16>
            12: R_X86_64_PC32   .rodata-0x4
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
            17: R_X86_64_PLT32  puts-0x4
Here we can see that the address stored at location 0x12 will be replaced by the address of .rodata-4, which will end up being the first byte of our read-only data area. (The compiler has subtracted 4, because the address is looked up relative to the program counter (R_X86_64_PC32 – it's a PC-relative 32-bit address); the relocation is done at address 0x12, but the CPU will be at address 0x16 when it does the computation, hence -4.)

If we look at .rodata (objdump -s -j .rodata hello.o), it looks like this:

Contents of section .rodata:
 0000 48656c6c 6f2c2077 6f726c64 21000000  Hello, world!...
 0010 00000000 00004540 00000000 00003140  ......E@......1@

The callq instruction will also do PC-relative addressing (basically, it does something like push %rip; add $FUN_ADDRESS, %rip), so the same principle applies (the linker will fill in puts-4).

Figuring out relocated addresses

We can check what the relocation table looks like using objdump -r (-j .text gives us only the part we're interested in):

$ objdump  -r -j .text hello.o

hello.o:     file format elf64-x86-64

RELOCATION RECORDS FOR [.text]:
OFFSET           TYPE              VALUE 
0000000000000012 R_X86_64_PC32     .rodata-0x0000000000000004
0000000000000017 R_X86_64_PLT32    puts-0x0000000000000004
0000000000000033 R_X86_64_PC32     .rodata+0x000000000000000c
0000000000000045 R_X86_64_PC32     .rodata+0x0000000000000014
The offsets are the locations that contain unresolved addresses, and the values are the expressions that should be inserted there (once the values of the named symbols are known). The calculation will go something like this (for "Hello, world!\n"):

  • address = (.rodata-0x04)-(.text+0x12)

The CPU will then calculate the address as start of next instruction (=.text+0x16) + address. (I.e., .text+0x16 + (.rodata-0x04) - (.text+0x12) = 0x04 + .rodata - 0x04 = .rodata, which is where the string is stored)

You can see this in action if you look at the hello program, which has been relocated and linked:

 6af:   48 8d 3d e2 00 00 00    lea    0xe2(%rip),%rdi
The text section starts at 0x570 and the main function starts at 0x6a0 (before main, there are a bunch of other functions that are added by the C runtime, including _start, which is the actual entry point of the program). The .rodata of hello looks like this:
$ objdump  -s -j .rodata hello

hello:     file format elf64-x86-64

Contents of section .rodata:
 0790 01000200 00000000 48656c6c 6f2c2077  ........Hello, w
 07a0 6f726c64 21000000 00000000 00004540  orld!.........E@
 07b0 00000000 00003140                    ......1@    
There are a few other read-only constants stored here; the string begins at 0x798. So, when the program is running and it's time to execute the lea instruction the CPU will

  • Start at 0x6af, read the seven bytes of the instruction (including the 4 bytes of the address offset) – leaving the program counter (%rip) at 0x6b6.
  • Compute the address as 0xe2 + %rip = 0xe2 + 0x6b6 = 0x798.
  • Store the address in %rdi, where puts will expect to find its string argument.
  • Continue executing at 0x6b6, which is the call to puts.
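
The linker's side and the CPU's side of this calculation can be checked with a few lines of C. This is only a sketch with made-up function names; S, A and P are the conventional names for the symbol address, the addend and the address of the patched field, and the numbers in the usage note come from the linked hello program above, assuming that the string is the first thing in hello.o's .rodata.

```c
#include <stdint.h>

/* Value the linker stores for an R_X86_64_PC32 relocation:
 *   S = final address of the symbol,
 *   A = addend from the relocation entry (-4 in our case),
 *   P = address of the 4-byte field being patched. */
int32_t pc32_field(uint64_t S, int64_t A, uint64_t P) {
    return (int32_t)(S + (uint64_t)A - P);
}

/* Address the CPU computes for a RIP-relative operand: the address
 * of the next instruction plus the stored 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_instruction, int32_t disp) {
    return next_instruction + (uint64_t)(int64_t)disp;
}
```

With S = 0x798 (the string), A = -4 and P = 0x6b2 (three opcode bytes after 0x6af), pc32_field gives 0xe2, which is the displacement seen in the linked lea instruction; rip_relative_target(0x6b6, 0xe2) then gives back 0x798.
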
Library calls

The call to puts could be handled in the same way, and it would have been if it came as part of our program or a statically linked library. In this case, however, puts is found in the GNU C Library, which is a dynamically linked shared library, so we won't know the addresses before the program is loaded. To make it possible to share the same program code between several running processes (Linux will do this automatically) and avoid swapping executable code out to disk if memory is tight (much better to just discard the code from memory, and read it again from the program file if necessary), the code isn't actually subjected to relocation when it's loaded – it's just loaded unchanged from disk, without substituting addresses. For this to work, addresses must be decided on in advance, and anything we can't decide in advance must be looked up in a table at run time. This is the global offset table (GOT) mentioned earlier, and it works with the help of a Procedure Linkage Table (PLT).

Let's look at the call instruction again, after it's been linked. Instead of

:::objdump
  16:   e8 00 00 00 00          callq  1b <main+0x1b>
the hello executable file contains:
:::objdump
 6b6:   e8 a5 fe ff ff          callq  560 <.plt.got>
which is a call to the function at address 0x560. (a5 fe ff ff is 0xfffffea5, which is -0x15b, and 0x6b6+5-0x15b=0x560)
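
The same arithmetic in C, as a sketch with hypothetical helper names: the four bytes after the e8 opcode are a little-endian signed 32-bit displacement, which is added to the address of the instruction following the 5-byte call.

```c
#include <stdint.h>

/* Assemble a signed 32-bit value from four little-endian bytes. */
int32_t read_le32(const uint8_t b[4]) {
    uint32_t v = (uint32_t)b[0]
               | (uint32_t)b[1] << 8
               | (uint32_t)b[2] << 16
               | (uint32_t)b[3] << 24;
    return (int32_t)v;
}

/* Target of a 5-byte "call rel32" instruction (opcode e8) at address insn. */
uint64_t call_target(uint64_t insn, const uint8_t rel[4]) {
    return insn + 5 + (uint64_t)(int64_t)read_le32(rel);
}
```

For the bytes a5 fe ff ff, read_le32 returns -347 (that is, -0x15b), and call_target(0x6b6, ...) returns 0x560.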

If we ask for the full disassembly with objdump -d hello, we'll also find the address 0x560, and it's in the .plt.got section:

:::objdump
0000000000000560 <.plt.got>:
 560:   ff 25 6a 0a 20 00       jmpq   *0x200a6a(%rip)        # 200fd0 <puts@GLIBC_2.2.5>
 566:   66 90                   xchg   %ax,%ax
 568:   ff 25 8a 0a 20 00       jmpq   *0x200a8a(%rip)        # 200ff8 <__cxa_finalize@GLIBC_2.2.5>
 56e:   66 90                   xchg   %ax,%ax
So, 0x560 is an indirect jump to the location stored at 0x200fd0. (The * is pointer dereferencing: load the target address from 0x200a6a(%rip), rather than jumping to 0x200a6a(%rip); in Intel-style assembler, this is written as QWORD PTR [rip+0x200a6a].) You can ignore the xchg instruction; it's just a filler to make sure the table entries are neatly aligned to 8 bytes.

0x200fd0 is an address in the .got section:

$ objdump  -s -j .got hello

hello:     file format elf64-x86-64

Contents of section .got:
 200fb0 f00d2000 00000000 00000000 00000000  .. .............
 200fc0 00000000 00000000 00000000 00000000  ................
 200fd0 00000000 00000000 00000000 00000000  ................
 200fe0 00000000 00000000 00000000 00000000  ................
 200ff0 00000000 00000000 00000000 00000000  ................
It's not particularly interesting to look at, since it won't be filled in until the program starts running. How do we know what addresses should be filled into the GOT at load time? There's a relocation table for that:
$ objdump  -R hello

hello:     file format elf64-x86-64

DYNAMIC RELOCATION RECORDS
OFFSET           TYPE              VALUE 
0000000000200dd8 R_X86_64_RELATIVE  *ABS*+0x0000000000000670
0000000000200de0 R_X86_64_RELATIVE  *ABS*+0x0000000000000630
0000000000201008 R_X86_64_RELATIVE  *ABS*+0x0000000000201008
0000000000200fc8 R_X86_64_GLOB_DAT  _ITM_deregisterTMCloneTable
0000000000200fd0 R_X86_64_GLOB_DAT  puts@GLIBC_2.2.5
0000000000200fd8 R_X86_64_GLOB_DAT  __libc_start_main@GLIBC_2.2.5
0000000000200fe0 R_X86_64_GLOB_DAT  __gmon_start__
0000000000200fe8 R_X86_64_GLOB_DAT  _Jv_RegisterClasses
0000000000200ff0 R_X86_64_GLOB_DAT  _ITM_registerTMCloneTable
0000000000200ff8 R_X86_64_GLOB_DAT  __cxa_finalize@GLIBC_2.2.5
Here we'll see that location 0x200fd0 should be filled with the runtime address of the puts function.
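
If the indirection is hard to picture, it may help to think of a GOT entry as a function pointer variable that is empty in the file and assigned by the loader. The following is only an analogy in plain C, not how the dynamic loader is actually implemented; all the names are invented, and strlen stands in for puts so that the result is easy to check.

```c
#include <stddef.h>
#include <string.h>

typedef size_t (*strlen_fn)(const char *);

/* Stand-in for one GOT slot: zero in the file, filled in at load time. */
static strlen_fn got_slot = NULL;

/* Stand-in for the dynamic loader processing a GLOB_DAT relocation:
 * look up the symbol's runtime address and store it in the slot. */
void fake_loader_bind(void) {
    got_slot = strlen;
}

/* Stand-in for the stub at 0x560: an indirect jump through the slot.
 * The calling code never needs to know the library's address. */
size_t call_via_got(const char *s) {
    return got_slot(s);
}
```

The code that calls call_via_got is the same no matter where the library ends up in memory; only the data in the slot changes, which is what lets the program text be loaded unmodified.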

Arithmetic operations

The rest of the main function is much less complicated.

Integers

Variables are strictly speaking not declared in the compiled code; we just have an area set aside for them on the stack. But C variable declarations with an initialiser will end up as code assigning the initial value to the variable:

:::c
int x = 42;
:::objdump
  1b:   c7 45 e4 2a 00 00 00    movl   $0x2a,-0x1c(%rbp)        # x = 42
Here, we see that x is located at offset -0x1c in the stack frame (consulting the debug info should let us verify this).

The code for integer multiplication is somewhat interesting:

:::c
int y = x * 17;
:::objdump
  22:   8b 55 e4                mov    -0x1c(%rbp),%edx        # %edx = x
  25:   89 d0                   mov    %edx,%eax               # %eax = %edx
  27:   c1 e0 04                shl    $0x4,%eax               # %eax = %eax << 4
  2a:   01 d0                   add    %edx,%eax               # %eax = %eax + %edx
  2c:   89 45 e8                mov    %eax,-0x18(%rbp)        # y = %eax
What's going on here? Why does it do shift and add instead of multiplication?
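
As a hint, here is the same trick written back in C (a sketch; the function names are my own). Since 17 = 16 + 1, multiplying by 17 is the same as shifting left by four and adding the original value. Unsigned arithmetic is used so that the shift is well defined for every input.

```c
/* x * 17 written the way the compiler emitted it: (x << 4) is x * 16. */
unsigned times17_shift_add(unsigned x) {
    return (x << 4) + x;
}

/* The straightforward version, for comparison. */
unsigned times17_multiply(unsigned x) {
    return x * 17u;
}
```

Both functions return 714 for an argument of 42. Whether the shift-and-add form is actually faster than a multiply instruction depends on the processor; the compiler's choice reflects its cost model for the target.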

Doubles

:::c
double a = 42.0;
:::objdump
  2f:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # 37 <main+0x37>
  36:   00 
  37:   f2 0f 11 45 f0          movsd  %xmm0,-0x10(%rbp)
Here we have a case of relocation again – the zeros will be filled in with the PC-relative offset of the constant 42.0, which is stored in .rodata; if you look at the relocation table, and then inside .rodata, you'll find the double value at offset 0x10:
 0000 48656c6c 6f2c2077 6f726c64 21000000  Hello, world!...
 0010 00000000 00004540 00000000 00003140  ......E@......1@
and 00000000 00004540 is the little-endian hex representation of the IEEE 754 64-bit double-precision floating point number 42 (there's a handy converter that lets you find the hex representation of floats).
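
You can also get the bit pattern from C itself. The sketch below (helper names are illustrative) assumes that double is an IEEE 754 64-bit value, as it is on x86-64, and uses memcpy to reinterpret the bytes without running into aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Return the 64 bits that make up a double, as an integer. */
uint64_t double_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Copy the in-memory bytes of a double into out, lowest address first.
 * On a little-endian machine this is the order shown by objdump -s. */
void double_bytes(double d, unsigned char out[8]) {
    memcpy(out, &d, 8);
}
```

double_bits(42.0) is 0x4045000000000000 and double_bits(17.0) is 0x4031000000000000; stored little-endian, these are the byte sequences ending in 45 40 and 31 40 in the .rodata dump.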

(The value is moved via the %xmm0 register because x86 move instructions can't take two memory operands, so the constant has to be loaded into a register first and then stored.)

The multiplication itself is very easy, and uses a multiply instruction rather than shifts and adds:

:::objdump
  3c:   f2 0f 10 4d f0          movsd  -0x10(%rbp),%xmm1      # read a
  41:   f2 0f 10 05 00 00 00    movsd  0x0(%rip),%xmm0        # read 17.0
  48:   00 
  49:   f2 0f 59 c1             mulsd  %xmm1,%xmm0            # a*17.0
  4d:   f2 0f 11 45 f8          movsd  %xmm0,-0x8(%rbp)       # b = a*17.0

8/16-bit values

Finally, we have the same computation on 8-bit unsigned chars, with the result stored in a 16-bit unsigned short:

:::objdump
unsigned char m = 42;
  52:   c6 45 e5 2a             movb   $0x2a,-0x1b(%rbp)
unsigned short n = m * 17;
  56:   0f b6 55 e5             movzbl -0x1b(%rbp),%edx
  5a:   89 d0                   mov    %edx,%eax
  5c:   c1 e0 04                shl    $0x4,%eax
  5f:   01 d0                   add    %edx,%eax
  61:   66 89 45 e6             mov    %ax,-0x1a(%rbp)
Loading and storing of variables happens with 8/16-bit instructions (movb for a byte, mov %ax for a 16-bit word; movzbl loads a byte and zero-extends it to 32 bits). But notice that the actual computation happens in the %eax register, which is 32 bits. This follows from the rules of C: operands narrower than int are promoted to int before any arithmetic is done, so m * 17 is an int multiplication, and the result is only cut down to 16 bits when it's stored in n. (The x86 does have 8- and 16-bit arithmetic instructions, but compilers generally prefer the 32-bit forms, which tend to be at least as fast.)
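
The effect of integer promotion is easy to observe from C. In this sketch (function names are made up, and int is assumed to be wider than 16 bits, as on all the platforms discussed here), a value is chosen so that the product doesn't fit in 8 bits:

```c
/* m is promoted to int before the multiplication, so the product
 * is not truncated to 8 bits. */
int product_as_int(unsigned char m) {
    return m * 17;
}

/* Converting to unsigned short keeps only the low 16 bits. */
unsigned short product_as_ushort(unsigned char m) {
    return (unsigned short)(m * 17);
}

/* Converting to unsigned char keeps only the low 8 bits. */
unsigned char product_as_uchar(unsigned char m) {
    return (unsigned char)(m * 17);
}

/* sizeof shows the type of the expression: int, not unsigned char. */
int product_has_int_size(void) {
    unsigned char m = 42;
    return sizeof(m * 17) == sizeof(int);
}
```

For m = 200, the product is 3400 as an int and as an unsigned short, but 72 (3400 modulo 256) once it's squeezed into an unsigned char.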

Returning from a function

Finally, we have the code for cleaning up and returning to the caller. The ABI states that int return values should be stored in the EAX register:

:::objdump
return argc;
  65:   8b 45 dc                mov    -0x24(%rbp),%eax
The next step is to clean up the stack, setting the stack pointer and frame pointer back to the values used by the caller. The leave instruction (the q means it works on 64-bit pointers) will do this:
:::objdump
  68:   c9                      leaveq 
It has the reverse effect of the first two instructions, and the job could just as easily be done by moving and popping:
:::gas
mov %rbp,%rsp   # our stack frame pointer will be the caller's stack pointer
pop %rbp        # pop caller's frame pointer from the stack
(There's actually also an enter instruction, that replicates the functionality of the first push, mov and sub from the start of the function – this is apparently not used by GCC.)

If we had used other non-scratch registers, we would now also have to restore them to their previous values from the stack.

Finally, we return to the caller:

:::objdump
  69:   c3                      retq   
The return instruction will pop the return address off the stack and into the %rip program counter, so that execution continues from the instruction after the call that called main.
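
To see how call, the function prologue, leave and ret fit together, here is a toy model in C. It is only a sketch under simplifying assumptions: the struct machine type and all function names are invented for this illustration, and rsp and rbp count 64-bit words in a small array instead of holding byte addresses.

```c
#include <stdint.h>

/* A toy model of the machine state: a tiny stack plus three registers.
 * Everything here is illustrative; it is not how a real CPU is built. */
enum { STACK_WORDS = 64 };

struct machine {
    uint64_t stack[STACK_WORDS];
    uint64_t rsp;   /* index of the top of the stack; grows downwards */
    uint64_t rbp;   /* frame pointer */
    uint64_t rip;   /* program counter */
};

static void push(struct machine *m, uint64_t v) { m->stack[--m->rsp] = v; }
static uint64_t pop(struct machine *m)          { return m->stack[m->rsp++]; }

/* call: push the return address, then jump to the target */
void do_call(struct machine *m, uint64_t target, uint64_t return_addr) {
    push(m, return_addr);
    m->rip = target;
}

/* push %rbp; mov %rsp,%rbp; sub $locals,%rsp */
void prologue(struct machine *m, uint64_t local_words) {
    push(m, m->rbp);
    m->rbp = m->rsp;
    m->rsp -= local_words;
}

/* leave = mov %rbp,%rsp; pop %rbp */
void leave(struct machine *m) {
    m->rsp = m->rbp;
    m->rbp = pop(m);
}

/* ret: pop the return address into the program counter */
void ret(struct machine *m) {
    m->rip = pop(m);
}
```

After do_call, prologue, leave and ret have run in that order, rsp and rbp hold the values they had before the call, and rip holds the return address that call pushed.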

Things to try on your own

  • Change the C code in hello.c, recompile and see what the difference is. For example:
    • Try different types (long, long long, short, float, ...) and see how the instructions and stack frame addresses change
    • Try different arithmetic operations; also try multiplying by a non-constant factor.
  • Try turning on the optimizer (cc -O -c -g hello.c -o hello.o – GCC will accept various optimisation levels, -O1, -O2, -O3, -Os (optimise for size) etc).
    • Quite likely, all your code will disappear – that's because the optimiser sees that you're not actually using the values you're computing. You can try to work around this by printing the values before your return (printf("%d %f %d\n", y, b, n);) – that also doesn't help, because the compiler will just precompute the constant expressions.
    • To see optimized code, you'll have to trick the compiler so it can't know the values in advance, e.g. with a different file calc.c:
      :::c
      #include <stdio.h>
      
      int cal(int x, int f, double a, unsigned char m) {
          puts("Hello, world!\n");
          int y = x * f;
      
          double b = a * f;
      
          unsigned short n = m * (unsigned short)f;
      
          printf("%d %f %d\n", y, b, n);
      
          return y;
      }
      

You can try a few variants, including using 17 instead of f (the compiler might switch back to using shift+add, except when optimising for size (-Os), where it might try to use fewer instructions even if they're slower).
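
Another way to stop the optimiser from precomputing a result is to declare the input volatile, which obliges the compiler to really load it from memory. The following is a small sketch along those lines; the function name times17_volatile is made up, and which instructions you get still depends on the compiler, its version and the optimisation level.

```c
/* Hypothetical variant of the running example. Because x is volatile, the
 * compiler has to load it from memory and may not fold x * 17 into 714. */
int times17_volatile(void) {
    volatile int x = 42;
    int y = x * 17;
    return y;
}
```

Compile it with cc -O -g -c and disassemble it; the multiplication (or a shift and add) should still be present in the output, even though the result is always 714.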

Java Object Files

The Java .class files are also object files, containing bytecode for the Java Virtual Machine (JVM). The file format is quite different from the ELF files used for native code on Unix-like systems.

You can examine Java class files with the javap command. Here's an example that's very similar to our running example:

:::java
public class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, world!%n");
        int x = 42;
        int y = x * 17;

        double a = 42.0;
        double b = a * 17.0;

        byte m = 42;
        short n = (short)(m * 17);
    }
}

Compile it with javac -g Hello.java. The -g option adds debug information about local variables. (No -c necessary with the Java compiler, it always produces stand-alone classes, and any "linking" happens when the class is loaded into the JVM. The comparable thing to linking a bunch of .o files into an executable program would be to bundle up a bunch of classes in a .jar file with the jar command.)

Look at the JVM code using javap:

$ javap -constants -l -s -c Hello.class 
:::jvm
Compiled from "Hello.java"
public class Hello {
  public Hello();
    descriptor: ()V
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V
       4: return
    LineNumberTable:
      line 2: 0
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0       5     0  this   LHello;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    Code:
       0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
       3: ldc           #3                  // String Hello, world!%n
       5: invokevirtual #4                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
       8: bipush        42
      10: istore_1
      11: iload_1
      12: bipush        17
      14: imul
      15: istore_2
      16: ldc2_w        #5                  // double 42.0d
      19: dstore_3
      20: dload_3
      21: ldc2_w        #7                  // double 17.0d
      24: dmul
      25: dstore        5
      27: bipush        42
      29: istore        7
      31: iload         7
      33: bipush        17
      35: imul
      36: i2s
      37: istore        8
      39: return
    LineNumberTable:
      line 4: 0
      line 5: 8
      line 6: 11
      line 8: 16
      line 9: 20
      line 11: 27
      line 12: 31
      line 13: 39
    LocalVariableTable:
      Start  Length  Slot  Name   Signature
          0      40     0  args   [Ljava/lang/String;
         11      29     1     x   I
         16      24     2     y   I
         20      20     3     a   D
         27      13     5     b   D
         31       9     7     m   B
         39       1     8     n   S
}
  • The JVM is a stack machine, so all arguments are pushed onto the stack, including the arguments to bytecode instructions such as imul.
  • Types and methods have descriptors that make them unique (to avoid problems with overloaded names). For example, the method descriptor of main is ([Ljava/lang/String;)V: a method that takes an array of strings and returns void. A similar scheme called name mangling is used by languages such as C++ to encode overload information into strings in a way compatible with C.
  • Local variables are stored in slots on the stack, so dstore_3 will pop a double from the stack and store it in a (slot 3), and dload_3 will push the contents of a onto the stack.
  • You'll notice some unnecessary code, for instance istore_1 directly followed by iload_1, which is pretty much the same as we had in the x86 code above (e.g., movl $0x2a,-0x1c(%rbp) followed by mov -0x1c(%rbp),%edx, without anyone reading from x / slot 1 / -0x1c(%rbp) afterwards).
  • The C compiler will remove unnecessary local variables if you tell it to optimise; the standard Java compiler (javac) does not do this kind of optimisation.
  • A JVM with JIT support (such as the standard HotSpot virtual machine) will however do such optimisations on the fly.
  • Optimisation typically interferes with debugging, both because the code that gets run can be radically different from the source code the programmer is looking at, and because important local variables may have been optimised away.
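
The way a stack machine works can be illustrated with a toy interpreter, written in C to match the rest of this page. This is only a sketch: the types, the run function and the hello_ints table are invented for this illustration, only the integer instructions from the listing are modelled, and details such as Java's wraparound on overflow are left out.

```c
#include <stdint.h>

/* Toy model of a JVM frame: an operand stack plus numbered local slots.
 * Only the handful of int instructions used above are modelled. */
enum op { BIPUSH, ISTORE, ILOAD, IMUL, I2S, RETURN };

struct insn { enum op op; int arg; };

struct frame {
    int32_t stack[8];    /* operand stack */
    int     sp;          /* number of values on the operand stack */
    int32_t locals[16];  /* local variable slots */
};

void run(struct frame *f, const struct insn *code) {
    for (;; code++) {
        switch (code->op) {
        case BIPUSH: f->stack[f->sp++] = code->arg;            break;
        case ISTORE: f->locals[code->arg] = f->stack[--f->sp]; break;
        case ILOAD:  f->stack[f->sp++] = f->locals[code->arg]; break;
        case IMUL: {
            int32_t b = f->stack[--f->sp];   /* pop both operands ... */
            int32_t a = f->stack[--f->sp];
            f->stack[f->sp++] = a * b;       /* ... and push the product */
            break;
        }
        case I2S:    /* narrow the top of the stack to 16 bits */
            f->stack[f->sp - 1] = (int16_t)f->stack[f->sp - 1];
            break;
        case RETURN: return;
        }
    }
}

/* Bytecode offsets 8..15 and 27..39 of main, written as a table. */
const struct insn hello_ints[] = {
    {BIPUSH, 42}, {ISTORE, 1}, {ILOAD, 1}, {BIPUSH, 17}, {IMUL, 0}, {ISTORE, 2},
    {BIPUSH, 42}, {ISTORE, 7}, {ILOAD, 7}, {BIPUSH, 17}, {IMUL, 0}, {I2S, 0},
    {ISTORE, 8}, {RETURN, 0},
};
```

Running run on a zero-initialised struct frame with hello_ints leaves 42 in slots 1 and 7 and 714 in slots 2 and 8, with an empty operand stack, which matches what the variables x, m, y and n hold in the Java program.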

macOS output

Disassembly

Notice that:

  • The ABI is the same as on Linux; the frame pointer is in %rbp, the arguments arrive in %edi and %rsi, the return value is placed in %eax.
  • The code is generated by Clang, which is also available on Linux, and it chooses different instructions from GCC:
    • addq and popq to restore the stack before returning instead of leaveq – the effect is the same.
    • imull to do 32-bit signed integer multiplication, instead of shift/add
    • The instruction order is also a bit different, with the floating point move instructions placed before the integer multiplication – possibly so that the relatively slow memory access will proceed while the processor is busy computing stuff.
  • Clang on Linux will produce similar output.
    :::objdump
    $ objdump -line-numbers -d hello.o
    hello.o:        file format Mach-O 64-bit x86-64
    
    Disassembly of section __TEXT,__text:
    _main:
           0:       55                      pushq   %rbp
           1:       48 89 e5                movq    %rsp, %rbp
           4:       48 83 ec 40             subq    $64, %rsp
           8:       48 8d 05 71 00 00 00    leaq    113(%rip), %rax
           f:       c7 45 fc 00 00 00 00    movl    $0, -4(%rbp)
          16:       89 7d f8                movl    %edi, -8(%rbp)
          19:       48 89 75 f0             movq    %rsi, -16(%rbp)
          1d:       48 89 c7                movq    %rax, %rdi
          20:       b0 00                   movb    $0, %al
          22:       e8 00 00 00 00          callq   0 <_main+0x27>
          27:       f2 0f 10 05 41 00 00 00 movsd   65(%rip), %xmm0
          2f:       f2 0f 10 0d 41 00 00 00 movsd   65(%rip), %xmm1
          37:       c7 45 ec 2a 00 00 00    movl    $42, -20(%rbp)
          3e:       6b 4d ec 11             imull   $17, -20(%rbp), %ecx
          42:       89 4d e8                movl    %ecx, -24(%rbp)
          45:       f2 0f 11 4d e0          movsd   %xmm1, -32(%rbp)
          4a:       f2 0f 59 45 e0          mulsd   -32(%rbp), %xmm0
          4f:       f2 0f 11 45 d8          movsd   %xmm0, -40(%rbp)
          54:       c6 45 d7 2a             movb    $42, -41(%rbp)
          58:       0f b6 4d d7             movzbl  -41(%rbp), %ecx
          5c:       6b c9 11                imull   $17, %ecx, %ecx
          5f:       89 4d d0                movl    %ecx, -48(%rbp)
          62:       8b 4d f8                movl    -8(%rbp), %ecx
          65:       89 45 cc                movl    %eax, -52(%rbp)
          68:       89 c8                   movl    %ecx, %eax
          6a:       48 83 c4 40             addq    $64, %rsp
          6e:       5d                      popq    %rbp
          6f:       c3                      retq
    
    hello.o:        file format Mach-O 64-bit x86-64
    
    SYMBOL TABLE:
    0000000000000000 g     F __TEXT,__text  _main
    0000000000000000         *UND*          _printf
    

Inspecting dynamic libraries on macOS

$ objdump -macho -dylibs-used ./hello
hello:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)
$ objdump -macho -dylibs-used /bin/ls
/bin/ls:
    /usr/lib/libutil.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libncurses.5.4.dylib (compatibility version 5.4.0, current version 5.4.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.0.0)

Output for ARM and AVR architectures

ARM Disassembly of hello.o:

Notice the following:

  • All the instructions are the same length (4 bytes / 32 bits), unlike x86/amd64 code which can vary widely in size.
  • The ARM code also uses shift and add to do integer multiplication by a constant (x << 4 multiplies by 2^4 = 16; add x again to multiply by 17). This may be faster than generic multiplication.
  • The ARM code below does a library call to do floating point multiplication, and doubles are stored in two normal 32-bit registers. If we had compiled for a more advanced ARM processor, it would probably have builtin floating point operations.
  • All the instructions that process data use registers, and only the store and load instructions access memory – you have to move stuff from memory/stack into a register before computing with a value. This is typical of a RISC architecture.
    :::objdump
    $ arm-none-eabi-objdump  -dSr  hello-arm.o 
    
    hello-arm.o:     file format elf32-littlearm
    
    
    Disassembly of section .text:
    
    00000000 <main>:
    #include <stdio.h>
    
    int main(int argc, char** argv) {
       0:   e92d4810    push    {r4, fp, lr}
       4:   e28db008    add fp, sp, #8
       8:   e24dd02c    sub sp, sp, #44 ; 0x2c
       c:   e50b0030    str r0, [fp, #-48]  ; 0xffffffd0
      10:   e50b1034    str r1, [fp, #-52]  ; 0xffffffcc
        puts("Hello, world!\n");
      14:   e59f0088    ldr r0, [pc, #136]  ; a4 <main+0xa4>
      18:   ebfffffe    bl  0 <puts>
                18: R_ARM_CALL  puts
        int x = 42;
      1c:   e3a0302a    mov r3, #42 ; 0x2a
      20:   e50b3010    str r3, [fp, #-16]
        int y = x * 17;
      24:   e51b2010    ldr r2, [fp, #-16]
      28:   e1a03002    mov r3, r2
      2c:   e1a03203    lsl r3, r3, #4
      30:   e0833002    add r3, r3, r2
      34:   e50b3014    str r3, [fp, #-20]  ; 0xffffffec
    
        double a = 42.0;
      38:   e3a03000    mov r3, #0
      3c:   e59f4064    ldr r4, [pc, #100]  ; a8 <main+0xa8>
      40:   e50b301c    str r3, [fp, #-28]  ; 0xffffffe4
      44:   e50b4018    str r4, [fp, #-24]  ; 0xffffffe8
        double b = a * 17.0;
      48:   e3a02000    mov r2, #0
      4c:   e59f3058    ldr r3, [pc, #88]   ; ac <main+0xac>
      50:   e24b101c    sub r1, fp, #28
      54:   e8910003    ldm r1, {r0, r1}
      58:   ebfffffe    bl  0 <__aeabi_dmul>
                58: R_ARM_CALL  __aeabi_dmul
      5c:   e1a03000    mov r3, r0
      60:   e1a04001    mov r4, r1
      64:   e50b3024    str r3, [fp, #-36]  ; 0xffffffdc
      68:   e50b4020    str r4, [fp, #-32]  ; 0xffffffe0
    
        unsigned char m = 42;
      6c:   e3a0302a    mov r3, #42 ; 0x2a
      70:   e54b3025    strb    r3, [fp, #-37]  ; 0xffffffdb
        unsigned short n = m * 17;
      74:   e55b3025    ldrb    r3, [fp, #-37]  ; 0xffffffdb
      78:   e1a03803    lsl r3, r3, #16
      7c:   e1a03823    lsr r3, r3, #16
      80:   e1a02003    mov r2, r3
      84:   e1a02202    lsl r2, r2, #4
      88:   e0823003    add r3, r2, r3
      8c:   e14b32b8    strh    r3, [fp, #-40]  ; 0xffffffd8
    
        return argc;
      90:   e51b3030    ldr r3, [fp, #-48]  ; 0xffffffd0
    }
      94:   e1a00003    mov r0, r3
      98:   e24bd008    sub sp, fp, #8
      9c:   e8bd4810    pop {r4, fp, lr}
      a0:   e12fff1e    bx  lr
                a0: R_ARM_V4BX  *ABS*
      a4:   00000000    .word   0x00000000
                a4: R_ARM_ABS32 .rodata
      a8:   40450000    .word   0x40450000
      ac:   40310000    .word   0x40310000
    

(We'll need to have the relevant ARM libraries installed as well to compile and inspect the executable program.)
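
Two details of the ARM listing can be reproduced in plain C. The sketch below assumes that double is a 64-bit IEEE 754 value, which is the case on common desktop platforms; the function names are made up for this illustration. It splits a double into the two 32-bit words that the ARM code keeps in separate registers, and it repeats the lsl #16 / lsr #16 pair used before the 16-bit multiplication.

```c
#include <stdint.h>
#include <string.h>

/* Upper 32 bits of a double's bit pattern (assumes a 64-bit IEEE 754 double). */
uint32_t double_high_word(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits >> 32);
}

/* Lower 32 bits of the same pattern. */
uint32_t double_low_word(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)bits;
}

/* lsl #16 followed by lsr #16: clears the top 16 bits of a 32-bit register. */
uint32_t zero_extend_16(uint32_t r) {
    return (r << 16) >> 16;
}
```

For 42.0 the high word is 0x40450000 and the low word is 0, and for 17.0 the high word is 0x40310000. These are the .word constants at offsets a8 and ac in the listing, and the reason a plain mov #0 is enough for the other half of each value.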

AVR Disassembly of hello.o:

Note:

  • All instructions in this listing are two bytes long (AVR also has a few four-byte instructions, such as call and jmp, but none are used here).
  • As in the ARM code, there are a lot of loads and stores (ldd, ldi, std), although some of the instructions take small integers as immediate arguments.
  • Multiple registers are needed to hold values; this is an 8-bit processor, so registers are only 8 bits wide. The int data type is 16 bits. For example, the integer 42 is loaded by putting 0x2a into r24 and 0x00 into r25, and the double 42.0 needs four registers. As the single-precision constant 0x42280000 and the call to __mulsf3 show, double is only 32 bits wide here.
  • Integer multiplication is quite complicated, since the processor can only deal with 8 bits at a time.
  • If you compile to an executable program (i.e., something that might be uploaded to the flash memory on an Arduino), you'll notice that everything is statically linked, so the code will also include the puts function and the floating-point multiplication function.
    :::objdump
    $ avr-objdump  -dSr  hello-avr.o
    
    hello-avr.o:     file format elf32-avr
    
    
    Disassembly of section .text:
    
    00000000 <main>:
    #include <stdio.h>
    
    int main(int argc, char** argv) {
       0:   cf 93           push    r28
       2:   df 93           push    r29
       4:   cd b7           in  r28, 0x3d   ; 61
       6:   de b7           in  r29, 0x3e   ; 62
       8:   63 97           sbiw    r28, 0x13   ; 19
       a:   0f b6           in  r0, 0x3f    ; 63
       c:   f8 94           cli
       e:   de bf           out 0x3e, r29   ; 62
      10:   0f be           out 0x3f, r0    ; 63
      12:   cd bf           out 0x3d, r28   ; 61
      14:   99 8b           std Y+17, r25   ; 0x11
      16:   88 8b           std Y+16, r24   ; 0x10
      18:   7b 8b           std Y+19, r23   ; 0x13
      1a:   6a 8b           std Y+18, r22   ; 0x12
        puts("Hello, world!\n");
      1c:   80 e0           ldi r24, 0x00   ; 0
                1c: R_AVR_LO8_LDI   .rodata
      1e:   90 e0           ldi r25, 0x00   ; 0
                1e: R_AVR_HI8_LDI   .rodata
      20:   00 d0           rcall   .+0         ; 0x22 <main+0x22>
                20: R_AVR_13_PCREL  puts
        int x = 42;
      22:   8a e2           ldi r24, 0x2A   ; 42
      24:   90 e0           ldi r25, 0x00   ; 0
      26:   9a 83           std Y+2, r25    ; 0x02
      28:   89 83           std Y+1, r24    ; 0x01
        int y = x * 17;
      2a:   29 81           ldd r18, Y+1    ; 0x01
      2c:   3a 81           ldd r19, Y+2    ; 0x02
      2e:   82 2f           mov r24, r18
      30:   93 2f           mov r25, r19
      32:   82 95           swap    r24
      34:   92 95           swap    r25
      36:   90 7f           andi    r25, 0xF0   ; 240
      38:   98 27           eor r25, r24
      3a:   80 7f           andi    r24, 0xF0   ; 240
      3c:   98 27           eor r25, r24
      3e:   82 0f           add r24, r18
      40:   93 1f           adc r25, r19
      42:   9c 83           std Y+4, r25    ; 0x04
      44:   8b 83           std Y+3, r24    ; 0x03
    
        double a = 42.0;
      46:   80 e0           ldi r24, 0x00   ; 0
      48:   90 e0           ldi r25, 0x00   ; 0
      4a:   a8 e2           ldi r26, 0x28   ; 40
      4c:   b2 e4           ldi r27, 0x42   ; 66
      4e:   8d 83           std Y+5, r24    ; 0x05
      50:   9e 83           std Y+6, r25    ; 0x06
      52:   af 83           std Y+7, r26    ; 0x07
      54:   b8 87           std Y+8, r27    ; 0x08
        double b = a * 17.0;
      56:   20 e0           ldi r18, 0x00   ; 0
      58:   30 e0           ldi r19, 0x00   ; 0
      5a:   48 e8           ldi r20, 0x88   ; 136
      5c:   51 e4           ldi r21, 0x41   ; 65
      5e:   6d 81           ldd r22, Y+5    ; 0x05
      60:   7e 81           ldd r23, Y+6    ; 0x06
      62:   8f 81           ldd r24, Y+7    ; 0x07
      64:   98 85           ldd r25, Y+8    ; 0x08
      66:   00 d0           rcall   .+0         ; 0x68 <main+0x68>
                66: R_AVR_13_PCREL  __mulsf3
      68:   b9 2f           mov r27, r25
      6a:   a8 2f           mov r26, r24
      6c:   97 2f           mov r25, r23
      6e:   86 2f           mov r24, r22
      70:   89 87           std Y+9, r24    ; 0x09
      72:   9a 87           std Y+10, r25   ; 0x0a
      74:   ab 87           std Y+11, r26   ; 0x0b
      76:   bc 87           std Y+12, r27   ; 0x0c
    
        unsigned char m = 42;
      78:   8a e2           ldi r24, 0x2A   ; 42
      7a:   8d 87           std Y+13, r24   ; 0x0d
        unsigned short n = m * 17;
      7c:   8d 85           ldd r24, Y+13   ; 0x0d
      7e:   28 2f           mov r18, r24
      80:   30 e0           ldi r19, 0x00   ; 0
      82:   82 2f           mov r24, r18
      84:   93 2f           mov r25, r19
      86:   82 95           swap    r24
      88:   92 95           swap    r25
      8a:   90 7f           andi    r25, 0xF0   ; 240
      8c:   98 27           eor r25, r24
      8e:   80 7f           andi    r24, 0xF0   ; 240
      90:   98 27           eor r25, r24
      92:   82 0f           add r24, r18
      94:   93 1f           adc r25, r19
      96:   9f 87           std Y+15, r25   ; 0x0f
      98:   8e 87           std Y+14, r24   ; 0x0e
    
        return argc;
      9a:   88 89           ldd r24, Y+16   ; 0x10
      9c:   99 89           ldd r25, Y+17   ; 0x11
    }
      9e:   63 96           adiw    r28, 0x13   ; 19
      a0:   0f b6           in  r0, 0x3f    ; 63
      a2:   f8 94           cli
      a4:   de bf           out 0x3e, r29   ; 62
      a6:   0f be           out 0x3f, r0    ; 63
      a8:   cd bf           out 0x3d, r28   ; 61
      aa:   df 91           pop r29
      ac:   cf 91           pop r28
      ae:   08 95           ret
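
The swap / andi / eor sequence in the two integer multiplications is a 16-bit shift left by 4 built from 8-bit operations, followed by a 16-bit addition with carry. The following C sketch replays it step by step; the function names are made up, and the local variables are named after the registers in the listing only to make the correspondence visible.

```c
#include <stdint.h>

/* swap: exchange the two 4-bit halves of an 8-bit register */
static uint8_t swap_nibbles(uint8_t r) {
    return (uint8_t)((r << 4) | (r >> 4));
}

/* Hypothetical helper mirroring the listing: multiply a 16-bit value by 17
 * using only 8-bit operations. r24/r25 hold the low/high byte. */
uint16_t avr_times17(uint16_t x) {
    uint8_t r18 = (uint8_t)x, r19 = (uint8_t)(x >> 8);   /* original value */
    uint8_t r24 = r18, r25 = r19;                        /* working copy */

    r24 = swap_nibbles(r24);   /* swap r24 */
    r25 = swap_nibbles(r25);   /* swap r25 */
    r25 &= 0xF0;               /* andi r25, 0xF0 */
    r25 ^= r24;                /* eor  r25, r24 */
    r24 &= 0xF0;               /* andi r24, 0xF0 */
    r25 ^= r24;                /* eor  r25, r24: r25:r24 now holds x << 4 */

    unsigned sum = (unsigned)r24 + r18;        /* add r24, r18 */
    unsigned carry = sum >> 8;                 /* carry flag */
    r24 = (uint8_t)sum;
    r25 = (uint8_t)(r25 + r19 + carry);        /* adc r25, r19 */

    return (uint16_t)((r25 << 8) | r24);
}
```

For x = 42 the shifted value is 0x02a0 (672), and adding the original 0x002a gives 0x02ca, which is 714.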
    
