When you write code in a compiled language, several transformations occur before anything executes. Each layer has different concerns and different “owners.”
Source Code (C, Rust, Go, etc.)
      ↓
Compiler       Your code → machine instructions
      ↓
Assembly       Human-readable machine instructions
      ↓
Assembler      Assembly text → raw bytes
      ↓
Object File    Machine code + metadata (relocatable)
      ↓
Linker         Combines objects, resolves symbols
      ↓
Executable     Machine code + OS loading metadata
      ↓
Loader (OS)    Maps into memory, patches addresses
      ↓
CPU            Executes raw instruction bytes
The critical insight: machine code is standardized by CPU vendors, but everything around it is convention. The x86-64 instruction encoding is identical whether you’re on Linux, macOS, or Windows. What differs is the container format that wraps the machine code and tells the OS how to load it.
Machine Code is Just Bytes
At the bottom, CPUs execute bytes. Intel's and AMD's x86-64 architecture manuals define exactly what each byte sequence means. This is non-negotiable hardware behavior.
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: 8d04 37c3 ..7.
Four bytes. That’s a complete function. Decoding it:
8d 04 37 lea eax, [rdi+rsi] ; eax = a + b
c3 ret ; return to caller
The lea (load effective address) instruction computes the address described by its memory operand and writes that address to the destination register without reading memory. The compiler uses it as a three-operand add: eax = rdi + rsi, keeping the low 32 bits. The first argument arrives in rdi, the second in rsi (System V AMD64 calling convention), and the return value goes in eax.
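These bytes decode to the same instructions on every x86-64 system regardless of OS. Whether they behave as add is a separate question: the Windows x64 calling convention passes the first two integer arguments in rcx and rdx, so on Windows the same bytes would add the wrong registers. The CPU doesn't know what an “executable format” is; it just reads bytes at the instruction pointer and does what they say.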
The add.bin file contains nothing but instruction bytes. No headers, no metadata. You can’t execute it directly because the OS doesn’t know where to load it or how to start it—but the bytes themselves are valid machine code.
Optimization Changes the Bytes
Without optimization, the same function compiles to something much larger:
echo 'int add(int a, int b) { return a + b; }' | gcc -O0 -fcf-protection=none -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: 5548 89e5 897d fc89 75f8 8b55 fc8b 45f8 UH...}..u..U..E.
00000010: 01d0 5dc3 ..].
20 bytes for the same two-integer addition:
55 push rbp ; save frame pointer
48 89 e5 mov rbp, rsp ; set up stack frame
89 7d fc mov [rbp-4], edi ; spill arg 'a' to stack
89 75 f8 mov [rbp-8], esi ; spill arg 'b' to stack
8b 55 fc mov edx, [rbp-4] ; reload 'a' into edx
8b 45 f8 mov eax, [rbp-8] ; reload 'b' into eax
01 d0 add eax, edx ; eax = a + b
5d pop rbp ; restore frame pointer
c3 ret
The unoptimized version spills arguments to memory then immediately reloads them. This supports debugging (variables have predictable stack locations) but wastes cycles. The optimizer eliminates all of this, recognizing that the arguments arrive in registers and can stay there.
Security Features Add Bytes
Many Linux distributions (Ubuntu, for example) configure GCC to insert control-flow integrity markers by default:
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: f30f 1efa 8d04 37c3 ......7.
f3 0f 1e fa endbr64 ; Intel CET landing pad marker
8d 04 37 lea eax, [rdi+rsi]
c3 ret
The endbr64 instruction marks valid indirect branch targets for Intel's Control-flow Enforcement Technology (CET). CPUs without CET execute it as a 4-byte NOP; when indirect branch tracking is enabled, an indirect call or jump that lands on anything other than endbr64 faults. The -fcf-protection=none flag used in the earlier examples disables these markers.
Assembly is a Text Representation
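Assembly language is a human-readable encoding of machine code. Each mnemonic corresponds to machine instruction bytes defined by the CPU vendor's specification; where an instruction has more than one valid encoding (add eax, edx can be 01 d0 or 03 c2), the assembler picks one.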
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -x c - -o add.s
cat add.s
.file ""
.text
.globl add
.type add, @function
add:
leal (%rdi,%rsi), %eax
ret
.size add, .-add
The .globl, .type, and .size directives are metadata for the assembler: they make add a global (externally visible) symbol, mark it as a function, and record its size. They don't generate machine code. The actual instructions are just leal and ret.
Important
Assembly syntax isn't standardized. GCC emits AT&T syntax (leal (%rdi,%rsi), %eax) by default; the same instruction in Intel syntax is lea eax, [rdi+rsi]. The output bytes are identical, so this is purely a tooling preference.
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -masm=intel -x c - -o add.s
cat add.s
.intel_syntax noprefix
.file ""
.text
.globl add
.type add, @function
add:
lea eax, [rdi+rsi]
ret
.size add, .-add
Same function, same bytes, different text representation.
Little-Endian Encoding
x86 stores multi-byte values with the least significant byte first. This affects how you read instruction encodings and memory dumps:
32-bit value 0x00000001 in memory:
Address:  0   1   2   3
Bytes:    01  00  00  00
          ↑           ↑
         LSB         MSB
The bytes 01 00 00 00 represent the integer 1, not 0x01000000 (16,777,216). This becomes important when reading immediate values in instructions:
mov $0x4,%esi   →   be   04 00 00 00
                    ↑    ↑─────────↑
                 opcode  imm32 = 0x00000004 (little-endian)
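When disassemblers show mov $0x4,%esi, they've already converted the little-endian bytes to human-readable form. The opcode byte be is itself B8 + 6: the "mov r32, imm32" opcode with esi's register number, 6, added in.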
Object Files Bundle Code with Metadata
The assembler could output raw machine code, but this creates problems. When assembling call printf, the assembler doesn’t know where printf will be in memory. It can’t emit a concrete address. Instead, it emits a placeholder and records “this location needs the address of printf.”
Object files solve this by bundling machine code with metadata:
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
ls -l add.o add.bin
-rw-r--r-- 1 user user 4 Jan 18 10:00 add.bin
-rw-r--r-- 1 user user 1368 Jan 18 10:00 add.o
4 bytes of machine code became 1368 bytes of object file. The rest is ELF structure:
readelf -SW add.o
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000004 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000044 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000044 000000 00  WA  0   0  1
  [ 4] .comment          PROGBITS        0000000000000000 000044 00001f 01  MS  0   0  1
  [ 5] .note.GNU-stack   PROGBITS        0000000000000000 000063 000000 00      0   0  1
  [ 6] .eh_frame         PROGBITS        0000000000000000 000068 000030 00   A  0   0  8
  [ 7] .rela.eh_frame    RELA            0000000000000000 000250 000018 18   I  9   6  8
  [ 8] .symtab           SYMTAB          0000000000000000 000098 0000a8 18      9   4  8
  [ 9] .strtab           STRTAB          0000000000000000 000140 00000d 00      0   0  1
  [10] .shstrtab         STRTAB          0000000000000000 000268 00005c 00      0   0  1
The .text section is 4 bytes—that’s our code. Everything else is metadata: symbol tables, string tables, exception handling info, section headers.
Relocations Mark Unresolved References
When code references external symbols, the assembler can’t fill in addresses. It emits placeholders and records relocations:
cat > caller.c << 'EOF'
int add(int a, int b);
int compute(void) {
return add(3, 4);
}
EOF
gcc -O2 -fcf-protection=none -c caller.c -o caller.o
objdump -dr caller.o
0000000000000000 <compute>:
0: be 04 00 00 00 mov $0x4,%esi
5: bf 03 00 00 00 mov $0x3,%edi
a: e9 00 00 00 00 jmp f <compute+0xf>
b: R_X86_64_PLT32 add-0x4
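The e9 00 00 00 00 is a jmp with a placeholder offset (all zeros). It's a jmp rather than a call because add(3, 4) is the last thing compute does: at -O2, GCC turns it into a tail call, so add returns straight to compute's caller. The relocation R_X86_64_PLT32 add-0x4 tells the linker: “at offset 0xb, write the 32-bit distance from this field to add, minus 4.” The -4 is needed because x86 measures relative jumps from the end of the instruction, which is 4 bytes past the start of the field. (PLT32 lets the linker route the jump through a PLT stub if add turns out to live in a shared library; here it won't need one.)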
The Linker Resolves Symbols
The linker combines object files, resolves symbols, and processes relocations:
gcc -O2 -fcf-protection=none -nostartfiles -e compute -o program caller.o add.o
objdump -d program
0000000000001000 <compute>:
1000: be 04 00 00 00 mov $0x4,%esi
1005: bf 03 00 00 00 mov $0x3,%edi
100a: e9 01 00 00 00 jmp 1010 <add>
100f: 90 nop
0000000000001010 <add>:
1010: 8d 04 37 lea (%rdi,%rsi,1),%eax
1013: c3 ret
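Why 0x1000? The linker decides where to place code. GCC on most current Linux distributions builds position-independent executables (PIE) by default, so addresses in the file start at 0 and the kernel adds a randomized base when it loads the program. The first 4 KiB page holds the ELF headers and other read-only metadata, and GNU ld starts executable code on a fresh page so that headers and data never share a page with code; that puts .text at 0x1000. A non-PIE build (-no-pie) starts from the traditional x86-64 base of 0x400000 instead. This is convention, not a hardware requirement: you can override it with -Wl,-Ttext=0x5000.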
The jmp doesn’t contain address 0x1010. The placeholder 00 00 00 00 became 01 00 00 00. Decoding this:
e9 opcode for "jmp rel32"
01 00 00 00 offset in little-endian = 1
x86 relative jumps calculate the target from the end of the instruction:
jmp starts at: 0x100a
jmp is 5 bytes: 0x100a + 5 = 0x100f
offset: 1
target: 0x100f + 1 = 0x1010
The <add> in the disassembly is a convenience—objdump looked up 0x1010 in the symbol table and printed the associated name. The machine code contains only the numeric offset 01 00 00 00.
The nop at 0x100f is alignment padding inserted by the linker. Functions are often aligned to 16-byte boundaries for performance.
Labels and Symbols
A label in assembly source becomes a symbol in the object file. They’re the same concept at different stages:
| Stage | What “add” Is |
|---|---|
| Assembly source | A label you write: add: |
| Object file | A symbol table entry: “add = offset 0 in .text” |
| After linking | Updated entry: “add = 0x1010” |
| Machine code | Nothing. The CPU sees only bytes and addresses. |
You can inspect the symbol table:
nm program
0000000000001010 T add
0000000000001000 T compute
The T means a global symbol in the text (code) section; a lowercase t would mark a local one. These names exist in metadata, not in the instruction stream. If you strip the binary:
strip program
nm program
nm: program: no symbols
Stripping doesn't change a single instruction byte, because execution needs only addresses, not names. The symbol table exists for debuggers, profilers, and disassemblers. After stripping:
objdump -d program | grep '100a:'
100a: e9 01 00 00 00 jmp 1010
Same bytes, but no <add> annotation because the symbol table is gone.
Section Conventions
Sections categorize output. Their names are conventions that linker scripts rely on, not CPU requirements:
section .text ; Convention: executable code
section .data ; Convention: initialized read-write data
section .rodata ; Convention: read-only data (constants)
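section .bss ; Convention: zero-initialized (no file space)
What controls how code and data are loaded is the section flags, not the names. The loader never reads section names at all; it maps segments whose permissions the linker derived from those flags:
readelf -SW program | grep -E '\.text'
  [ 1] .text             PROGBITS        0000000000001000 001000 000014 00  AX  0   0  1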
The AX flags mean Allocated (loaded into memory) and eXecutable. You could name your code section .banana and it would work if flagged correctly—but linker scripts expect standard names.
Why Multiple Executable Formats Exist
Different operating systems evolved independently, each designing object formats before standardization pressure existed:
| Format | Year | Origin | Used By |
|---|---|---|---|
| a.out | 1975 | Original Unix | Abandoned |
| ELF | 1988 | System V R4 | Linux, BSD, Solaris |
| Mach-O | 1989 | NeXTSTEP | macOS, iOS |
| PE | 1993 | Windows NT | Windows |
Why not standardize now? Switching costs are enormous, since every compiler, linker, debugger, and OS loader would need changes. The formats also encode genuine design differences: PE has resource embedding, Mach-O has fat (universal) binaries and code signing that is mandatory on Apple Silicon and iOS, ELF has lazy binding. These reflect different OS philosophies that can't be trivially unified.
What’s Actually Portable
| Layer | Standardized By | Portable? |
|---|---|---|
| Machine code encoding | CPU vendor (Intel, ARM) | Within same architecture |
| Assembly mnemonics | CPU vendor | Within same architecture |
| Assembly syntax | Toolchain (GNU, LLVM) | No, but semantically equivalent |
| Object file format | OS convention | No |
| System call numbers | OS kernel | No |
| Calling convention | OS + architecture ABI | No |
Cross-compilation works because compilers can emit machine code for any supported CPU and wrap it in any supported object format. The compiler doesn’t care what OS it runs on—it cares what OS the output targets.
System Calls: Where OS Dependence Gets Real
Machine code can only do what the CPU supports: arithmetic, memory access, control flow. Everything else—file I/O, networking, process creation—requires asking the kernel via system calls.
; Linux: write(1, message, 13)
mov rax, 1 ; syscall number (LINUX-SPECIFIC)
mov rdi, 1 ; fd = stdout
lea rsi, [rel msg] ; buffer
mov rdx, 13 ; length
syscall ; invoke kernel
; macOS: write(1, message, 13)
mov rax, 0x2000004 ; syscall number (MACOS-SPECIFIC)
mov rdi, 1
lea rsi, [rel msg]
mov rdx, 13
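syscall
The syscall instruction is identical. The syscall numbers differ entirely: macOS tags BSD system calls with a class prefix of 0x2000000, so write is 0x2000000 + 4. This is why a Linux binary can't run on macOS despite identical CPUs: the kernel interface is incompatible, and the macOS loader won't accept an ELF file in the first place.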
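libc abstracts this. Calling write() in C invokes a wrapper that uses the correct syscall number for your OS. “Portable C” therefore means portable source: the abstraction lives in the library you compile and link against, and the resulting binary is still tied to one OS and one container format.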
Practical Exploration
Trace the full pipeline on a single function:
# Source → Assembly
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -x c - -o add.s
cat add.s
# Assembly → Object file
gcc -c add.s -o add.o
# See symbol table in object file
nm add.o
# Extract raw machine code
objcopy -O binary -j .text add.o add.bin
xxd add.bin
# Compare sizes
ls -l add.o add.bin
# See ELF structure
readelf -SW add.o
For code with external references, relocations become visible:
cat > example.c << 'EOF'
#include <stdio.h>
int main(void) {
puts("Hello");
return 0;
}
EOF
gcc -O2 -fcf-protection=none -c example.c -o example.o
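objdump -dr example.o
The output shows relocations with placeholder bytes that the linker will patch. After linking, those zeros become real offsets. Because puts lives in the shared C library, its call is pointed at a PLT stub, and the dynamic loader supplies puts's actual address when the program starts or on its first call.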