When you write code in a compiled language, several transformations occur before anything executes. Each layer has different concerns and different “owners.”

┌─────────────────────────────────────────────────────────────────┐
│  Source Code              (C, Rust, Go, etc.)                   │
│  ↓                                                              │
│  Compiler                 Your code → machine instructions      │
│  ↓                                                              │
│  Assembly                 Human-readable machine instructions   │
│  ↓                                                              │
│  Assembler                Assembly text → raw bytes             │
│  ↓                                                              │
│  Object File              Machine code + metadata (relocatable) │
│  ↓                                                              │
│  Linker                   Combines objects, resolves symbols    │
│  ↓                                                              │
│  Executable               Machine code + OS loading metadata    │
│  ↓                                                              │
│  Loader (OS)              Maps into memory, patches addresses   │
│  ↓                                                              │
│  CPU                      Executes raw instruction bytes        │
└─────────────────────────────────────────────────────────────────┘

The critical insight: machine code is standardized by CPU vendors, but everything around it is convention. The x86-64 instruction encoding is identical whether you’re on Linux, macOS, or Windows. What differs is the container format that wraps the machine code and tells the OS how to load it.

Machine Code is Just Bytes

At the bottom, CPUs execute bytes. The Intel and AMD architecture manuals define exactly what each byte sequence means. This is non-negotiable hardware behavior.

echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: 8d04 37c3                                ..7.

Four bytes. That’s a complete function. Decoding it:

8d 04 37      lea eax, [rdi+rsi]    ; eax = a + b
c3            ret                    ; return to caller

The lea (load effective address) instruction evaluates an address expression and stores the resulting number in a register without touching memory. The compiler uses it as a three-operand add: eax = rdi + rsi. Under the System V AMD64 calling convention, the first integer argument arrives in rdi and the second in rsi (the int values occupy the low 32 bits, edi and esi), and the return value goes in eax.
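
To see where rdi, rsi, and eax hide inside 8d 04 37, the following minimal sketch splits the second and third bytes into their bit fields. It handles only this one instruction form, and the helper names (split_fields, decode_lea) are made up for illustration; a real disassembler covers far more cases.

```python
# Register names indexed by their 3-bit encoding (no REX prefix assumed).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def split_fields(byte):
    """Split a ModRM or SIB byte into its 2-bit, 3-bit, 3-bit fields."""
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def decode_lea(code):
    """Decode only 'lea r32, [base+index*scale]' (ModRM mod=00, rm=100)."""
    opcode, modrm, sib = code[0], code[1], code[2]
    if opcode != 0x8D:
        raise ValueError("not a lea opcode")
    mod, reg, rm = split_fields(modrm)
    if mod != 0b00 or rm != 0b100:
        raise ValueError("addressing form not handled by this sketch")
    scale, index, base = split_fields(sib)
    if index == 0b100 or base == 0b101:
        raise ValueError("special SIB encodings not handled by this sketch")
    return f"lea {REG32[reg]}, [{REG64[base]}+{REG64[index]}*{1 << scale}]"

print(decode_lea(bytes.fromhex("8d0437")))  # lea eax, [rdi+rsi*1]
```

The ModRM byte 04 selects eax as the destination and says "a SIB byte follows"; the SIB byte 37 names rdi as the base and rsi as the index with a scale of 1.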

These bytes mean the same thing on every x86-64 system regardless of OS. The CPU doesn’t know what an “executable format” is—it just reads bytes from the instruction pointer and does what they say.

The add.bin file contains nothing but instruction bytes. No headers, no metadata. You can’t execute it directly because the OS doesn’t know where to load it or how to start it—but the bytes themselves are valid machine code.

Optimization Changes the Bytes

Without optimization, the same function compiles to something much larger:

echo 'int add(int a, int b) { return a + b; }' | gcc -O0 -fcf-protection=none -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: 5548 89e5 897d fc89 75f8 8b55 fc8b 45f8  UH...}..u..U..E.
00000010: 01d0 5dc3                                ..].

20 bytes for the same two-integer addition:

55               push rbp              ; save frame pointer
48 89 e5         mov rbp, rsp          ; set up stack frame
89 7d fc         mov [rbp-4], edi      ; spill arg 'a' to stack
89 75 f8         mov [rbp-8], esi      ; spill arg 'b' to stack
8b 55 fc         mov edx, [rbp-4]      ; reload 'a' into edx
8b 45 f8         mov eax, [rbp-8]      ; reload 'b' into eax
01 d0            add eax, edx          ; eax = a + b
5d               pop rbp               ; restore frame pointer
c3               ret

The unoptimized version spills arguments to memory then immediately reloads them. This supports debugging (variables have predictable stack locations) but wastes cycles. The optimizer eliminates all of this, recognizing that the arguments arrive in registers and can stay there.
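
The stack offsets in that listing come from the last byte of each mov, read as a signed 8-bit number. A small sketch, using only the bytes shown above (the helper name disp8 is an illustrative choice):

```python
def disp8(byte):
    """Interpret one byte as a signed 8-bit displacement (two's complement)."""
    return int.from_bytes(bytes([byte]), "little", signed=True)

# The four spill/reload instructions from the -O0 listing.
for insn in ("897dfc", "8975f8", "8b55fc", "8b45f8"):
    raw = bytes.fromhex(insn)
    print(f"{raw.hex(' ')}  ->  [rbp{disp8(raw[2]):+d}]")
```

So fc is -4 and f8 is -8, which is why the disassembly shows [rbp-4] and [rbp-8].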

Security Features Add Bytes

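Many distribution builds of GCC (Ubuntu’s, for example) enable control-flow protection by default, which adds a marker instruction at the start of each function: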

echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -c -x c - -o add.o
objcopy -O binary -j .text add.o add.bin
xxd add.bin
00000000: f30f 1efa 8d04 37c3                      ......7.

f3 0f 1e fa      endbr64               ; Intel CET landing pad marker
8d 04 37         lea eax, [rdi+rsi]
c3               ret

The endbr64 instruction marks valid indirect branch targets for Intel’s Control-flow Enforcement Technology (CET). It’s a 4-byte NOP on older CPUs but enables hardware-enforced CFI on newer ones. The -fcf-protection=none flag disables this.
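
Because the marker is a fixed 4-byte sequence, spotting it in a dump is a simple prefix check. A sketch using the two byte strings from this article (strip_endbr64 is a hypothetical helper, not part of any tool):

```python
ENDBR64 = bytes.fromhex("f30f1efa")

def strip_endbr64(code):
    """Return (had_marker, remaining_bytes) for a function's machine code."""
    if code.startswith(ENDBR64):
        return True, code[len(ENDBR64):]
    return False, code

with_marker = bytes.fromhex("f30f1efa8d0437c3")   # default build on such systems
without_marker = bytes.fromhex("8d0437c3")        # built with -fcf-protection=none

print(strip_endbr64(with_marker))
print(strip_endbr64(without_marker))
```

Once the marker is removed, both builds contain the same four bytes of actual work.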

Assembly is a Text Representation

Assembly language is a human-readable encoding of machine code. Each mnemonic corresponds to specific machine instruction bytes according to the CPU vendor’s specification.

echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -x c - -o add.s
cat add.s
        .file   ""
        .text
        .globl  add
        .type   add, @function
add:
        leal    (%rdi,%rsi), %eax
        ret
        .size   add, .-add

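The .globl, .type, and .size directives are instructions to the assembler rather than to the CPU. .globl makes the symbol visible to other object files, while .type and .size record that add is a function and how many bytes it occupies. None of them generate machine code. The actual instructions are just leal and ret.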

Important: Assembly syntax isn’t standardized. GCC emits AT&T syntax (leal (%rdi,%rsi), %eax) by default. The same instruction in Intel syntax is lea eax, [rdi+rsi]. The output bytes are identical; the choice is purely a tooling preference.

echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -masm=intel -x c - -o add.s
cat add.s
        .intel_syntax noprefix
        .file   ""
        .text
        .globl  add
        .type   add, @function
add:
        lea     eax, [rdi+rsi]
        ret
        .size   add, .-add

Same function, same bytes, different text representation.

Little-Endian Encoding

x86 stores multi-byte values with the least significant byte first. This affects how you read instruction encodings and memory dumps:

32-bit value 0x00000001 in memory:

Address:    0     1     2     3
Bytes:     01    00    00    00
           ↑                 ↑
          LSB               MSB

The bytes 01 00 00 00 represent the integer 1, not 0x01000000 (16,777,216). This becomes important when reading immediate values in instructions:

mov $0x4,%esi  →  be 04 00 00 00
                  ↑  ↑─────────↑
               opcode  imm32 = 0x00000004 (little-endian)

When disassemblers show mov $0x4,%esi, they’ve already converted the little-endian bytes to human-readable form.
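
You can reproduce that conversion yourself. This sketch uses Python's struct module to read the same four bytes both ways, then pulls the immediate out of the mov encoding shown above:

```python
import struct

raw = bytes([0x01, 0x00, 0x00, 0x00])
little = struct.unpack("<I", raw)[0]   # how x86 reads it
big = struct.unpack(">I", raw)[0]      # how a big-endian machine would read it
print(little, big)                     # 1 16777216

# mov $0x4,%esi is opcode b8+r followed by a little-endian imm32.
insn = bytes.fromhex("be04000000")
opcode = insn[0]
register_number = opcode - 0xB8                 # 6 is esi in the register encoding
imm32 = struct.unpack_from("<I", insn, 1)[0]
print(hex(opcode), register_number, imm32)      # 0xbe 6 4
```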

Object Files Bundle Code with Metadata

The assembler could output raw machine code, but this creates problems. When assembling call printf, the assembler doesn’t know where printf will be in memory. It can’t emit a concrete address. Instead, it emits a placeholder and records “this location needs the address of printf.”

Object files solve this by bundling machine code with metadata:

echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -c -x c - -o add.o
ls -l add.o add.bin
-rw-r--r-- 1 user user    4 Jan 18 10:00 add.bin
-rw-r--r-- 1 user user 1368 Jan 18 10:00 add.o

4 bytes of machine code became 1368 bytes of object file. The rest is ELF structure:

readelf -SW add.o
Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .text             PROGBITS        0000000000000000 000040 000004 00  AX  0   0  1
  [ 2] .data             PROGBITS        0000000000000000 000044 000000 00  WA  0   0  1
  [ 3] .bss              NOBITS          0000000000000000 000044 000000 00  WA  0   0  1
  [ 4] .comment          PROGBITS        0000000000000000 000044 00001f 01  MS  0   0  1
  [ 5] .note.GNU-stack   PROGBITS        0000000000000000 000063 000000 00      0   0  1
  [ 6] .eh_frame         PROGBITS        0000000000000000 000068 000030 00   A  0   0  8
  [ 7] .rela.eh_frame    RELA            0000000000000000 000250 000018 18   I  9   6  8
  [ 8] .symtab           SYMTAB          0000000000000000 000098 0000a8 18      9   4  8
  [ 9] .strtab           STRTAB          0000000000000000 000140 00000d 00      0   0  1
  [10] .shstrtab         STRTAB          0000000000000000 000268 00005c 00      0   0  1

The .text section is 4 bytes—that’s our code. Everything else is metadata: symbol tables, string tables, exception handling info, section headers.
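
The metadata starts with a fixed 64-byte ELF header that tells tools where everything else lives. The following is a rough sketch of reading a few of its fields, assuming a 64-bit little-endian file. The demo header is hand-built for illustration and its field values (such as the section header offset) are made up, not taken from add.o.

```python
import struct

# ELF64 file header layout (64 bytes); "<" assumes a little-endian file.
ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
E_TYPES = {1: "REL", 2: "EXEC", 3: "DYN"}
E_MACHINES = {62: "x86-64", 183: "AArch64"}

def parse_elf64_header(data):
    """Pull a few fields out of the first 64 bytes of an ELF64 file."""
    (ident, e_type, e_machine, _version, e_entry, _phoff, e_shoff, _flags,
     _ehsize, _phentsize, _phnum, _shentsize, e_shnum,
     _shstrndx) = ELF64_HEADER.unpack_from(data)
    if ident[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic")
    if ident[4] != 2 or ident[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    return {
        "type": E_TYPES.get(e_type, hex(e_type)),
        "machine": E_MACHINES.get(e_machine, hex(e_machine)),
        "entry": e_entry,
        "section_header_offset": e_shoff,
        "section_count": e_shnum,
    }

# Hypothetical stand-in for a relocatable object's header (values invented).
demo = ELF64_HEADER.pack(b"\x7fELF\x02\x01\x01", 1, 62, 1, 0, 0,
                         0x298, 0, 64, 0, 0, 64, 11, 10)
print(parse_elf64_header(demo))
```

To try it on a real file, pass open("add.o", "rb").read() instead of demo. A relocatable object reports type REL and an entry point of 0, since it is not yet runnable.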

Relocations Mark Unresolved References

When code references external symbols, the assembler can’t fill in addresses. It emits placeholders and records relocations:

cat > caller.c << 'EOF'
int add(int a, int b);
 
int compute(void) {
    return add(3, 4);
}
EOF
 
gcc -O2 -fcf-protection=none -c caller.c -o caller.o
objdump -dr caller.o
0000000000000000 <compute>:
   0:   be 04 00 00 00          mov    $0x4,%esi
   5:   bf 03 00 00 00          mov    $0x3,%edi
   a:   e9 00 00 00 00          jmp    f <compute+0xf>
                        b: R_X86_64_PLT32       add-0x4

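The e9 00 00 00 00 is a jmp with a placeholder offset (all zeros). GCC emitted a jmp rather than a call because calling add is the last thing compute does (a tail call). The relocation R_X86_64_PLT32 add-0x4 tells the linker: “at offset 0xb, write the distance from this field to add, minus 4.” The -4 addend compensates for the fact that x86 relative jumps are measured from the end of the instruction (offset 0xf), which lies 4 bytes past the start of the field being patched (offset 0xb).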
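
The arithmetic the linker performs for this kind of relocation can be sketched as value = S + A - P, where S is the symbol's final address, A is the addend, and P is the address of the field being patched. The addresses below are assumptions chosen for illustration (compute placed at 0x1000, add at 0x1010), and the sketch treats PLT32 like a plain PC-relative relocation, which is what happens when the symbol resolves within the same link.

```python
import struct

def apply_pc32(section, section_addr, field_offset, symbol_addr, addend):
    """Patch a 32-bit PC-relative relocation: value = S + A - P."""
    place = section_addr + field_offset        # P: address of the 4-byte field
    value = symbol_addr + addend - place       # S + A - P
    patched = bytearray(section)
    struct.pack_into("<i", patched, field_offset, value)
    return bytes(patched)

# compute's .text from caller.o, with the zeroed placeholder at offset 0xb.
compute = bytes.fromhex("be04000000" "bf03000000" "e900000000")

patched = apply_pc32(compute, section_addr=0x1000, field_offset=0xB,
                     symbol_addr=0x1010, addend=-4)
print(patched[0xA:].hex(" "))   # e9 01 00 00 00
```

With those assumed addresses, 0x1010 - 4 - 0x100b is 1, so the placeholder becomes 01 00 00 00.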

The Linker Resolves Symbols

The linker combines object files, resolves symbols, and processes relocations:

gcc -O2 -fcf-protection=none -nostartfiles -e compute -o program caller.o add.o
objdump -d program
0000000000001000 <compute>:
    1000:	be 04 00 00 00       	mov    $0x4,%esi
    1005:	bf 03 00 00 00       	mov    $0x3,%edi
    100a:	e9 01 00 00 00       	jmp    1010 <add>
    100f:	90                   	nop

0000000000001010 <add>:
    1010:	8d 04 37             	lea    (%rdi,%rsi,1),%eax
    1013:	c3                   	ret

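Why 0x1000? The linker decides where each section goes. The first page of the image (offsets 0x0 to 0xfff) holds the ELF header, program headers, and other read-only metadata, and the default GNU ld layout on Linux starts executable code on the next 4096-byte page boundary so that only code pages are mapped executable. In a position-independent executable, which most current distributions build by default, 0x1000 is an offset from a base address the loader picks at run time, not an absolute address. This is convention, not a hardware requirement: you can override it with -Wl,-Ttext=0x5000.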

The jmp doesn’t contain address 0x1010. The placeholder 00 00 00 00 became 01 00 00 00. Decoding this:

e9                 opcode for "jmp rel32"
01 00 00 00        offset in little-endian = 1

x86 relative jumps calculate the target from the end of the instruction:

jmp starts at:     0x100a
jmp is 5 bytes:    0x100a + 5 = 0x100f
offset:            1
target:            0x100f + 1 = 0x1010

The <add> in the disassembly is a convenience—objdump looked up 0x1010 in the symbol table and printed the associated name. The machine code contains only the numeric offset 01 00 00 00.
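
The target calculation above is mechanical enough to write down. A minimal sketch that handles only the 5-byte e9 form (jmp_rel32_target is an illustrative name):

```python
import struct

def jmp_rel32_target(insn_addr, insn):
    """Return the absolute target of an e9 rel32 jump located at insn_addr."""
    if len(insn) != 5 or insn[0] != 0xE9:
        raise ValueError("expected a 5-byte e9 rel32 jump")
    (rel32,) = struct.unpack_from("<i", insn, 1)   # signed, little-endian
    return insn_addr + len(insn) + rel32           # measured from the next instruction

print(hex(jmp_rel32_target(0x100A, bytes.fromhex("e901000000"))))  # 0x1010
print(hex(jmp_rel32_target(0x100A, bytes.fromhex("e9f1ffffff"))))  # 0x1000
```

The second call shows a negative offset: f1 ff ff ff is -15, which would send the same jump backward to 0x1000.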

The nop at 0x100f is alignment padding inserted by the linker. Functions are often aligned to 16-byte boundaries for performance.

Labels and Symbols

A label in assembly source becomes a symbol in the object file. They’re the same concept at different stages:

| Stage           | What “add” Is                                   |
|-----------------|-------------------------------------------------|
| Assembly source | A label you write: add:                         |
| Object file     | A symbol table entry: “add = offset 0 in .text” |
| After linking   | Updated entry: “add = 0x1010”                   |
| Machine code    | Nothing. The CPU sees only bytes and addresses. |

You can inspect the symbol table:

nm program
0000000000001010 T add
0000000000001000 T compute

The T means the symbol is defined in the text (code) section, and the uppercase letter marks it as global. These names exist in metadata, not in the instruction stream. If you strip the binary:

strip program
nm program
nm: program: no symbols

The program still runs—execution doesn’t need symbol names, only addresses. The symbol table exists for debuggers, profilers, and disassemblers. After stripping:

objdump -d program | grep -A1 "100a:"
    100a:	e9 01 00 00 00       	jmp    1010

Same bytes, but no <add> annotation because the symbol table is gone.
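
What a disassembler does with the symbol table amounts to a dictionary lookup. This sketch imitates that behavior using the nm output shown earlier. It assumes every nm line has three fields, which holds for this program but not for undefined symbols.

```python
NM_OUTPUT = """\
0000000000001010 T add
0000000000001000 T compute
"""

def parse_nm(text):
    """Build an address -> name table from three-column nm output."""
    table = {}
    for line in text.splitlines():
        addr, _kind, name = line.split()
        table[int(addr, 16)] = name
    return table

def annotate(addr, symbols):
    """Format an address the way objdump does, with a name if one is known."""
    name = symbols.get(addr)
    return f"{addr:x} <{name}>" if name else f"{addr:x}"

symbols = parse_nm(NM_OUTPUT)
print("jmp", annotate(0x1010, symbols))   # before strip: jmp 1010 <add>
print("jmp", annotate(0x1010, {}))        # after strip:  jmp 1010
```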

Section Conventions

Sections categorize output. The names are conventions enforced by linker scripts, not CPU requirements:

section .text       ; Convention: executable code
section .data       ; Convention: initialized read-write data
section .rodata     ; Convention: read-only data (constants)
section .bss        ; Convention: zero-initialized (no file space)

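The linker interprets sections based on their flags, not their names. The loader never reads section names at all: it maps segments, which the linker builds by grouping sections with compatible flags.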

readelf -SW program | grep -E '\.text'
  [ 1] .text  PROGBITS  0000000000001000  001000  000014  00  AX  0  0  1

The AX flags mean Allocated (loaded into memory) and eXecutable. You could name your code section .banana and it would work if flagged correctly—but linker scripts expect standard names.
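
The letters readelf prints are just a rendering of bits in each section header's sh_flags field. A sketch of that mapping, covering only the flags that appear in this article's listings:

```python
# (bit value, readelf letter) for a subset of the ELF section flags.
SHF_FLAGS = [
    (0x01, "W"),   # SHF_WRITE: writable at run time
    (0x02, "A"),   # SHF_ALLOC: occupies memory when loaded
    (0x04, "X"),   # SHF_EXECINSTR: contains executable code
    (0x10, "M"),   # SHF_MERGE: contents may be merged
    (0x20, "S"),   # SHF_STRINGS: null-terminated strings
    (0x40, "I"),   # SHF_INFO_LINK: sh_info holds a section index
]

def flag_letters(sh_flags):
    """Render sh_flags as the letter string readelf shows in the Flg column."""
    return "".join(letter for bit, letter in SHF_FLAGS if sh_flags & bit)

for name, flags in [(".text", 0x6), (".data", 0x3), (".comment", 0x30)]:
    print(f"{name:10} {flags:#04x} {flag_letters(flags)}")
```

A section named .banana with flags 0x6 would render as AX, exactly like .text.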

Why Multiple Executable Formats Exist

Different operating systems evolved independently, each designing object formats before standardization pressure existed:

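| Format | Year | Origin        | Used By             |
|--------|------|---------------|---------------------|
| a.out  | 1975 | Original Unix | Abandoned           |
| ELF    | 1988 | System V R4   | Linux, BSD, Solaris |
| Mach-O | 1989 | NeXTSTEP      | macOS, iOS          |
| PE     | 1993 | Windows NT    | Windows             |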

Why not standardize now? Switching costs are enormous: every compiler, linker, debugger, and OS loader would need changes. The formats also encode genuine design differences. PE has resource embedding, Mach-O has fat binaries and code signing (mandatory on iOS and on Apple silicon Macs), and ELF has lazy binding through the PLT. These reflect different OS philosophies that can’t be trivially unified.

What’s Actually Portable

| Layer                 | Standardized By         | Portable?                       |
|-----------------------|-------------------------|---------------------------------|
| Machine code encoding | CPU vendor (Intel, ARM) | Within same architecture        |
| Assembly mnemonics    | CPU vendor              | Within same architecture        |
| Assembly syntax       | Toolchain (GNU, LLVM)   | No, but semantically equivalent |
| Object file format    | OS convention           | No                              |
| System call numbers   | OS kernel               | No                              |
| Calling convention    | OS + architecture ABI   | No                              |

Cross-compilation works because compilers can emit machine code for any supported CPU and wrap it in any supported object format. The compiler doesn’t care what OS it runs on—it cares what OS the output targets.
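
The container a toolchain wrapped the code in is usually recognizable from the first few bytes of the file. A rough sketch of that check, limited to a handful of well-known magic numbers (identify is a hypothetical helper; tools like file know many more cases):

```python
MAGICS = [
    (b"\x7fELF", "ELF"),
    (b"\xcf\xfa\xed\xfe", "Mach-O (64-bit, little-endian)"),
    (b"\xca\xfe\xba\xbe", "Mach-O fat binary (magic shared with Java class files)"),
    (b"MZ", "PE (starts with a DOS header)"),
]

def identify(header):
    """Guess the container format from the leading bytes of a file."""
    for magic, name in MAGICS:
        if header.startswith(magic):
            return name
    return "no recognized container"

print(identify(b"\x7fELF\x02\x01\x01"))
print(identify(b"MZ\x90\x00"))
print(identify(bytes.fromhex("8d0437c3")))   # raw machine code, like add.bin
```

The last line is the point of the earlier add.bin experiment: raw instruction bytes carry no marker at all, so nothing tells an OS how to load them.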

System Calls: Where OS Dependence Gets Real

Machine code can only do what the CPU supports: arithmetic, memory access, control flow. Everything else—file I/O, networking, process creation—requires asking the kernel via system calls.

; Linux: write(1, message, 13)
mov rax, 1          ; syscall number (LINUX-SPECIFIC)
mov rdi, 1          ; fd = stdout
lea rsi, [rel msg]  ; buffer
mov rdx, 13         ; length
syscall             ; invoke kernel
 
; macOS: write(1, message, 13)  
mov rax, 0x2000004  ; syscall number (MACOS-SPECIFIC)
mov rdi, 1
lea rsi, [rel msg]
mov rdx, 13
syscall

The syscall instruction is identical. The syscall numbers differ entirely. This is one reason a Linux binary can’t run on macOS despite identical CPUs: even if the loader accepted the file format, the kernel interface is incompatible.

libc abstracts this. Calling write() in C invokes a wrapper that uses the correct syscall number for your OS. “Portable C” therefore means source portability: the same code recompiles against each platform’s libc, while the resulting binaries remain OS-specific.
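
Conceptually, the per-OS knowledge inside such a wrapper is a lookup table. The sketch below models only the write numbers for x86-64 Linux and macOS from the listings above. On macOS the value is a syscall class (2 for BSD/Unix calls) in the upper bits combined with the BSD call number 4. The names here are illustrative and this is not how any real libc is organized.

```python
SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_UNIX = 2     # macOS class for BSD-style syscalls

WRITE_NUMBER = {
    "linux": 1,
    "macos": (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | 4,
}

def write_syscall_number(target_os):
    """Return the number a write() wrapper would load into rax (x86-64 only)."""
    return WRITE_NUMBER[target_os]

for target in ("linux", "macos"):
    print(target, hex(write_syscall_number(target)))
```

The C source calling write() stays the same; only this table, baked into each platform's libc, changes.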

Practical Exploration

Trace the full pipeline on a single function:

# Source → Assembly
echo 'int add(int a, int b) { return a + b; }' | gcc -O2 -fcf-protection=none -S -x c - -o add.s
cat add.s
 
# Assembly → Object file
gcc -c add.s -o add.o
 
# See symbol table in object file
nm add.o
 
# Extract raw machine code
objcopy -O binary -j .text add.o add.bin
xxd add.bin
 
# Compare sizes
ls -l add.o add.bin
 
# See ELF structure
readelf -SW add.o

For code with external references, relocations become visible:

cat > example.c << 'EOF'
#include <stdio.h>
int main(void) {
    puts("Hello");
    return 0;
}
EOF
 
gcc -O2 -fcf-protection=none -c example.c -o example.o
objdump -dr example.o

The output shows relocations with placeholder bytes that the linker will patch. After linking, those zeros become real offsets pointing to resolved addresses.