Yes, this note has similar issues. The tools (size, readelf, nm, bloaty, cargo-bloat) are all at H2 level but they’re conceptually a group. Here’s a restructured version:


Why Binaries Are Larger Than Expected

Compiled binaries contain far more than your source code. The linker bundles in exception handling tables, DWARF debug information, symbol tables, relocation entries, and string tables. In languages with generics like Rust, C++, or Swift, monomorphization duplicates function bodies for each concrete type instantiation. A single Vec<T> used with 5 different types can produce 5 separate compiled copies of every method you call on it.
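A minimal sketch of the effect, assuming two hypothetical helpers (render_generic and render_dyn are illustrative names, not from any library). The generic version is compiled once per element type it is used with, while the trait-object version is compiled once and dispatches through a vtable:

```rust
use std::fmt::Debug;

// Fully generic: the compiler emits one copy of this body per concrete T.
fn render_generic<T: Debug>(items: &[T]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("{:?};", item));
    }
    out
}

// Non-generic worker: compiled once, each element is formatted via its vtable.
fn render_dyn(items: &[&dyn Debug]) -> String {
    let mut out = String::new();
    for item in items {
        out.push_str(&format!("{:?};", item));
    }
    out
}

fn main() {
    // Two instantiations: render_generic::<i32> and render_generic::<&str>.
    let a = render_generic(&[1, 2, 3]);
    let b = render_generic(&["x", "y"]);

    // One compiled body serves both element types.
    let ints: [&dyn Debug; 3] = [&1, &2, &3];
    let strs: [&dyn Debug; 2] = [&"x", &"y"];
    assert_eq!(a, render_dyn(&ints));
    assert_eq!(b, render_dyn(&strs));
    println!("{a} {b}");
}
```

Dynamic dispatch trades some runtime speed for size, so whether the swap pays off is something to measure rather than assume.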

The panic and formatting machinery in Rust deserves special attention. Using panic!("{}", x) or format!() pulls in std::fmt, which includes vtables and trait object infrastructure for the Display and Debug traits and the Formatter type. This can add 20-50KB to small binaries. Calling unwrap() or expect() on a Result formats the error value with Debug before panicking, while Option::unwrap() panics with a fixed message. This is why handling the failure case explicitly, or using unwrap_unchecked() in unsafe code where the invariant is already proven, can reduce size.
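As a rough illustration, the two hypothetical functions below (port_unwrap and port_or are made-up names) parse the same input. The first keeps a panic path that formats the parse error; the second discards the error, so no formatting is needed at that call site. Whether this actually removes formatting code from the final binary depends on what else uses it, so treat it as a sketch to measure, not a guaranteed saving:

```rust
/// Panicking version: on failure, Result::unwrap formats the error with Debug.
fn port_unwrap(s: &str) -> u16 {
    s.parse::<u16>().unwrap()
}

/// Explicit version: the error is discarded, so nothing is formatted here.
fn port_or(s: &str, fallback: u16) -> u16 {
    match s.parse::<u16>() {
        Ok(port) => port,
        Err(_) => fallback,
    }
}

fn main() {
    let ok = port_unwrap("8080") == 8080 && port_or("not-a-port", 80) == 80;
    // Report the result through the exit code to avoid extra formatting in the demo.
    std::process::exit(if ok { 0 } else { 1 });
}
```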

ELF Structure Fundamentals

ELF organizes data into sections (used at link time) and segments (used at load time). The operating system’s loader only cares about segments, but analysis tools primarily work with sections because they contain semantic information.

┌─────────────────┐
│   ELF Header    │  ← Magic, class (32/64), endianness, entry point
├─────────────────┤
│ Program Headers │  ← Describes segments for the loader
├─────────────────┤
│     .text       │  ← Executable code
│     .rodata     │  ← Read-only data (string literals, constants)
│     .data       │  ← Initialized mutable globals
│     .bss        │  ← Zero-initialized data (doesn't occupy file space)
│     .symtab     │  ← Symbol table (can be stripped)
│     .strtab     │  ← String table for symbol names
│     .debug_*    │  ← DWARF debug info sections
│     .eh_frame   │  ← Exception/unwind tables
├─────────────────┤
│ Section Headers │  ← Describes sections for linkers/tools
└─────────────────┘

The .bss section is a common source of confusion when analyzing sizes. It stores zero-initialized static variables but occupies zero bytes in the file—only the size metadata is recorded. At load time, the OS allocates and zeros this memory. Tools report .bss size but it doesn’t contribute to file size.

Standard Unix Tools

These tools ship with binutils on Linux and are available on most Unix systems. They provide quick insights without additional installation.

size: Quick Overview

The size utility provides the fastest overview of a binary’s composition.

size ./target/release/myapp
   text    data     bss     dec     hex filename
 847231   15472    1096  863799   d2e37 myapp

The text column aggregates code and read-only allocated sections (.text, .rodata, .eh_frame). The data column covers initialized writable sections. The dec column is the sum of text, data, and bss, and hex is the same value in hexadecimal. That sum approximates the runtime memory footprint, not the file size.
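The arithmetic behind those columns can be sketched as follows, using the numbers from the sample output. The Berkeley struct and its methods are hypothetical names for illustration, and file_backed only counts the loaded sections (it ignores headers, symbol tables, and debug info):

```rust
/// Columns of `size` in its default (Berkeley) format, in bytes.
struct Berkeley {
    text: u64,
    data: u64,
    bss: u64,
}

impl Berkeley {
    /// The `dec` column: everything mapped at runtime.
    fn dec(&self) -> u64 {
        self.text + self.data + self.bss
    }

    /// Loaded bytes that actually exist in the file (.bss has none).
    fn file_backed(&self) -> u64 {
        self.text + self.data
    }
}

fn main() {
    let s = Berkeley { text: 847_231, data: 15_472, bss: 1_096 };
    println!("dec = {} (hex {:x})", s.dec(), s.dec());
    println!("file-backed = {}", s.file_backed());
}
```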

For a per-section breakdown, use the -A flag, which lists individual sections (note that .bss still appears here even though it takes no file space):

size -A ./target/release/myapp
section              size      addr
.interp                28   4194872
.note.gnu.build-id     36   4194900
.gnu.hash            1256   4194944
.dynsym              3888   4196200
.dynstr              2174   4200088
.text              641892   4210688
.rodata            143628   4852584
.eh_frame_hdr       15284   4996212
.eh_frame           78952   5011496
.data.rel.ro         9872   5296288
.got                  592   5306376
.data                 568   5307136
.bss                 1096   5307712
Total             1052182

The .eh_frame section often surprises developers. It contains DWARF Call Frame Information used for stack unwinding during panics and exceptions. In Rust release builds with panic = "abort", this section can shrink since unwinding is disabled for your code, though how much depends on the target and on the precompiled standard library, which keeps its own unwind tables.

readelf: ELF Metadata

The readelf command exposes the full ELF structure without interpretation. It’s essential for understanding what the linker produced.

Examining Headers

readelf -h ./myapp
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  Type:                              DYN (Position-Independent Executable)
  Machine:                           Advanced Micro Devices X86-64
  Entry point address:               0x4052e0

The Type field distinguishes between EXEC (fixed-address executable), DYN (shared library or position-independent executable), and REL (relocatable object file). Most modern toolchains default to DYN (position-independent executables) so that ASLR can randomize the load address.

Section Details

readelf -S ./myapp

This reveals section flags which explain their runtime behavior:

Flag  Meaning
A     Allocated in memory at runtime
X     Executable
W     Writable
S     Contains null-terminated strings
M     Mergeable (linker can deduplicate)

Sections without the A flag don’t contribute to runtime memory. Debug sections (.debug_info, .debug_abbrev, .debug_line, etc.) lack this flag—they’re only read by debuggers.
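To make the flag letters concrete, here is a small sketch that decodes a raw sh_flags value into the letters readelf prints. The bit values come from the ELF specification; flag_letters and is_loaded are hypothetical helper names:

```rust
// Section flag bits from the ELF specification (sh_flags).
const SHF_WRITE: u64 = 0x1;
const SHF_ALLOC: u64 = 0x2;
const SHF_EXECINSTR: u64 = 0x4;
const SHF_MERGE: u64 = 0x10;
const SHF_STRINGS: u64 = 0x20;

/// Render flag bits as letters, lowest bit first (e.g. "AX" for .text).
fn flag_letters(flags: u64) -> String {
    let table = [
        (SHF_WRITE, 'W'),
        (SHF_ALLOC, 'A'),
        (SHF_EXECINSTR, 'X'),
        (SHF_MERGE, 'M'),
        (SHF_STRINGS, 'S'),
    ];
    table
        .iter()
        .filter(|&&(bit, _)| (flags & bit) != 0)
        .map(|&(_, letter)| letter)
        .collect()
}

/// A section occupies runtime memory only if SHF_ALLOC is set.
fn is_loaded(flags: u64) -> bool {
    (flags & SHF_ALLOC) != 0
}

fn main() {
    println!(".text  -> {}", flag_letters(SHF_ALLOC | SHF_EXECINSTR));
    println!(".data  -> {}", flag_letters(SHF_WRITE | SHF_ALLOC));
    println!("debug sections loaded? {}", is_loaded(0));
}
```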

Dynamic Dependencies

readelf -d ./myapp | grep NEEDED
 0x0000000000000001 (NEEDED)  Shared library: [libssl.so.3]
 0x0000000000000001 (NEEDED)  Shared library: [libcrypto.so.3]
 0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]

This shows what the binary expects at runtime. If libraries you expected to be dynamically linked are missing here, they were probably linked statically, pulling in more code than anticipated.

nm: Symbol Analysis

The nm command lists symbols from the symbol table, revealing function and variable names with their sizes. Combined with sorting, it identifies the largest contributors.

nm -C --size-sort --radix=d ./myapp | tail -30

The -C flag demangles C++ and Rust symbols. The --radix=d flag displays sizes in decimal for easier reading. Sorting by size and taking the tail shows the largest symbols.

00000000000003456 T regex::exec::ExecNoSync<'c>::find_literals
00000000000004102 T core::fmt::Formatter::pad_integral
00000000000008934 T std::panicking::rust_panic_with_hook
00000000000012847 T alloc::raw_vec::RawVec<T,A>::grow_amortized

Symbol types in the second column indicate location:

Type  Meaning
T/t   Text (code) section, uppercase = global
D/d   Initialized data section
B/b   BSS (uninitialized data)
R/r   Read-only data
U     Undefined (external reference)
W/w   Weak symbol

Finding Monomorphization Bloat

Monomorphized generic functions share a common prefix. Counting instantiations reveals problematic generics:

nm -C ./myapp | grep -oP '^\S+ \S+ \K[^<]+' | sort | uniq -c | sort -rn | head -20

When you see alloc::raw_vec::RawVec appearing 47 times, that’s 47 different type instantiations of the vector growth logic.
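The same grouping can be sketched in Rust for cases where the symbol list is already in memory. This is an illustrative equivalent of the shell pipeline, not part of any tool; count_instantiations is a hypothetical name and the symbol strings are made-up samples:

```rust
use std::collections::HashMap;

/// Group demangled symbol names by the text before the first '<'
/// and count how many symbols share each base name.
fn count_instantiations<'a>(symbols: &[&'a str]) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for &sym in symbols {
        let base = sym.split('<').next().unwrap_or("");
        // Names that start with '<' have no usable prefix; skip them.
        if !base.is_empty() {
            *counts.entry(base).or_insert(0) += 1;
        }
    }
    let mut sorted: Vec<_> = counts.into_iter().collect();
    // Most frequent first; ties broken by name for stable output.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    sorted
}

fn main() {
    // Hypothetical sample input, shaped like demangled nm output.
    let symbols = [
        "alloc::raw_vec::RawVec<u8>::grow_amortized",
        "alloc::raw_vec::RawVec<u32>::grow_amortized",
        "core::fmt::Formatter::pad_integral",
    ];
    for (base, n) in count_instantiations(&symbols) {
        println!("{n:>4} {base}");
    }
}
```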

Warning

Stripping a binary (strip ./myapp) removes the .symtab section entirely, and nm then reports no symbols. Use readelf -S to check that .symtab exists before relying on nm. For dynamically linked binaries, nm -D can still list the dynamic symbols in .dynsym.

Dedicated Size Analysis Tools

These tools are purpose-built for binary size analysis and provide deeper insights than standard Unix utilities.

Bloaty McBloatface

Bloaty is Google’s tool for hierarchical binary size analysis. Unlike nm, it attributes all bytes in the file including headers, padding, and debug info. It supports ELF, Mach-O, and WebAssembly.

# Ubuntu/Debian
apt install bloaty
 
# macOS
brew install bloaty
 
# From source (for latest features)
git clone https://github.com/google/bloaty.git
cd bloaty && cmake -B build && cmake --build build

Data Sources

Bloaty aggregates data from multiple sources specified with -d:

Source        Description
sections      ELF sections
segments      ELF program headers
symbols       Symbol table entries
compileunits  DWARF compilation units (source files)
inlines       Inlined function attribution
shortsymbols  Symbols without template parameters

Basic Usage

bloaty ./myapp -d sections
    FILE SIZE        VM SIZE
 --------------  --------------
  43.2%   892Ki   0.0%       0    .debug_info
  18.7%   386Ki  47.1%   386Ki    .text
   8.4%   173Ki   0.0%       0    .debug_str
   7.2%   149Ki  18.2%   149Ki    .rodata
   6.1%   126Ki   0.0%       0    .debug_line
   4.8%  98.4Ki   0.0%       0    .debug_abbrev
   3.8%  78.4Ki   9.6%  78.4Ki    .eh_frame

The dual columns distinguish file size (what’s on disk) from VM size (runtime memory). Debug sections show 0% VM size because they’re not loaded.

Hierarchical Drill-Down

Combining data sources reveals attribution at multiple levels:

bloaty ./myapp -d compileunits,symbols -n 15

This shows which source files contribute most, then breaks down to individual symbols within each. The -n 15 limits to top 15 entries per level.

Comparing Builds

Bloaty’s diff mode is invaluable for tracking size regressions:

bloaty ./myapp_new -- ./myapp_old
     VM SIZE                     FILE SIZE
 ++++++++++++++ GROWING       ++++++++++++++
  [ = ]       0 .debug_info    +12.3Ki  +1.4%
  +2.1Ki  +0.5% .text          +2.1Ki  +0.5%
  +892        .rodata          +892     +0.6%

 -------------- SHRINKING     --------------
  -1.2Ki -0.2% .eh_frame       -1.2Ki  -0.2%

Tip

For tracking size over time in CI, use bloaty --csv to output machine-readable data that can be graphed or stored.

Understanding Attribution Gaps

Bloaty shows [section .text] or [unmapped] when it cannot attribute bytes to specific symbols. This happens when debug information is incomplete or stripped. High unmapped percentages indicate you need a debug build for accurate analysis:

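# Build with debug info but optimized
RUSTFLAGS="-C debuginfo=2" cargo build --release
# The binary already carries its own debug info, so --debug-file is not needed;
# that flag is for pairing a stripped binary with a separate debug file.
bloaty ./target/release/myapp -d symbols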

Cargo Bloat (Rust)

The cargo-bloat tool understands Rust’s compilation model and provides crate-level attribution that generic tools cannot.

cargo install cargo-bloat

Basic Usage

cargo bloat --release -n 20
 File  .text     Size       Crate Name
 1.2%   8.4%  32.5KiB       regex regex::exec::ExecNoSync<'c>::find
 0.8%   5.6%  21.7KiB         std std::panicking::rust_panic_with_hook
 0.6%   4.2%  16.3KiB       serde serde_json::de::Deserializer<R>::parse
 0.5%   3.8%  14.7KiB         std core::fmt::Formatter::pad_integral

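The File percentage is relative to total file size. The .text percentage is relative to the code section only. Because cargo-bloat lists only functions, a row’s File percentage never exceeds its .text percentage. When File is much smaller than .text, most of the file is non-code content such as debug info or .rodata.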
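As a rough worked example, dividing a row’s File percentage by its .text percentage estimates what share of the file is code. The helper name implied_text_share is hypothetical, and the inputs are taken from the first row of the sample output:

```rust
/// Estimate the fraction of the file occupied by .text from one
/// cargo-bloat row. Returns None if the .text percentage is not positive.
fn implied_text_share(file_pct: f64, text_pct: f64) -> Option<f64> {
    if text_pct <= 0.0 {
        return None;
    }
    Some(file_pct / text_pct)
}

fn main() {
    // First sample row: 1.2% of the file, 8.4% of .text.
    if let Some(share) = implied_text_share(1.2, 8.4) {
        println!("about {:.1}% of the file is .text", share * 100.0);
    }
}
```

With these inputs the estimate is roughly 14%, which would mean most of that file is something other than code.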

Crate-Level Analysis

cargo bloat --release --crates
 File  .text     Size Crate
14.2%  62.1% 240.3KiB std
 5.8%  25.4%  98.2KiB regex
 1.4%   6.1%  23.6KiB serde_json
 0.8%   3.5%  13.5KiB tokio
 0.5%   2.2%   8.5KiB [Unknown]

The [Unknown] category contains symbols that cargo-bloat couldn’t attribute to a crate. This often includes LLVM-generated intrinsics and linker-synthesized code.

Filtering and Splitting

# Only show functions from a specific crate
cargo bloat --release --filter regex -n 30
 
# Separate std from your code
cargo bloat --release --split-std
 
# Show generic functions grouped by base name
cargo bloat --release --message-format=json | jq '.functions | group_by(.name | split("<")[0])'

Time-Based Analysis

For understanding compile-time impact alongside size:

cargo bloat --release --time -j1

This shows compilation time per crate with -j1 ensuring serial compilation for accurate timing. Crates that are both slow to compile and large are prime optimization targets.

Reducing Binary Size

Cargo Profile Settings

In Cargo.toml:

[profile.release]
lto = true           # Link-time optimization across all crates
codegen-units = 1    # Disable parallel codegen for better optimization
panic = "abort"      # Remove unwinding machinery
strip = true         # Strip symbols from final binary
 
[profile.release.package."*"]
opt-level = "z"      # Optimize dependencies for size

The lto = true setting enables fat LTO, which performs whole-program optimization across all crates. This allows LLVM to eliminate dead code across crate boundaries and inline functions that were previously opaque. Size reductions of 10-30% are a reasonable expectation, at the cost of significantly longer compile times.

Setting codegen-units = 1 makes rustc hand each crate to LLVM as a single unit, enabling optimizations that aren’t possible when a crate is split for parallel codegen. The release default of 16 units prioritizes compile speed over optimization quality.

Measuring Impact

# Baseline (GNU stat shown; on macOS/BSD use stat -f%z instead of stat -c%s)
cargo build --release
BEFORE=$(stat -c%s ./target/release/myapp)
 
# After change
cargo build --release
AFTER=$(stat -c%s ./target/release/myapp)
 
echo "Delta: $((AFTER - BEFORE)) bytes"

For systematic tracking, create a script that outputs to CSV:

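#!/bin/bash
# Stop on the first failure so a broken build never records a bogus size
set -euo pipefail
VERSION=$(git rev-parse --short HEAD)
cargo build --release 2>/dev/null
# GNU stat; on macOS/BSD use: stat -f%z
SIZE=$(stat -c%s ./target/release/myapp)
# First column of the last line of `size` output is the text total
TEXT=$(size ./target/release/myapp | tail -1 | awk '{print $1}')
echo "$VERSION,$SIZE,$TEXT" >> size_history.csv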
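Once the history file exists, spotting regressions is a matter of comparing consecutive rows. The sketch below assumes the three-column layout written by the script above (version, file size, text size); Entry, parse_history, and size_deltas are hypothetical names, and the sample rows are invented:

```rust
#[derive(Debug, PartialEq)]
struct Entry {
    version: String,
    size: u64,
    text: u64,
}

/// Parse "version,size,text" lines, silently skipping malformed ones.
fn parse_history(csv: &str) -> Vec<Entry> {
    csv.lines()
        .filter_map(|line| {
            let mut parts = line.trim().split(',');
            let version = parts.next()?.to_string();
            let size = parts.next()?.parse().ok()?;
            let text = parts.next()?.parse().ok()?;
            Some(Entry { version, size, text })
        })
        .collect()
}

/// File-size change of each entry relative to the previous one.
fn size_deltas(entries: &[Entry]) -> Vec<(String, i64)> {
    entries
        .windows(2)
        .map(|w| (w[1].version.clone(), w[1].size as i64 - w[0].size as i64))
        .collect()
}

fn main() {
    // Invented sample data in the same shape as size_history.csv.
    let history = parse_history("a1b2c3d,1052182,847231\ne4f5a6b,1054000,848000\n");
    for (version, delta) in size_deltas(&history) {
        println!("{version}: {delta:+} bytes");
    }
}
```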

Finding Unused Dependencies

cargo install cargo-udeps
cargo +nightly udeps --release

Unused dependencies may still contribute code if they have build.rs scripts or if LTO couldn’t eliminate their initialization code.

Examining Specific Symbols

When cargo-bloat shows a suspiciously large function, examine its assembly:

objdump -d -C ./target/release/myapp | grep -A 100 'regex::exec::ExecNoSync'

For a cleaner view with Rust demangling:

cargo install cargo-asm
cargo asm --release myapp::some_function