Goblin is a binary format parser. It reads the structure and metadata of executable files — headers, sections, symbols, dependencies, relocations.
It is NOT a disassembler. It doesn’t decode machine instructions
(0x55 0x48 0x89 0xe5 → push rbp; mov rbp, rsp). For that, see Disassemblying.
Think of it this way:
| Layer | Tool | What it reads |
|---|---|---|
| Format/Metadata | Goblin | Headers, sections, symbols, imports |
| Instructions | capstone, iced-x86 | Raw bytes → assembly mnemonics |
| Semantics | Binary Ninja, Ghidra | Control flow, decompilation |
Goblin is Layer 0 — you need to understand the container before you can find the code inside it. It answers: where is the code, what symbols exist, how is the binary structured. A disassembler answers: what do the bytes do.
To build a disassembler in Rust, you’d combine:
// Goblin finds the .text section (where code lives)
// capstone-rs decodes the bytes inside it
goblin + capstone-rs // = DIY disassembler
goblin + iced-x86 // = alternative, pure Rust
ELF Crash Course: The Mental Model
ELF (Executable and Linkable Format) is how Linux packages a compiled program. Think of it as a zip file with a very rigid table of contents.
┌──────────────────────────────────────────┐
│ ELF Header │ ← "What am I?" (arch, type, entry point)
├──────────────────────────────────────────┤
│ Program Headers (Segments) │ ← "How to LOAD me into memory"
│ (used at runtime by the kernel) │ (the kernel reads these)
├──────────────────────────────────────────┤
│ │
│ .text (executable code) │
│ .rodata (read-only constants) │
│ .data (initialized globals) │
│ .bss (uninitialized globals) │
│ .symtab (symbol table) │
│ .strtab (string table) │
│ .dynsym (dynamic symbols) │
│ .dynstr (dynamic strings) │
│ .plt (procedure linkage) │
│ .got (global offset table) │
│ .dynamic (dynamic linking info) │
│ ... │
│ │
├──────────────────────────────────────────┤
│ Section Headers │ ← "How to ANALYZE me"
│ (used by tools like readelf/nm) │ (debuggers/linkers read these)
└──────────────────────────────────────────┘
Key distinction:
- Segments (Program Headers) = runtime view. The kernel loads these into memory.
- Sections (Section Headers) = development view. Tools care about these.
A segment can contain multiple sections. The .text section lives inside a
PT_LOAD segment marked executable.
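
To make the "table of contents" idea concrete, here is a minimal hand-rolled sketch (standard library only, no Goblin) that reads a few fields of the file header. It assumes a 64-bit little-endian file; the struct and function names are made up for illustration, and a real parser such as Goblin handles 32-bit, big-endian, and malformed inputs that this ignores.

```rust
/// A few fields of the ELF file header, for 64-bit little-endian files only.
#[derive(Debug, PartialEq)]
struct Elf64Header {
    e_type: u16,    // ET_REL, ET_EXEC, ET_DYN, ET_CORE, ...
    e_machine: u16, // target architecture (62 = x86-64)
    e_entry: u64,   // virtual address where execution starts
    e_phoff: u64,   // file offset of the program header table (segments)
    e_shoff: u64,   // file offset of the section header table (sections)
    e_phnum: u16,   // number of program headers
    e_shnum: u16,   // number of section headers
}

fn u16_at(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

fn u64_at(buf: &[u8], off: usize) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(&buf[off..off + 8]);
    u64::from_le_bytes(raw)
}

/// Hypothetical helper: returns None unless `buf` starts with an ELF64 LE header.
fn parse_elf64_header(buf: &[u8]) -> Option<Elf64Header> {
    // The ELF64 header is 64 bytes and starts with the magic 0x7f 'E' 'L' 'F'.
    if buf.len() < 64 || !buf.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[4] == 2 means 64-bit, e_ident[5] == 1 means little-endian.
    if buf[4] != 2 || buf[5] != 1 {
        return None;
    }
    Some(Elf64Header {
        e_type: u16_at(buf, 16),
        e_machine: u16_at(buf, 18),
        e_entry: u64_at(buf, 24),
        e_phoff: u64_at(buf, 32),
        e_shoff: u64_at(buf, 40),
        e_phnum: u16_at(buf, 56),
        e_shnum: u16_at(buf, 60),
    })
}
```

The two offsets, e_phoff and e_shoff, are the two views described above: one points at the segment table the kernel reads, the other at the section table that tools read.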
The Jargon Decoder
ELF Header Constants
| Constant | Full Name | Meaning |
|---|---|---|
| ET_EXEC | Executable Type | Traditional fixed-address executable |
| ET_DYN | Dynamic/Shared | Position-independent (PIE executable OR .so library) |
| ET_REL | Relocatable | Object file (.o), not yet linked |
| ET_CORE | Core dump | Process memory dump from a crash |
Why ET_DYN matters: Modern executables are compiled as PIE (Position-Independent Executables) so the kernel can load them at a random address (ASLR). PIE binaries show up as ET_DYN, not ET_EXEC. If you see ET_EXEC, the executable’s own code and data load at a fixed address, so ASLR cannot randomize them (the stack, heap, and shared libraries may still move). That’s a security finding.
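
Because ET_DYN covers both PIE executables and shared libraries, e_type alone cannot tell them apart. The sketch below uses the presence of a PT_INTERP segment as a tie-breaker. That is a heuristic and an assumption of this example, not a rule: static PIE binaries have no PT_INTERP, and a few shared libraries carry one. The enum and function names are hypothetical.

```rust
// e_type values from the ELF specification.
const ET_REL: u16 = 1;
const ET_EXEC: u16 = 2;
const ET_DYN: u16 = 3;
const ET_CORE: u16 = 4;

#[derive(Debug, PartialEq)]
enum BinaryKind {
    ObjectFile,
    FixedAddressExecutable,
    LikelyPie,
    LikelySharedLibrary,
    CoreDump,
    Other(u16),
}

/// Heuristic classification: `has_pt_interp` says whether any program
/// header has type PT_INTERP (a dynamic linker path is requested).
fn classify(e_type: u16, has_pt_interp: bool) -> BinaryKind {
    match e_type {
        ET_REL => BinaryKind::ObjectFile,
        ET_EXEC => BinaryKind::FixedAddressExecutable,
        ET_DYN if has_pt_interp => BinaryKind::LikelyPie,
        ET_DYN => BinaryKind::LikelySharedLibrary,
        ET_CORE => BinaryKind::CoreDump,
        other => BinaryKind::Other(other),
    }
}
```

With Goblin, the two inputs would come from elf.header.e_type and a scan of elf.program_headers.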
Program Header Types (Segments)
| Constant | What it is |
|---|---|
| PT_LOAD | “Load this chunk into memory”: the actual code & data |
| PT_DYNAMIC | Dynamic linking info (what .so files are needed) |
| PT_INTERP | Path to the dynamic linker (/lib64/ld-linux-x86-64.so.2) |
| PT_GNU_STACK | Declares whether the stack should be executable |
| PT_GNU_RELRO | Marks memory to be made read-only after relocation |
| PT_NOTE | Metadata (build ID, ABI tag, etc.) |
Why PT_GNU_STACK matters: If this segment has the execute flag (PF_X), the process gets an executable stack — meaning buffer overflow exploits can inject shellcode directly onto the stack and run it. Modern binaries should NEVER have this set. If you find it, something is very wrong (probably hand-written assembly or ancient build flags).
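
The check itself is a single bit test. A minimal sketch, with program headers reduced to (p_type, p_flags) pairs for illustration; the enum and function names are hypothetical. A missing PT_GNU_STACK header is reported separately because the outcome then depends on a platform default rather than on anything the binary declares.

```rust
const PT_GNU_STACK: u32 = 0x6474_e551;
const PF_X: u32 = 0x1; // execute permission bit in p_flags

#[derive(Debug, PartialEq)]
enum StackPolicy {
    NonExecutable,
    Executable,
    Unspecified, // no PT_GNU_STACK header present
}

/// `program_headers` holds (p_type, p_flags) pairs taken from the binary.
fn stack_policy(program_headers: &[(u32, u32)]) -> StackPolicy {
    match program_headers.iter().find(|ph| ph.0 == PT_GNU_STACK) {
        Some(ph) if ph.1 & PF_X != 0 => StackPolicy::Executable,
        Some(_) => StackPolicy::NonExecutable,
        None => StackPolicy::Unspecified,
    }
}
```

A strict auditor would probably treat both Executable and Unspecified as failures.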
Why PT_GNU_RELRO matters: After the dynamic linker resolves all symbols, the GOT (Global Offset Table) can be made read-only. This prevents attackers from overwriting function pointers in the GOT. “Full RELRO” = PT_GNU_RELRO + BIND_NOW (resolve everything upfront, then lock it down).
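
The rule combines two independent facts, which a small sketch makes explicit. The three-level naming below follows the convention used by tools like checksec ("partial" for a RELRO segment without BIND_NOW); the enum and function names are assumptions of this example.

```rust
#[derive(Debug, PartialEq)]
enum Relro {
    Disabled, // no PT_GNU_RELRO segment at all
    Partial,  // RELRO segment present, but binding is still lazy
    Full,     // RELRO segment present and BIND_NOW requested
}

/// Combine "has a PT_GNU_RELRO segment" with "requests BIND_NOW".
fn relro_level(has_gnu_relro_segment: bool, bind_now: bool) -> Relro {
    match (has_gnu_relro_segment, bind_now) {
        (false, _) => Relro::Disabled,
        (true, false) => Relro::Partial,
        (true, true) => Relro::Full,
    }
}
```

Note that BIND_NOW without a RELRO segment still counts as Disabled here: eager binding alone does not make the GOT read-only.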
Dynamic Section Tags
| Constant | What it controls |
|---|---|
| DT_NEEDED | “I need this shared library” (e.g., libc.so.6) |
| DT_SONAME | “My canonical name is…” (for library versioning) |
| DT_RPATH | Hardcoded library search path (generally bad practice) |
| DT_RUNPATH | Library search path (slightly better than RPATH) |
| DT_FLAGS | Bitfield of flags, including… |
| DF_BIND_NOW | “Resolve ALL symbols at load time, not lazily” |
| DT_FLAGS_1 | Extended flags (PIE indicator, etc.) |
Why DT_FLAGS / DF_BIND_NOW matters: Lazy binding means function addresses are resolved the first time they’re called, so the GOT entries involved have to stay writable while the program runs. That is an attack window. BIND_NOW closes it: resolve everything at startup, then (with RELRO) make the GOT read-only. The cost is slightly slower startup. The benefit is a much smaller attack surface.
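
One practical wrinkle: the request for eager binding can be recorded in more than one place in the dynamic section. Besides the DF_BIND_NOW bit in DT_FLAGS, there is a standalone DT_BIND_NOW tag and a DF_1_NOW bit in DT_FLAGS_1. The sketch below checks all three, with dynamic entries reduced to (d_tag, d_val) pairs; the function name is hypothetical, and which encodings a given linker emits is something to verify against your own toolchain.

```rust
// Tag and flag values from the ELF gABI and its GNU extensions.
const DT_BIND_NOW: u64 = 24;
const DT_FLAGS: u64 = 30;
const DT_FLAGS_1: u64 = 0x6fff_fffb;
const DF_BIND_NOW: u64 = 0x8; // bit inside DT_FLAGS
const DF_1_NOW: u64 = 0x1;    // bit inside DT_FLAGS_1

/// `dynamic` holds (d_tag, d_val) pairs from the .dynamic section.
fn requests_bind_now(dynamic: &[(u64, u64)]) -> bool {
    dynamic.iter().any(|&(tag, val)| match tag {
        DT_BIND_NOW => true,
        DT_FLAGS => val & DF_BIND_NOW != 0,
        DT_FLAGS_1 => val & DF_1_NOW != 0,
        _ => false,
    })
}
```

Checking only one encoding risks reporting a fully hardened binary as lazily bound.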
Symbol Binding & Visibility
| Constant | Meaning |
|---|---|
| STB_GLOBAL | Visible to all object files during linking |
| STB_LOCAL | Only visible within this object file |
| STB_WEAK | Like global but can be overridden without error |
| STV_HIDDEN | Not exported from shared library (internal use only) |
| STV_DEFAULT | Normal visibility, exported |
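
These constants are not stored as separate fields. In a symbol table entry, binding and type share the st_info byte, and visibility sits in the low two bits of st_other. A minimal sketch of the decoding, with hypothetical helper names (Goblin exposes equivalent accessors such as st_bind() on its symbol type):

```rust
/// Upper four bits of st_info: the binding (STB_*).
fn st_bind(st_info: u8) -> u8 {
    st_info >> 4
}

/// Lower four bits of st_info: the symbol type (STT_*), e.g. 2 = function.
fn st_type(st_info: u8) -> u8 {
    st_info & 0x0f
}

/// Low two bits of st_other: the visibility (STV_*).
fn st_visibility(st_other: u8) -> u8 {
    st_other & 0x03
}

fn bind_name(st_info: u8) -> &'static str {
    match st_bind(st_info) {
        0 => "STB_LOCAL",
        1 => "STB_GLOBAL",
        2 => "STB_WEAK",
        _ => "other",
    }
}

fn visibility_name(st_other: u8) -> &'static str {
    match st_visibility(st_other) {
        0 => "STV_DEFAULT",
        1 => "STV_INTERNAL",
        2 => "STV_HIDDEN",
        _ => "STV_PROTECTED",
    }
}
```

For example, st_info = 0x12 decodes to binding 1 (STB_GLOBAL) and type 2 (a function).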
Why a Library Instead of Coreutils
| Approach | Good for |
|---|---|
| readelf -d mybin | Quick look by a human in a terminal |
| Goblin in a Rust program | Automated systems that make decisions |
The moment you need to:
- Analyze at scale (100s-1000s of binaries)
- Make programmatic decisions (pass/fail CI, allow/deny plugin load)
- Produce structured output (JSON reports, database entries, dashboards)
- Compose with other logic (network calls, DB lookups, policy engines)
- Do it cross-platform (parse a Windows PE on Linux, inspect iOS Mach-O on CI)
…shelling out falls apart. Parsing unstable text. Handling locale differences. Spawning processes. Goblin gives typed, structured, zero-copy access.
It’s the difference between grep-ing HTML and using a DOM parser.
Use Cases: Practical and Non-Trivial
1. Container Image Security Auditor
Scan every binary in a Docker image. Check hardening flags. Fail the CI pipeline if anything isn’t locked down.
use goblin::elf::Elf;
use goblin::elf::program_header::{PT_GNU_STACK, PT_GNU_RELRO};
use goblin::elf::dynamic::{DF_BIND_NOW, DT_FLAGS};
struct SecurityReport {
path: String,
is_pie: bool,
has_nx_stack: bool,
has_full_relro: bool,
}
fn audit_elf(path: &str, buffer: &[u8]) -> Option<SecurityReport> {
let elf = Elf::parse(buffer).ok()?;
let is_pie = elf.header.e_type == goblin::elf::header::ET_DYN;
let has_nx_stack = elf.program_headers.iter()
.find(|ph| ph.p_type == PT_GNU_STACK)
.map(|ph| ph.p_flags & 0x1 == 0) // PF_X not set
.unwrap_or(false);
let has_relro = elf.program_headers.iter()
.any(|ph| ph.p_type == PT_GNU_RELRO);
let has_bind_now = elf.dynamic.as_ref()
.map(|d| d.dyns.iter().any(|dyn_entry|
dyn_entry.d_tag == DT_FLAGS
&& (dyn_entry.d_val & DF_BIND_NOW as u64) != 0
))
.unwrap_or(false);
Some(SecurityReport {
path: path.to_string(),
is_pie,
has_nx_stack,
has_full_relro: has_relro && has_bind_now,
})
}
Real-world: This is essentially what checksec does, but now it’s part of your Rust pipeline, producing structured JSON, integrated with your alerting.
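
The "fail the CI pipeline" step is then plain Rust over the reports. A minimal sketch, assuming a SecurityReport with the same fields as above; the findings and gate functions and the message wording are made up for illustration.

```rust
struct SecurityReport {
    path: String,
    is_pie: bool,
    has_nx_stack: bool,
    has_full_relro: bool,
}

/// List the hardening checks a single binary fails.
fn findings(r: &SecurityReport) -> Vec<&'static str> {
    let mut out = Vec::new();
    if !r.is_pie {
        out.push("not PIE");
    }
    if !r.has_nx_stack {
        out.push("executable stack");
    }
    if !r.has_full_relro {
        out.push("no full RELRO");
    }
    out
}

/// Ok if every binary passes, otherwise one message per failing binary.
fn gate(reports: &[SecurityReport]) -> Result<(), Vec<String>> {
    let failures: Vec<String> = reports
        .iter()
        .filter_map(|r| {
            let f = findings(r);
            if f.is_empty() {
                None
            } else {
                Some(format!("{}: {}", r.path, f.join(", ")))
            }
        })
        .collect();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}
```

A CI entry point would print the Err messages and exit with a non-zero status.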
2. Binary Diffing Across Releases (ABI Compatibility)
What changed at the symbol level between two versions of a shared library? Catch accidental ABI breaks before they hit production.
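use std::collections::HashMap;
// `extract_symbols` and `SymbolInfo` are helpers not shown here; each `SymbolInfo` carries at least a `size`.
fn diff_binaries(old: &[u8], new: &[u8]) {
let old_syms = extract_symbols(old); // HashMap<String, SymbolInfo>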
let new_syms = extract_symbols(new);
for (name, _) in &old_syms {
if !new_syms.contains_key(name) {
println!("🔴 REMOVED: {name}"); // ABI break!
}
}
for (name, _) in &new_syms {
if !old_syms.contains_key(name) {
println!("🟢 ADDED: {name}");
}
}
for (name, old_info) in &old_syms {
if let Some(new_info) = new_syms.get(name) {
if old_info.size != new_info.size {
let delta = new_info.size as i64 - old_info.size as i64;
println!("🟡 RESIZED: {name} ({delta:+} bytes)");
}
}
}
}
3. Plugin Loader with Pre-Execution Validation
Before dlopen-ing an untrusted .so, inspect it statically:
use std::collections::HashSet;
use goblin::elf::{sym::STB_GLOBAL, Elf};
const REQUIRED_EXPORTS: &[&str] = &[
"plugin_init", "plugin_handle", "plugin_version"
];
const BANNED_IMPORTS: &[&str] = &[
"system", "exec", "popen", "dlopen" // no shell, no loading more libs
];
fn validate_plugin(buffer: &[u8]) -> PluginValidation {
let elf = Elf::parse(buffer).unwrap();
let exported: HashSet<&str> = elf.dynsyms.iter()
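// Undefined (imported) functions are also global, so exclude them from the export set
.filter(|s| s.is_function() && s.st_bind() == STB_GLOBAL && !s.is_import())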
.filter_map(|s| elf.dynstrtab.get_at(s.st_name))
.collect();
let suspicious: Vec<&str> = elf.dynsyms.iter()
.filter(|s| s.is_import())
.filter_map(|s| elf.dynstrtab.get_at(s.st_name))
.filter(|name| BANNED_IMPORTS.contains(name))
.collect();
// Does it have .init_array? (runs code automatically on dlopen!)
// See: C Runtime — .init_array constructors
let has_constructor = elf.section_headers.iter()
.any(|sh| matches!(
elf.shdr_strtab.get_at(sh.sh_name),
Some(".init_array" | ".ctors")
));
PluginValidation {
exports_required_api: REQUIRED_EXPORTS.iter()
.all(|r| exported.contains(r)),
suspicious_imports: suspicious,
has_constructor,
}
}
Why this matters: .init_array functions run the instant you dlopen.
If a malicious plugin has one, it executes before you can do anything.
Detecting this BEFORE loading is a real security boundary.
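
How the loader acts on the result is a policy decision. The sketch below shows one strict option; it assumes a PluginValidation struct with the three fields used above (its definition is not given in this note, so the one here is a guess), and the function name and error strings are hypothetical. Rejecting every constructor is deliberately conservative: plenty of legitimate libraries have .init_array entries, so a real policy might only flag them for review.

```rust
// Assumed shape of the validation result; the lifetime ties the
// import names to the buffer they were read from.
struct PluginValidation<'a> {
    exports_required_api: bool,
    suspicious_imports: Vec<&'a str>,
    has_constructor: bool,
}

/// Strict policy: Ok(()) only if nothing at all looks off.
fn load_decision(v: &PluginValidation<'_>) -> Result<(), String> {
    if !v.exports_required_api {
        return Err("missing required exports".to_string());
    }
    if !v.suspicious_imports.is_empty() {
        return Err(format!("banned imports: {}", v.suspicious_imports.join(", ")));
    }
    if v.has_constructor {
        return Err("runs code at load time (.init_array/.ctors)".to_string());
    }
    Ok(())
}
```

Only when load_decision returns Ok would the host go on to call dlopen.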
4. Shared Library Dependency Graph Auditor
Map every binary in a deployment to the shared libraries it declares via DT_NEEDED; walking those edges gives its transitive .so dependencies.
Answer: “What still links OpenSSL?” or “Why is libpython in our container?“
use std::{collections::HashMap, fs, path::Path};
use goblin::Object;
use walkdir::WalkDir;
fn build_dep_graph(root: &Path) -> HashMap<String, Vec<String>> {
let mut graph = HashMap::new();
for entry in WalkDir::new(root).into_iter().filter_map(Result::ok) {
let buffer = fs::read(entry.path()).unwrap_or_default();
if let Ok(Object::Elf(elf)) = Object::parse(&buffer) {
let libs: Vec<String> = elf.libraries.iter()
.map(|s| s.to_string())
.collect();
if !libs.is_empty() {
graph.insert(entry.path().display().to_string(), libs);
}
}
}
graph
}
// Output this as JSON → feed it into a graph visualizer
// Detect conflicts: two binaries needing different versions of the same .so
5. Build a Minimal Disassembler (Goblin + Capstone)
NOW we’re actually disassembling. Goblin finds the code. Capstone decodes it.
use goblin::elf::Elf;
use goblin::elf::section_header::SHT_PROGBITS;
use capstone::prelude::*;
fn disassemble_text_section(buffer: &[u8]) {
let elf = Elf::parse(buffer).unwrap();
// Find .text section
let text_section = elf.section_headers.iter()
.find(|sh| {
elf.shdr_strtab.get_at(sh.sh_name) == Some(".text")
&& sh.sh_type == SHT_PROGBITS
})
.expect("no .text section found");
let offset = text_section.sh_offset as usize;
let size = text_section.sh_size as usize;
let code = &buffer[offset..offset + size];
let base_addr = text_section.sh_addr;
// Disassemble with capstone
let cs = Capstone::new()
.x86()
.mode(arch::x86::ArchMode::Mode64)
.syntax(arch::x86::ArchSyntax::Intel)
.build()
.unwrap();
let instructions = cs.disasm_all(code, base_addr).unwrap();
for insn in instructions.iter() {
println!("{:#010x}: {} {}",
insn.address(),
insn.mnemonic().unwrap_or("???"),
insn.op_str().unwrap_or(""),
);
}
}
// Output:
// 0x00401000: push rbp
// 0x00401001: mov rbp, rsp
// 0x00401004: sub rsp, 0x10
// ...
6. Symbol-Aware Disassembly (The Cool Part)
Plain disassembly is just addresses. By combining Goblin’s symbol table with Capstone, you can label functions by name:
use std::collections::HashMap; // plus the goblin and capstone imports from the previous example
fn disassemble_function(buffer: &[u8], func_name: &str) {
let elf = Elf::parse(buffer).unwrap();
// Find the symbol by name
let symbol = elf.syms.iter()
.find(|s| elf.strtab.get_at(s.st_name) == Some(func_name))
.expect("symbol not found");
// Build address → name lookup for cross-references
let sym_map: HashMap<u64, &str> = elf.syms.iter()
.filter_map(|s| {
let name = elf.strtab.get_at(s.st_name)?;
Some((s.st_value, name))
})
.collect();
let size = symbol.st_size as usize; // st_size can be 0 for some symbols, which yields an empty listing
// Find which section contains this address to calculate file offset
let section = elf.section_headers.iter()
.find(|sh| {
symbol.st_value >= sh.sh_addr
&& symbol.st_value < sh.sh_addr + sh.sh_size
})
.unwrap();
let file_offset = (symbol.st_value - section.sh_addr + section.sh_offset) as usize;
let code = &buffer[file_offset..file_offset + size];
let cs = Capstone::new()
.x86().mode(arch::x86::ArchMode::Mode64)
.syntax(arch::x86::ArchSyntax::Intel)
.detail(true)
.build().unwrap();
println!("─── {} ({} bytes) ───", func_name, size);
let instructions = cs.disasm_all(code, symbol.st_value).unwrap();
for insn in instructions.iter() {
// If a call/jump target matches a known symbol, annotate it
let annotation = insn.op_str()
.and_then(|ops| {
// crude: parse immediate from "call 0x401234"
let addr = u64::from_str_radix(
ops.trim_start_matches("0x"), 16
).ok()?;
sym_map.get(&addr).map(|name| format!(" ; <{name}>"))
})
.unwrap_or_default();
println!(" {:#010x}: {:6} {}{}",
insn.address(),
insn.mnemonic().unwrap_or(""),
insn.op_str().unwrap_or(""),
annotation,
);
}
}
// Output:
// ─── main (47 bytes) ───
// 0x00401120: push rbp
// 0x00401121: mov rbp, rsp
// 0x00401124: call 0x401200 ; <initialize_server>
// 0x00401129: call 0x401340 ; <run_event_loop>
// ...
7. Cross-Platform Binary Inspector (PE + ELF + Mach-O)
Goblin’s killer feature: one API for all formats. Useful if you ship clients for multiple platforms:
use goblin::Object;
fn inspect(buffer: &[u8]) -> String {
match Object::parse(buffer).unwrap() {
Object::Elf(elf) => format!(
"Linux ELF | {} | {} sections | {} symbols | deps: {:?}",
if elf.is_64 { "64-bit" } else { "32-bit" },
elf.section_headers.len(),
elf.syms.len(),
elf.libraries,
),
Object::PE(pe) => format!(
"Windows PE | {} | {} sections | imports: {:?}",
if pe.is_64 { "64-bit" } else { "32-bit" },
pe.sections.len(),
pe.libraries,
),
Object::Mach(mach) => format!("macOS Mach-O | {:?}", mach),
Object::Archive(archive) => format!(
"Static archive (.a) | {} members", archive.members().len()
),
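Object::Unknown(magic) => format!("Unknown format (magic: {:#x})", magic),
// Some goblin releases may add variants to `Object`; a catch-all keeps this match compiling.
_ => "Unsupported object type".to_string(),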
}
}
Ideas to Explore Further
- WASM binary inspector: Goblin doesn’t parse WASM, but the wasmparser crate follows the same philosophy; the two could be paired for a universal binary analysis toolkit
- Fuzzing harness validator: Before deploying a fuzz target, verify it’s compiled with sanitizers (check for __asan_* symbols)
- License compliance: Scan binaries for statically linked libraries by looking for telltale symbols (e.g., OPENSSL_* symbols = OpenSSL statically linked = license implications)
- Binary size profiler: Map every symbol’s size, sort by largest, find bloat (Binary Size Analysis). “Why is this binary 50MB?” → “Because serde monomorphized 200 versions of deserialize”
- Core dump analyzer: Parse ET_CORE files to extract register state, memory maps, and stack traces programmatically
- Reproducible build verifier: Compare two builds of the same source. Are the symbols and sections identical? If not, where’s the divergence?
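
Several of these ideas reduce to simple operations once symbol names and sizes are in hand. As one example, here is a sketch of the core of the binary size profiler. It assumes the (name, size) pairs have already been extracted (with Goblin, from each symbol's name and st_size); the function name is hypothetical.

```rust
/// Return the `n` largest symbols, biggest first.
/// Ties are broken by name so the output is deterministic.
fn largest_symbols<'a>(symbols: &[(&'a str, u64)], n: usize) -> Vec<(&'a str, u64)> {
    let mut sorted = symbols.to_vec();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    sorted.truncate(n);
    sorted
}
```

The fuzzing harness check is even shorter: it amounts to asking whether any symbol name starts with __asan_.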
Companion Crates
| Crate | Purpose |
|---|---|
| goblin | Parse ELF/PE/Mach-O structure |
| capstone-rs | Disassemble x86/ARM/MIPS/etc. |
| iced-x86 | Pure Rust x86 disassembler (no C deps) |
| object | Alternative to goblin (used by rustc) |
| gimli | Parse DWARF debug info |
| addr2line | Map addresses to source file:line |
| memmap2 | Memory-map files for zero-copy parsing |
| wasmparser | Parse WASM binaries |
References
- ELF Specification (PDF)
- Goblin docs.rs
- Linux Foundation - ELF spec
- man 5 elf (seriously, this man page is excellent)
- Capstone Engine
Related
- Executable Binary Formats: the formats (ELF, Mach-O, PE) Goblin parses
- The Abstraction Stack Below Compilation: the full pipeline from source to executable
- Symbol tables: what Goblin reads (names, types, relocations)
- Loading a program or a library into memory: runtime counterpart of what Goblin inspects statically
- C Runtime: _start, .init_array constructors, the execution environment
- C Application Binary Interface: calling conventions and ABI contracts in binaries
- Foreign Function Interface: symbol export/import, PLT/GOT, dynamic linking
- Disassemblying: the layer above Goblin (decoding instructions)
- Cargo-show-asm: inspecting compiler output at each pipeline stage
- Low Level Rust: #[no_mangle], extern "C", #[link_section] that control binary layout
- Shellcodes: the offensive side (ASLR, DEP, what Goblin’s security auditor defends against)
- Memory page permissions: PROT_* flags that correspond to segment permissions
- Binary Size Analysis: tools and techniques for the binary size profiler idea
- LLVM Sanitizers: __asan_* symbols for the fuzzing harness validator idea
- LLVM: the compiler backend that produces the binaries Goblin parses