Foreign Function Interface

What FFI is (and is not)

A Foreign Function Interface (FFI) is any supported mechanism by which code in one language/runtime calls code produced by another language/toolchain in the same process, or can be called from it. FFI is not a single universal standard; it is a family of language + toolchain features that let you cross a boundary under an agreed contract. The mechanism does not make the crossing safe by itself; safety comes from both sides honoring that contract.

FFI is not “calling a function in another process.” A normal function call requires jumping to an address in the same virtual address space. Across processes you use IPC/RPC (pipes/sockets/shared memory + protocol), which is conceptually “serialize → send → execute remotely → reply,” not “call by ABI.”

![important] “In-process foreign calls” always imply some ABI contract. If you avoid ABI by moving to another process, you are no longer doing FFI; you are doing RPC.

ISA vs ABI: why binaries on the same architecture don’t automatically interoperate

The ISA defines instructions, registers, and their semantics. It explains what call, ret, mov, add do and how instruction encodings work. It does not specify how independently-compiled code should cooperate at call boundaries.

The ABI is that cooperation contract: it defines where arguments and returns go, which registers must survive a call, stack alignment requirements, how aggregates (structs) are passed/returned, and (at the platform level) how symbols are named/linked/unwound.

A key clarification: add rax, rbx targets rax because you wrote it as the destination operand; that is ISA semantics + assembler syntax, not ABI. The ABI becomes relevant when you cross a function boundary: for example, “integer return value is in RAX” is an ABI convention shared by SysV x86-64 and Windows x64, but it is not dictated by the ISA.

ABI scope: architecture + platform ecosystem (not just architecture)

There is not “one ABI per ISA.” The same ISA can have multiple ABIs. For x86-64, two dominate: SysV x86-64 (Linux and the BSDs; macOS follows it with minor deviations) and Windows x64. They disagree on argument registers, stack obligations, red zone vs shadow space, and unwinding metadata conventions.
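One way to make the "same ISA, different ABI" point tangible is to ask the compiler which target it is building for. The sketch below is a hypothetical helper (the name c_abi_family is not from any library) that relies on commonly predefined compiler macros and covers only a few mainstream targets; anything else falls through to "other".

```c
/* Hypothetical helper: names the C ABI family this translation unit was
 * compiled for, based on predefined compiler macros. Only a few common
 * targets are recognized; this is an illustration, not a full detector. */
const char *c_abi_family(void) {
#if (defined(_WIN64) || defined(__CYGWIN__)) && \
    (defined(_M_X64) || defined(__x86_64__))
    return "Windows x64";              /* same ISA as the next branch... */
#elif defined(__x86_64__)
    return "SysV x86-64";              /* ...but a different call contract */
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "AArch64 (AAPCS64 family)";
#else
    return "other";
#endif
}
```

Two objects that report different families here cannot call each other directly, even when both contain nothing but valid x86-64 instructions.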

In practice, when engineers say “FFI-compatible,” they usually mean “compatible with the platform’s C ABI for this target triple,” because that is the most stable cross-language rendezvous point.

A worked ABI failure: valid instructions, invalid contract

ABI bugs often compile and run “sometimes” because every instruction is legal, but the inter-component contract is broken.

On SysV x86-64, RBX is callee-saved. If you write assembly that uses RBX as scratch without saving/restoring, you violate the ABI and can corrupt the caller’s state:

; SysV x86-64
global bad_callee
bad_callee:
    mov rbx, 123        ; clobbers callee-saved register
    ret                 ; returns to the caller with RBX corrupted

This tends to manifest as “random” corruption far away from the call site because the caller legitimately assumed RBX survived across calls.

Another sharp edge is stack discipline: many ABIs require 16-byte alignment at call boundaries; misalignment may only crash when callees use aligned SIMD moves (e.g., movaps) to/from stack.

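On Windows x64, the ABI additionally requires the caller to reserve 32 bytes of shadow space above the return address. The callee is entitled to spill its register arguments into that space, so a caller that omits it can have its own stack data overwritten even if the argument registers look correct.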

![warning] “Works when I test it” is not evidence your ABI usage is correct. ABI violations frequently become visible only with different optimization levels, different callees, or when exceptions/profilers/unwinders are involved.

Data representation: types become bytes + invariants

Across an FFI boundary, you do not exchange “a string” or “a slice.” You exchange bytes in memory plus an agreement on how to interpret them. The stable core representations are fixed-width scalars, pointers, and explicit length/shape metadata.

The most universal container is “pointer + length.” It is the lingua franca behind Rust slices, Go []byte (subject to cgo rules), Python buffers, and JVM direct buffers:

// canonical byte span
const uint8_t* ptr;
size_t len;
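As a minimal sketch of how a function consumes such a span, the hypothetical span_sum below takes the pointer and length as two separate scalar arguments. The function name and the "NULL is allowed only with len == 0" rule are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums the bytes in [ptr, ptr + len). The span is borrowed: the function
 * reads it during the call and keeps no pointer afterwards.
 * Assumed convention: ptr may be NULL only when len is 0. */
uint32_t span_sum(const uint8_t *ptr, size_t len) {
    uint32_t sum = 0;
    if (ptr == NULL) {
        return 0;                      /* empty span */
    }
    for (size_t i = 0; i < len; i++) {
        sum += ptr[i];
    }
    return sum;
}
```

Passing the two fields as separate arguments, rather than as a struct by value, keeps the call within the simplest part of every C ABI.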

Passing aggregates (structs) by value is where ABI rules get subtle. Whether a struct is passed in registers, split across registers, passed indirectly via a hidden pointer, or returned via a hidden “sret” pointer depends on ABI classification rules and the struct’s shape (size, alignment, field composition). When you need portability, prefer primitives + pointers and keep “rich” structures behind opaque handles.
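When a struct does have to be shared, one defensive approach is to pass it by pointer and pin its layout with compile-time checks. The sketch below uses a hypothetical record_v1 type; the field names and the flag mask are invented for illustration. The fields are fixed-width and ordered so that no padding is needed.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical record shared across the boundary, always passed by pointer. */
typedef struct {
    uint64_t id;      /* offset 0  */
    uint32_t kind;    /* offset 8  */
    uint32_t flags;   /* offset 12 */
} record_v1;

/* The build fails if a compiler lays the struct out differently. */
static_assert(sizeof(record_v1) == 16, "record_v1 must be 16 bytes");
static_assert(offsetof(record_v1, kind) == 8, "kind must sit at offset 8");
static_assert(offsetof(record_v1, flags) == 12, "flags must sit at offset 12");

/* Takes the record by pointer, so struct-by-value classification rules
 * never come into play. Returns 1 if any bit in mask is set, else 0. */
int record_is_flagged(const record_v1 *r, uint32_t mask) {
    return r != NULL && (r->flags & mask) != 0;
}
```

The other side of the boundary would declare a matching layout (for example a #[repr(C)] struct in Rust) and perform the equivalent checks there.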

Ownership and allocation: the allocator boundary is the real boundary

A pointer communicates an address, not ownership. The dominant production rule is:

Memory must be freed by the same allocator/runtime that allocated it.

This matters because “allocator” can mean libc malloc, the Rust global allocator, a Windows CRT heap, a Go-managed heap, or a VM GC. Cross-freeing (allocate in one world, free in another) ranges from immediate crashes to delayed heap corruption.

Two stable designs avoid allocator ambiguity. One is caller-allocated output buffers. The other is callee-allocated memory paired with a callee-provided free function. Opaque handles are the ownership workhorse because they keep internal invariants and allocation policy on one side:

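#include <stddef.h>                 // size_t
#include <stdint.h>                 // uint8_t

typedef struct Thing Thing;         // opaque: layout not part of the ABI

Thing* thing_new(void);             // allocated by the library...
void   thing_free(Thing*);          // ...and freed by the library

// Inputs are borrowed for the call; the output buffer is owned by the caller.
int thing_process(
  Thing* thing,
  const uint8_t* in_ptr, size_t in_len,
  uint8_t* out_ptr, size_t out_cap,
  size_t* out_len
);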

This shape is deliberately “boring”: it avoids struct-by-value ABI pitfalls, makes lifetimes explicit (inputs borrowed for the call), and prevents cross-allocator freeing (output buffer owned by caller).
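To show how the library side of this interface might look, here is a minimal sketch. The Thing layout, the uppercase transformation, and the return codes (0, -1, -2) are all assumptions chosen for illustration; the declarations above do not specify them.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Thing Thing;
struct Thing { size_t calls; };    /* private layout; callers never see it */

/* Allocation and release both happen inside the library. */
Thing *thing_new(void) { return calloc(1, sizeof(Thing)); }
void thing_free(Thing *t) { free(t); }

/* Assumed codes: 0 = ok, -1 = bad arguments, -2 = output buffer too small.
 * On 0 and -2, *out_len holds the number of bytes required. */
int thing_process(Thing *t,
                  const uint8_t *in_ptr, size_t in_len,
                  uint8_t *out_ptr, size_t out_cap,
                  size_t *out_len) {
    if (t == NULL || out_len == NULL || (in_ptr == NULL && in_len != 0)) {
        return -1;
    }
    *out_len = in_len;
    if (out_cap < in_len) {
        return -2;                 /* caller can retry with a larger buffer */
    }
    if (in_len != 0 && out_ptr == NULL) {
        return -1;
    }
    for (size_t i = 0; i < in_len; i++) {
        uint8_t c = in_ptr[i];     /* ASCII uppercase as a stand-in for real work */
        out_ptr[i] = (c >= 'a' && c <= 'z') ? (uint8_t)(c - 32) : c;
    }
    t->calls++;
    return 0;
}
```

Reporting the required size on the "too small" path lets the caller size its own buffer, so the library never has to hand out memory that someone else must free.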

Errors and unwinding: never let exceptions/panics cross the seam

FFI boundaries must not allow language-specific unwinding to escape. C has no exception model, and unwinding through frames compiled by another language is undefined behavior unless both toolchains explicitly support it.

The robust pattern is that failure crosses as values: return codes and out parameters, optionally augmented with a thread-local “last error message” accessor owned by the library. If you’re exporting from Rust, the boundary should prevent panics from unwinding into foreign code (either by aborting on panic for that surface or catching and translating).
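The "failure crosses as values" pattern can be sketched in C as a status code plus a thread-local message buffer. Everything named here (the mylib_ prefix, the error codes, the parsing task) is hypothetical, and the sketch assumes a C11 compiler that supports _Thread_local.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

enum { MYLIB_OK = 0, MYLIB_ERR_ARG = 1, MYLIB_ERR_RANGE = 2 };

/* One message buffer per thread, owned by the library. */
static _Thread_local char last_error[96];

/* Valid until the next mylib call on the same thread. */
const char *mylib_last_error(void) { return last_error; }

/* Parses ASCII decimal digits into a uint8_t; failures are returned as
 * codes, never thrown or unwound. */
int mylib_parse_u8(const uint8_t *digits, size_t len, uint8_t *out) {
    last_error[0] = '\0';
    if (digits == NULL || out == NULL || len == 0) {
        snprintf(last_error, sizeof last_error, "null or empty input");
        return MYLIB_ERR_ARG;
    }
    unsigned value = 0;
    for (size_t i = 0; i < len; i++) {
        if (digits[i] < '0' || digits[i] > '9') {
            snprintf(last_error, sizeof last_error,
                     "non-digit byte at index %zu", i);
            return MYLIB_ERR_ARG;
        }
        value = value * 10 + (unsigned)(digits[i] - '0');
        if (value > 255) {
            snprintf(last_error, sizeof last_error, "value exceeds 255");
            return MYLIB_ERR_RANGE;
        }
    }
    *out = (uint8_t)value;
    return MYLIB_OK;
}
```

The caller branches on the integer and only asks for the message when it needs one; the message memory stays on the library side, so no cross-allocator free is involved.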

Runtimes: GC, GIL, thread attachment, and why “just pass a pointer” fails

When one side is a runtime-managed environment, there are additional invariants beyond the C ABI.

For Python, entering/leaving Python code has GIL requirements; touching Python objects without the GIL is invalid. For the JVM, native threads must attach before using JNI; references have scoped lifetimes. For Go via cgo, there are strict rules about passing Go pointers to C to keep the GC sound.

These constraints push high-reliability designs toward passing byte spans and handles rather than handing out pointers into runtime-managed objects unless you use explicit pinning/retention APIs.
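A minimal sketch of the "byte spans in, handles out" idea: the native side copies the caller's bytes during the call and returns a small integer handle, so it never retains a pointer into memory a GC might move or free. The store_ names, the fixed slot count, and the lack of locking are simplifying assumptions; a real library would need synchronization.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define STORE_SLOTS 8

typedef struct { uint8_t *data; size_t len; int used; } slot;
static slot slots[STORE_SLOTS];    /* not thread-safe: illustration only */

/* Copies the caller's bytes; returns a handle in 1..STORE_SLOTS, 0 on failure. */
uint32_t store_put(const uint8_t *ptr, size_t len) {
    if (ptr == NULL && len != 0) return 0;
    for (uint32_t i = 0; i < STORE_SLOTS; i++) {
        if (slots[i].used) continue;
        uint8_t *copy = malloc(len ? len : 1);
        if (copy == NULL) return 0;
        if (len) memcpy(copy, ptr, len);
        slots[i] = (slot){ copy, len, 1 };
        return i + 1;
    }
    return 0;                      /* table full */
}

static slot *lookup(uint32_t h) {
    if (h == 0 || h > STORE_SLOTS || !slots[h - 1].used) return NULL;
    return &slots[h - 1];
}

size_t store_len(uint32_t h) {
    slot *s = lookup(h);
    return s ? s->len : 0;
}

/* Reads one byte by index; returns 0 on success, -1 on a bad handle or index. */
int store_read(uint32_t h, size_t idx, uint8_t *out) {
    slot *s = lookup(h);
    if (s == NULL || out == NULL || idx >= s->len) return -1;
    *out = s->data[idx];
    return 0;
}

void store_release(uint32_t h) {
    slot *s = lookup(h);
    if (s == NULL) return;
    free(s->data);
    *s = (slot){ NULL, 0, 0 };
}
```

A stale or forged handle fails the lookup instead of dereferencing freed memory, which is the main advantage over handing raw pointers to the managed side.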

Callbacks: function pointers plus context, and the missing contracts

Callbacks reverse control flow and force you to specify lifetime and concurrency semantics. A raw function pointer is rarely enough because real callbacks need an environment; the portable representation is “callback + opaque context pointer”:

typedef void (*log_fn)(void* ctx, const uint8_t* msg, size_t len);
void set_logger(log_fn f, void* ctx);

The correctness-critical part is not the typedef; it is the contract: whether callbacks can happen concurrently, what thread they run on, whether they can occur after shutdown, and whether reentrancy is allowed. These must be specified at the API level because the ABI cannot express them.
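The "callback + context" shape can be sketched end to end as follows. The contract written in the comments (single-threaded, NULL disables logging, context must outlive registration) is an assumption made for this example, as are do_work, log_stats, and count_log.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*log_fn)(void *ctx, const uint8_t *msg, size_t len);

static log_fn g_logger;
static void *g_logger_ctx;

/* Assumed contract: single-threaded use; passing NULL for f disables
 * logging; ctx must stay valid until the logger is replaced or cleared. */
void set_logger(log_fn f, void *ctx) {
    g_logger = f;
    g_logger_ctx = ctx;
}

/* Hypothetical library entry point that emits one log message. */
void do_work(void) {
    static const uint8_t msg[] = "work done";
    if (g_logger != NULL) {
        g_logger(g_logger_ctx, msg, sizeof msg - 1);   /* length excludes NUL */
    }
}

/* Caller side: the context pointer carries the callback's state. */
typedef struct { size_t calls; size_t bytes; } log_stats;

void count_log(void *ctx, const uint8_t *msg, size_t len) {
    log_stats *s = ctx;
    (void)msg;                     /* content unused in this sketch */
    s->calls++;
    s->bytes += len;
}
```

A caller would register with set_logger(count_log, &stats) and clear with set_logger(NULL, NULL) before stats goes out of scope; nothing in the types enforces that ordering, which is why it has to be documented.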

Exporting a symbol: where “exports” live in real binaries

Exporting a symbol means making a name discoverable to other code at link/load time by emitting symbol metadata with external visibility.

In a relocatable object (.o / .obj), “exported” is primarily encoded as a global symbol table entry pointing at a section+offset (e.g., .text + 0xNN) with appropriate binding/visibility. Undefined references are also symbol table entries, and relocations refer to them. The static linker resolves undefined symbols by searching definitions in other objects/libraries and then applying relocations.
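In C source, the distinction shows up as linkage. The sketch below uses two hypothetical functions: one with internal linkage, which other objects cannot bind to, and one with external linkage, which becomes a global symbol definition in the object file.

```c
/* Internal linkage: at most a local symbol, and the compiler may inline
 * it away entirely. Other object files cannot reference it. */
static int clamp_to_byte(int v) {
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* External linkage: emitted as a global symbol named add3_clamped that
 * the static linker can match against undefined references elsewhere. */
int add3_clamped(int a, int b, int c) {
    return clamp_to_byte(a + b + c);
}
```

Running nm on the resulting object would typically list add3_clamped with an uppercase T (global, text section), while clamp_to_byte appears with a lowercase t or not at all.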

In a shared library, there is an additional concept: the set of symbols visible to the dynamic loader and lookup APIs (dlsym, GetProcAddress). This is platform-specific:

On ELF (Linux), the runtime-visible subset is carried in .dynsym (not the full .symtab), supported by hash structures such as .gnu.hash, and dynamic relocations. .symtab may be stripped in release builds while .dynsym remains for dynamic linking. dlsym() resolves names against this same .dynsym data once a library has been loaded.

On Windows PE/COFF, a DLL has an explicit export directory (commonly referred to via .edata) listing exported names/ordinals and RVAs. Toolchains often require explicit export annotations or a .def file to populate this table reliably.

On macOS Mach-O, dyld uses its own exported symbol metadata (symbol table + dyld info/export trie). The idea is the same: a loader-optimized “what can I bind to?” structure.

Name mangling is part of this story. C++ and Rust mangle symbol names by default, so “exporting for C callers” generally means forcing C linkage and a stable symbol name.

C++:

extern "C" int add3(int a, int b, int c) { return a+b+c; }

Rust:

#[no_mangle]
pub extern "C" fn add3(a: i32, b: i32, c: i32) -> i32 { a + b + c }

In both cases, the point is to produce a predictable exported symbol name and a C ABI call boundary.
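C needs no linkage override, but whether a function ends up in the loader-visible export set still depends on platform rules. A common approach is an export macro along these lines; MYLIB_API is a hypothetical name, and the sketch assumes either a Windows toolchain that understands __declspec(dllexport) or a GCC-compatible compiler.

```c
/* Hypothetical export macro for a library called "mylib". */
#if defined(_WIN32)
#  define MYLIB_API __declspec(dllexport)                   /* PE export table */
#elif defined(__GNUC__)
#  define MYLIB_API __attribute__((visibility("default")))  /* ELF / Mach-O */
#else
#  define MYLIB_API                                         /* fall back to defaults */
#endif

/* C has no name mangling, so the exported name is add3 (some platforms
 * add a leading underscore at the object-file level). */
MYLIB_API int add3(int a, int b, int c) { return a + b + c; }
```

On GCC and Clang this is usually paired with building the library with -fvisibility=hidden, so that only annotated functions are exported.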

How dynamic linking “finds” your exported symbol (high-level but concrete)

At runtime, a dynamically-linked call typically goes through an indirection that allows late binding. On ELF this is commonly PLT/GOT-mediated; the loader resolves references using relocation entries and .dynsym/hash tables. On Windows, the loader populates an import address table based on the DLL’s export table (or GetProcAddress resolves by name/ordinal). You can think of this as the runtime counterpart of the static linker’s “undefined symbol + relocation → find definition → patch call site,” except some patching can be deferred until first call.
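The deferred-patching idea can be imitated in plain C with a function pointer slot. This is only an analogy for lazy binding, not what the loader actually executes: the slot plays the role of a GOT entry, call_add3 plays the role of a PLT stub, and all names are hypothetical.

```c
typedef int (*add3_fn)(int, int, int);

/* The "real" definition the slot should eventually point to. */
static int add3_impl(int a, int b, int c) { return a + b + c; }

static int add3_resolver(int a, int b, int c);

/* The "GOT slot": starts out pointing at the resolver stub. */
static add3_fn add3_slot = add3_resolver;
static int resolve_count;

/* The first call lands here, patches the slot, then forwards the call. */
static int add3_resolver(int a, int b, int c) {
    resolve_count++;
    add3_slot = add3_impl;         /* later calls skip the resolver */
    return add3_slot(a, b, c);
}

/* The "PLT entry": every call goes through the slot. */
int call_add3(int a, int b, int c) { return add3_slot(a, b, c); }

/* Exposed so the one-time resolution can be observed. */
int add3_resolve_count(void) { return resolve_count; }
```

After any number of calls the resolver has run exactly once, which mirrors how a lazily bound import pays its lookup cost on first use only.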

How to verify: the artifact-level inspection loop

When debugging “why can’t I link/call/find this function,” the fastest path is to inspect the actual binary metadata.

On Linux (ELF), you typically answer: “Is the symbol global? Is it in .dynsym? Is visibility default? Is it stripped?” with:

nm -D --defined-only libfoo.so | grep add3
readelf -Ws libfoo.so | grep add3          # shows binding/visibility + which symtab
objdump -T libfoo.so | grep add3           # dynamic symbol table view

On macOS (Mach-O):

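nm -gU libfoo.dylib | grep add3            # global (external) symbols that are defined
otool -Iv libfoo.dylib | grep add3         # indirect symbol table: symbols the library imports, not its exports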

On Windows (PE/COFF):

dumpbin /EXPORTS foo.dll
dumpbin /SYMBOLS foo.lib | findstr add3

The invariant you are checking is always the same: the compiler produced a definition, the linker kept it, and the final binary exposes it in the loader-visible export metadata for your platform.
