External GPUs (eGPUs)

Why eGPUs Exist

Laptops and mini PCs are constrained by thermal envelopes and power budgets. A thin laptop might dissipate 45-65 W for its entire system, while a single desktop GPU like the NVIDIA RTX 4090 draws up to 450 W and requires a large heatsink with multiple fans. As a result, laptop GPUs are typically cut-down versions of their desktop counterparts, with fewer cores, lower clocks, and less VRAM (Video Random Access Memory).

Users who need desktop-class GPU performance — for AI inference, gaming, video editing, or 3D rendering — but prefer the portability of a laptop face a fundamental trade-off. An eGPU (external Graphics Processing Unit) resolves this by moving the GPU outside the laptop into a separate enclosure connected by a high-speed cable.

What an eGPU Is

An eGPU enclosure is a chassis that provides:

  • A PCIe (Peripheral Component Interconnect Express) slot to seat a standard desktop graphics card.
  • A PSU (Power Supply Unit), typically 500—750 W, to power the GPU independently of the laptop’s battery.
  • A connection interface (Thunderbolt, USB4, or OCuLink) that carries PCIe data between the laptop and the GPU.

The laptop’s CPU sends PCIe commands over the external cable. The GPU executes those commands — shader programs, tensor operations, video decode — using its own VRAM and compute cores, then sends results back over the same cable. This is the same protocol used when a GPU sits in a motherboard’s PCIe slot; the only difference is the cable in the middle, which introduces bandwidth limits and latency.

Connection Interfaces

The interface between the laptop and the eGPU enclosure is the single biggest factor determining performance. All eGPU performance loss relative to a natively installed GPU traces back to this link.

Thunderbolt 3 and Thunderbolt 4

Thunderbolt 3 and Thunderbolt 4 (Thunderbolt was developed by Intel in collaboration with Apple) both provide 40 Gbps (gigabits per second) of total bandwidth over a USB-C physical connector. However, the full 40 Gbps is not available to the GPU. Thunderbolt is a tunneling protocol that multiplexes several sub-protocols over a single link:

  • DisplayPort video (for driving monitors connected through the Thunderbolt port, including daisy-chained ones)
  • USB 3.x data
  • PCIe data (what the GPU needs)

After accounting for protocol overhead, DisplayPort tunneling, and USB traffic, roughly 22 Gbps remains for PCIe data. This is approximately equivalent to a PCIe 3.0 x4 link.

How the tunneling works: The Thunderbolt controller on the laptop side encapsulates PCIe TLPs (Transaction Layer Packets, the fundamental units of communication in PCIe, carrying read/write requests and completions) into Thunderbolt transport frames. On the eGPU side, a second Thunderbolt controller decapsulates them and presents a standard PCIe downstream port to the GPU. This protocol translation adds latency (typically 2-5 microseconds per hop) on top of the bandwidth constraint.

Typical performance penalty: 15—25% compared to the same GPU installed natively in a PCIe x16 slot.

USB4

USB4 also provides 40 Gbps over USB-C and is based on the Thunderbolt 3 protocol (Intel contributed the specification to the USB-IF — USB Implementers Forum). However, USB4 is a specification with optional features. Tunneling PCIe for eGPU use is not mandatory — a USB4 port may or may not support it depending on the host controller and firmware. Always check the laptop’s specifications before buying an eGPU for USB4 use.

When PCIe tunneling is supported, USB4 performs comparably to Thunderbolt 3/4: roughly 22 Gbps usable for GPU data, with a 15—25% performance penalty.

OCuLink

OCuLink takes a fundamentally different approach. Instead of tunneling PCIe inside another protocol, it carries raw PCIe signals directly over a cable, using the SFF-8612 connector (a specialized connector defined by the SFF, or Small Form Factor, committee).

OCuLink 1.0 provides PCIe 4.0 x4, which is 4 lanes at 16 GT/s (gigatransfers per second) each, for a total of 64 GT/s or approximately 63 Gbps of usable bandwidth after 128b/130b encoding. Because there is no protocol translation — no Thunderbolt controller encapsulating and decapsulating packets — latency is minimal and nearly the full bandwidth is available to the GPU.
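The lane arithmetic above can be reproduced with a short sketch. The function name and table layout are illustrative choices, not part of any library; the per-lane rates and encoding schemes are the published PCIe values for each generation, and the result ignores packet-level (TLP/DLLP) overhead.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_usable_gbps(generation: int, lanes: int) -> float:
    """Return one-direction link bandwidth in Gbps after line encoding."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * lanes * efficiency


print(f"PCIe 4.0 x4:  {pcie_usable_gbps(4, 4):.1f} Gbps")
print(f"PCIe 4.0 x16: {pcie_usable_gbps(4, 16):.1f} Gbps")
print(f"PCIe 3.0 x4:  {pcie_usable_gbps(3, 4):.1f} Gbps")
```

For PCIe 4.0 x4 this gives about 63 Gbps, the OCuLink figure quoted above.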

Typical performance penalty: approximately 5% compared to native PCIe x16 (the loss comes from using x4 lanes instead of x16, not from protocol overhead).

OCuLink ports are found on some mini PCs from manufacturers like Minisforum and AOOSTAR, but are rare on mainstream laptops. The SFF-8612 connector is also less robust than USB-C — it was designed for internal server connections, not daily plugging and unplugging.

Thunderbolt 5

Released in 2024-2025, Thunderbolt 5 increases baseline bandwidth to 80 Gbps, with a “Bandwidth Boost” mode that reaches 120 Gbps in one direction by using asymmetric signaling (three lanes transmitting, one receiving). After protocol overhead, approximately 64 Gbps is available for PCIe data, roughly matching OCuLink and equivalent to a PCIe 4.0 x4 link.

Thunderbolt 5 significantly narrows the gap with OCuLink while retaining the convenience of USB-C connectors, hot-plug support, and display tunneling.

Bandwidth Comparison

| Interface | Raw Bandwidth | Usable for GPU | PCIe Equivalent | Typical Performance Loss |
|---|---|---|---|---|
| Thunderbolt 3/4 | 40 Gbps | ~22 Gbps | ~PCIe 3.0 x4 | 15-25% |
| USB4 | 40 Gbps | varies | ~PCIe 3.0 x4 | 15-25% |
| OCuLink 1.0 | 64 Gbps | ~63 Gbps | PCIe 4.0 x4 | ~5% |
| Thunderbolt 5 | 80-120 Gbps | ~64 Gbps | ~PCIe 4.0 x4 | ~10% |
| Native PCIe 4.0 x16 | 256 Gbps | ~252 Gbps | PCIe 4.0 x16 | 0% (baseline) |

Worked Example: Bandwidth Calculation for Thunderbolt 3

Thunderbolt 3 provides 40 Gbps total. Assuming a 1440p 165 Hz DisplayPort stream is tunneled (approximately 14 Gbps after DSC — Display Stream Compression) and USB 3.x overhead consumes roughly 4 Gbps, the remainder is:

40 - 14 - 4 = 22 Gbps for PCIe

Converting to bytes: 22 Gbps / 8 = 2.75 GB/s. A native PCIe 4.0 x16 slot provides approximately 31.5 GB/s, so the Thunderbolt 3 link offers about 8.7% of native bandwidth. In practice, GPU workloads do not saturate the PCIe bus continuously, which is why the real-world penalty is 15—25% rather than 90%.
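The same calculation can be written as a small script so the assumptions are easy to vary. The constant and function names are illustrative; the 14 Gbps and 4 Gbps shares are the assumed figures from this example, not measured values.

```python
TOTAL_GBPS = 40.0         # Thunderbolt 3 link rate
DISPLAYPORT_GBPS = 14.0   # assumed tunneled display stream (from the example)
USB_GBPS = 4.0            # assumed USB 3.x share (from the example)
NATIVE_X16_GB_S = 31.5    # approximate PCIe 4.0 x16 throughput in GB/s


def pcie_budget_gbps(total: float, displayport: float, usb: float) -> float:
    """Bandwidth left for PCIe once the other tunnels take their share."""
    return total - displayport - usb


def gbps_to_gb_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


pcie_gbps = pcie_budget_gbps(TOTAL_GBPS, DISPLAYPORT_GBPS, USB_GBPS)
pcie_gb_s = gbps_to_gb_per_s(pcie_gbps)
share_of_native = pcie_gb_s / NATIVE_X16_GB_S

print(f"PCIe budget: {pcie_gbps:.0f} Gbps = {pcie_gb_s:.2f} GB/s")
print(f"Share of native x16 bandwidth: {share_of_native:.1%}")
```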

Display Output Path

When a GPU renders a frame, that frame must reach a display. With an eGPU, there are two paths:

  1. External monitor connected to the eGPU: The GPU outputs frames directly through its own HDMI/DisplayPort connectors. No frame data travels back over the Thunderbolt/OCuLink cable. This is the faster path.

  2. Laptop’s built-in screen: The GPU must send each rendered frame back over the cable to the laptop’s display controller. This consumes bandwidth (an uncompressed 4K 60 Hz video stream is roughly 12 Gbps) and adds one extra traversal of the cable, increasing latency by one to two frames. Gaming on the laptop screen with a Thunderbolt eGPU can feel noticeably less responsive than using an external monitor.
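The return-path bandwidth figure can be estimated from resolution, refresh rate, and colour depth. This is a rough sketch: the function name is illustrative, 24 bits per pixel is an assumption (8 bits per channel, no HDR), and blanking intervals and compression are ignored.

```python
def uncompressed_video_gbps(width: int, height: int, refresh_hz: int,
                            bits_per_pixel: int = 24) -> float:
    """Raw pixel data rate in Gbps, ignoring blanking and compression."""
    return width * height * refresh_hz * bits_per_pixel / 1e9


print(f"4K 60 Hz:     {uncompressed_video_gbps(3840, 2160, 60):.1f} Gbps")
print(f"1440p 165 Hz: {uncompressed_video_gbps(2560, 1440, 165):.1f} Gbps")
```

For 4K at 60 Hz the result is just under 12 Gbps, which would take more than half of the roughly 22 Gbps a Thunderbolt 3 link has for PCIe traffic.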

Use Cases

AI and ML Inference

This is where eGPUs shine relative to their bandwidth limitations. When running inference with tools like Ollama, the workflow is:

  1. Model loading: The model weights are transferred from system RAM (or disk) to GPU VRAM over the cable. For a 7B-parameter model in 4-bit quantization (~4 GB), this takes roughly 1.5 seconds over Thunderbolt 3 (at 2.75 GB/s) — a one-time cost.

  2. Inference: Once the model is in VRAM, inference is limited by the GPU’s own compute and VRAM bandwidth, not by the cable. The GPU performs matrix multiplications entirely within its own VRAM and compute cores. Only small prompt tokens flow in and generated tokens flow out: kilobytes, not gigabytes. The cable bandwidth is largely irrelevant during this phase.

Result: A Thunderbolt 3 eGPU running an LLM (Large Language Model) through Ollama will produce nearly the same tokens per second as the same GPU installed natively. The 15—25% eGPU penalty that affects gaming barely registers for inference workloads.
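The one-time loading cost is a simple division. The sketch below uses an illustrative function name and assumes the link sustains its full usable rate, which real transfers may not reach.

```python
def load_time_seconds(model_size_gb: float, link_gbps: float) -> float:
    """Time to copy model weights to VRAM over a link of the given usable rate."""
    link_gb_per_s = link_gbps / 8  # gigabits -> gigabytes
    return model_size_gb / link_gb_per_s


# ~4 GB model (7B parameters, 4-bit) over the usable rates quoted earlier.
print(f"Thunderbolt 3 (~22 Gbps): {load_time_seconds(4, 22):.2f} s")
print(f"OCuLink (~63 Gbps):       {load_time_seconds(4, 63):.2f} s")
```

Either way the difference is a fraction of a second to a second or so per model load, which is why the interface matters little once the weights are resident.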

Gaming

Gaming is more bandwidth-sensitive than inference. The CPU continuously streams draw calls, texture data, and geometry to the GPU, and the GPU sends rendered frames back (if using the laptop screen). The PCIe bus is under sustained load, making the 15—25% Thunderbolt penalty clearly visible in frame rates and frame times. OCuLink’s ~5% penalty makes it a much better fit for gaming.

Video Editing and 3D Rendering

These workloads fall between inference and gaming. Timeline scrubbing and real-time preview in video editors stress the bus, but final renders are GPU-compute-bound (similar to inference). Most users report acceptable performance over Thunderbolt 3/4 for editing, with OCuLink or Thunderbolt 5 providing a smoother experience for scrubbing 4K+ timelines.

Limitations

  • Hot-plug support: macOS on Intel-based Macs has the best eGPU hot-plug support (Apple promoted Thunderbolt eGPUs from 2018 onward; Apple Silicon Macs do not support eGPUs). Linux supports hot-plug through the PCIe hotplug subsystem, but driver reloading can be unreliable depending on the GPU vendor. Windows has historically had the worst hot-plug support: disconnecting an eGPU without safely ejecting it can cause a BSOD (Blue Screen of Death).

  • Driver compatibility: Not all GPUs work in all enclosures. Some eGPU enclosures use a PCIe bridge chip that can cause compatibility issues. NVIDIA GPUs are effectively unsupported on macOS (no NVIDIA drivers for modern cards have been available since macOS Mojave). On Linux, the nvidia proprietary driver and nouveau open-source driver both support eGPUs, but power management (runtime suspend/resume) remains fragile.

  • Power delivery: Many Thunderbolt eGPU enclosures also deliver power to the laptop over the same cable (up to 100 W with USB-PD — USB Power Delivery, or 240 W with USB-PD EPR — Extended Power Range). This is convenient but means the enclosure’s PSU must power both the GPU and charge the laptop. Budget enclosures may not deliver enough wattage for both simultaneously.

  • Cable length: Thunderbolt 3/4 passive cables are limited to 0.7 m for 40 Gbps; active cables reach 2 m. OCuLink cables are typically 0.5—1 m. Longer cables degrade signal integrity and reduce usable bandwidth.

  • Cost: An eGPU enclosure alone costs 400 (USD), on top of the GPU itself. The total cost often exceeds that of a comparable desktop system, making eGPUs a convenience/portability premium rather than a value proposition.
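For the power delivery limitation above, a quick budget check shows whether an enclosure can feed the GPU and charge the laptop at the same time. This is a minimal sketch with an illustrative function name; it ignores PSU efficiency, transient power spikes, and the enclosure's own consumption, so real headroom will be smaller.

```python
def psu_headroom_watts(psu_watts: float, gpu_watts: float,
                       laptop_charge_watts: float) -> float:
    """Remaining PSU capacity after the GPU and laptop charging are served."""
    return psu_watts - gpu_watts - laptop_charge_watts


# A 450 W GPU plus 100 W laptop charging, on the 500 W and 750 W PSU sizes
# mentioned earlier in this article.
for psu in (500, 750):
    headroom = psu_headroom_watts(psu, gpu_watts=450, laptop_charge_watts=100)
    status = "OK" if headroom >= 0 else "insufficient"
    print(f"{psu} W PSU: {headroom:+.0f} W headroom ({status})")
```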

How to check what your laptop supports

Before buying an eGPU, verify which interfaces your laptop actually has.

Identify your ports

  • Look for the Thunderbolt logo — a small lightning bolt icon next to or on the USB-C port. A plain USB-C port without this logo may be USB 3.x only (5-10 Gbps, far too slow for eGPU).
  • On Linux: boltctl list shows Thunderbolt devices. If the command is not found, your system may not have Thunderbolt. Also check: cat /sys/bus/thunderbolt/devices/*/device_name.
  • On macOS: Apple menu → About This Mac → System Report → Thunderbolt.
  • On Windows: Device Manager → System devices → look for “Thunderbolt Controller.”
  • OCuLink: check the manufacturer’s spec sheet. OCuLink ports are not standardised on consumer laptops — they appear primarily on mini PCs from Minisforum, AOOSTAR, and similar brands.
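On Linux, the sysfs check listed above can also be scripted. The sketch below reads the same `device_name` files under `/sys/bus/thunderbolt/devices`; the function name is illustrative, and an empty result only suggests that the kernel exposes no Thunderbolt/USB4 devices, not that the hardware is definitely absent.

```python
from pathlib import Path


def thunderbolt_device_names(
    sysfs_root: str = "/sys/bus/thunderbolt/devices",
) -> dict[str, str]:
    """Map each sysfs device entry to the contents of its device_name file."""
    names: dict[str, str] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return names  # no Thunderbolt/USB4 bus exposed by the kernel
    for entry in sorted(root.iterdir()):
        name_file = entry / "device_name"
        if not name_file.is_file():
            continue  # e.g. domain entries have no device_name
        try:
            names[entry.name] = name_file.read_text().strip()
        except OSError:
            continue  # unreadable entry; skip rather than fail
    return names


if __name__ == "__main__":
    devices = thunderbolt_device_names()
    if not devices:
        print("No Thunderbolt/USB4 devices found in sysfs.")
    for entry, name in devices.items():
        print(f"{entry}: {name}")
```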

Framework laptops specifically

  • Framework 13 (11th/12th/13th gen Intel): has Thunderbolt 4 on 2 of 4 expansion card bays. These support eGPU enclosures. Check which expansion cards are installed — only the USB-C expansion cards on Thunderbolt-capable bays carry Thunderbolt.
  • Framework 13 (AMD Ryzen 7040): has USB4, not Thunderbolt. AMD’s USB4 implementation on this platform does support PCIe tunneling, so eGPUs generally work, but compatibility is less guaranteed than with Intel Thunderbolt. Check the Framework community forums for confirmed working enclosures.
  • Framework 16 (AMD Ryzen 7040/7045): has USB4 on some ports. Additionally, the Framework 16 has an expansion bay that accepts a discrete GPU module (an AMD Radeon RX 7700S) — this is a native PCIe connection, not an eGPU, and avoids Thunderbolt bandwidth penalties entirely.
  • None of these Framework models has OCuLink or Thunderbolt 5 as of 2025.

To verify: visit the Framework product spec page for your specific model and check the “Connectivity” or “Expansion Cards” section.

Mac Mini with unified memory vs Mini ITX with eGPU

For LLM inference specifically, a Mac Mini with Apple Silicon is a strong alternative to a Mini ITX build or laptop + eGPU setup, primarily because of the unified memory architecture.

The unified memory advantage

See Model Formats for detailed coverage of MLX and Apple Silicon’s unified memory.

On a traditional PC, the GPU has its own dedicated VRAM (e.g., 12 GB on an RTX 3060). Models that exceed VRAM must either be split across GPU + CPU (slow, because data shuttles over the PCIe bus) or run entirely on CPU (very slow). This makes VRAM the hard ceiling for model size.

On Apple Silicon, the CPU and GPU share a single memory pool. A Mac Mini M4 with 32 GB unified memory can give the GPU most of that 32 GB for model weights (macOS and other applications still need a share); there is no separate VRAM pool to overflow. A Mac Mini M4 Pro with 48 GB can run models that would otherwise need an RTX 4090 (24 GB VRAM) or a multi-GPU setup on a PC.
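A rough way to check whether a model fits is to estimate its weight size from parameter count and quantization. This sketch uses illustrative function names and counts weights only; the KV cache, activations, and quantization metadata add to the real footprint, which is why a 7B 4-bit model is usually quoted at about 4 GB. The 44 GB figure is an assumed usable share of a 48 GB machine.

```python
def model_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   usable_memory_gb: float) -> bool:
    """True if the weights alone fit in the given GPU-accessible memory."""
    return model_weights_gb(params_billion, bits_per_weight) <= usable_memory_gb


for params in (7, 30, 70):
    size = model_weights_gb(params, 4)
    print(f"{params}B @ 4-bit: ~{size:.1f} GB | "
          f"12 GB VRAM: {fits_in_memory(params, 4, 12)} | "
          f"24 GB VRAM: {fits_in_memory(params, 4, 24)} | "
          f"~44 GB unified: {fits_in_memory(params, 4, 44)}")
```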

Cost comparison (approximate, 2025 USD)

| Setup | Cost | Max model size at full GPU speed | Tokens/sec (7B Q4) | Notes |
|---|---|---|---|---|
| Mac Mini M4 (16 GB) | ~$600 | ~14 GB (after OS) | ~30-40 | Compact, silent, low power (~25W) |
| Mac Mini M4 Pro (48 GB) | ~$1,600 | ~44 GB | ~45-55 | Runs 30B+ models entirely on GPU |
| Framework 13 + TB4 eGPU + RTX 3060 12GB | ~300 encl + 2,000 | 12 GB (VRAM limit) | ~40-50 | Portable laptop + desk setup |
| Mini ITX build (RTX 3060 12GB) | ~$900-1,100 | 12 GB (VRAM limit) | ~45-55 | Best raw GPU perf per dollar |
| Mini ITX build (RTX 4090 24GB) | ~$2,000-2,500 | 24 GB (VRAM limit) | ~90-120 | Fastest inference, loud, 450W GPU |

Decision framework

  • If your priority is running large models (13B+, 30B+): Mac Mini M4 Pro with 48 GB is the most cost-effective path. 48 GB unified memory exceeds any single consumer GPU’s VRAM, and MLX inference is well-optimised.
  • If your priority is fastest possible tokens/sec on 7B models: Mini ITX with an NVIDIA GPU wins. CUDA + Tensor Cores outperform Apple’s GPU cores at raw matrix throughput.
  • If you already own a Framework laptop and want to add GPU acceleration: a Thunderbolt 4 eGPU enclosure with an RTX 3060 works well for inference (the bandwidth penalty is negligible for LLM inference). First check that your Framework model supports Thunderbolt/USB4 PCIe tunneling.
  • If portability matters: the laptop + eGPU combo lets you undock and work mobile. A Mac Mini or Mini ITX build stays on the desk.

Power consumption

| Setup | Idle power draw | Inference power draw |
|---|---|---|
| Mac Mini M4 | ~7W | ~25W |
| Mac Mini M4 Pro | ~10W | ~40W |
| Mini ITX (RTX 3060) | ~60W | ~250W |
| Mini ITX (RTX 4090) | ~80W | ~500W |

For an always-on homelab server, the Mac Mini’s power efficiency is a significant ongoing cost advantage.
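To put the idle figures above in terms of running cost, the sketch below converts a constant power draw into energy per year. The function names are illustrative, and the electricity price is a hypothetical placeholder to replace with your own tariff; a real server alternates between idle and load, so treat these as lower bounds.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(average_watts: float) -> float:
    """Energy used in one year at a constant power draw."""
    return average_watts * HOURS_PER_YEAR / 1000


def annual_cost(average_watts: float, price_per_kwh: float) -> float:
    """Yearly electricity cost at a constant power draw."""
    return annual_kwh(average_watts) * price_per_kwh


PRICE_PER_KWH = 0.30  # hypothetical tariff, in your local currency

for name, idle_watts in (("Mac Mini M4", 7), ("Mini ITX (RTX 3060)", 60)):
    print(f"{name}: {annual_kwh(idle_watts):.0f} kWh/year idle, "
          f"~{annual_cost(idle_watts, PRICE_PER_KWH):.0f} per year")
```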

See also

  • Thunderbolt — the protocol used by most eGPU enclosures
  • IOMMU — required for passing an eGPU through to a VM
  • NVME — uses the same PCIe bus that eGPUs rely on
  • OLLAMA — local LLM inference tool that benefits from eGPU acceleration
  • Model Formats — covers MLX and Apple Silicon unified memory for LLM inference
  • Mini ITX Builds — alternative to eGPU when you need a native PCIe slot
  • Direct Memory Access — the mechanism GPUs use to transfer data to and from system memory