The Translation Lookaside Buffer (TLB) is a small, specialized cache in the CPU that stores recent virtual-to-physical address translations. It is part of the memory management unit (MMU) in modern processors and plays a key role in speeding up virtual memory access. When a program accesses memory, the CPU must translate the virtual address (used by the program) into a physical address (used by the hardware). This translation involves consulting page tables maintained by the operating system, which is slow: with multi-level page tables, a single translation can require several additional memory accesses.

To minimize the latency caused by frequent address translations, the TLB acts as a fast, hardware-level cache for recent translations. Here’s how it works in practice:

  1. Address Lookup: When a virtual address is accessed, the CPU first checks the TLB to see if there’s a matching entry for this address.
  2. Hit or Miss: If the TLB contains the required translation (a “hit”), the CPU retrieves the physical address directly from the TLB, bypassing the page table lookup and saving time.
  3. Fallback on Miss: If there is no matching entry in the TLB (a “miss”), the CPU falls back to walking the page tables to get the translation, which is slower. The newly found translation is then loaded into the TLB for future use.
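
The three steps above can be sketched as a toy model. This is a minimal illustration, not how hardware implements it: the 4 KiB page size, the flat `page_table` dictionary, and the `translate` function are all assumptions made for the example.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical flat page table: virtual page number (VPN) -> physical frame number (PFN).
page_table = {0: 7, 1: 3, 2: 9}

tlb = {}                            # cached translations: VPN -> PFN
stats = {"hits": 0, "misses": 0}


def translate(vaddr):
    """Translate a virtual address, checking the TLB before the page table."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:                  # steps 1-2: lookup succeeds (hit)
        stats["hits"] += 1
        pfn = tlb[vpn]
    else:                           # step 3: miss, fall back to the page table
        stats["misses"] += 1
        pfn = page_table[vpn]       # a KeyError here would correspond to a page fault
        tlb[vpn] = pfn              # load the translation into the TLB for next time
    return pfn * PAGE_SIZE + offset
```

Calling `translate` twice with addresses in the same page produces one miss followed by one hit, since the second call finds the translation already cached.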

By caching these mappings, the TLB reduces the need to access the page tables repeatedly for the same addresses, significantly enhancing memory access speed.
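
To get a feel for how much the hit ratio matters, the average cost of a memory access can be estimated with a simple weighted sum. The sketch below is a simplified model under stated assumptions: the latencies are made-up round numbers, the page table walk is assumed to cost a fixed number of memory accesses, and data caches are ignored.

```python
def effective_access_time(hit_ratio, tlb_ns, mem_ns, walk_accesses):
    """Average time per memory access, in nanoseconds, for a given TLB hit ratio.

    hit_ratio     -- fraction of accesses whose translation is found in the TLB
    tlb_ns        -- time to probe the TLB (assumed value)
    mem_ns        -- time for one memory access (assumed value)
    walk_accesses -- memory accesses needed to walk the page tables on a miss
    """
    hit_cost = tlb_ns + mem_ns                              # probe + the access itself
    miss_cost = tlb_ns + walk_accesses * mem_ns + mem_ns    # probe + walk + the access
    return hit_ratio * hit_cost + (1 - hit_ratio) * miss_cost


# Hypothetical numbers: 1 ns TLB probe, 100 ns memory access, 4-level page table.
with_tlb = effective_access_time(0.99, tlb_ns=1, mem_ns=100, walk_accesses=4)
without_hits = effective_access_time(0.0, tlb_ns=1, mem_ns=100, walk_accesses=4)
```

With these assumed numbers, a 99% hit ratio gives roughly 105 ns per access, compared with 501 ns when every access misses.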

However, the TLB has limited space and can only store a fixed number of address mappings, leading to scenarios where older entries must be replaced to make room for new ones. This process is called TLB replacement.
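
One way to picture replacement is a fixed-capacity cache that evicts the least recently used entry. This is only a sketch: the `LRUTLB` class is hypothetical, and real hardware may use other policies (for example pseudo-LRU or random replacement) depending on the design.

```python
from collections import OrderedDict


class LRUTLB:
    """Toy fixed-size TLB with least-recently-used replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()    # VPN -> PFN, least recently used first

    def lookup(self, vpn):
        """Return the cached PFN, or None on a miss."""
        if vpn not in self.entries:
            return None
        self.entries.move_to_end(vpn)   # mark as most recently used
        return self.entries[vpn]

    def insert(self, vpn, pfn):
        """Cache a translation, evicting the oldest entry if the TLB is full."""
        if vpn in self.entries:
            self.entries.move_to_end(vpn)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry
        self.entries[vpn] = pfn
```

In a two-entry instance, inserting pages 1 and 2, touching page 1, and then inserting page 3 evicts page 2, because page 1 was used more recently.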

Tagged TLB

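In systems with multiple address spaces, such as multitasking operating systems and virtualized environments, tagged TLBs attach an identifier to each entry that names the address space it belongs to (often called an address space identifier, or ASID). A lookup only matches entries whose tag equals the identifier of the currently running address space, so the TLB can retain entries from multiple processes or VMs without being flushed on every context switch. This preserves the benefit of the TLB, faster memory access and lower translation latency, across switches.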
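
The tagging idea can be sketched by keying each entry on an (identifier, virtual page) pair. The `TaggedTLB` class and its method names are hypothetical and exist only to illustrate the lookup rule; capacity limits and identifier reuse are left out.

```python
class TaggedTLB:
    """Toy TLB whose entries are tagged with an address-space identifier."""

    def __init__(self):
        self.entries = {}        # (asid, VPN) -> PFN
        self.current_asid = 0

    def context_switch(self, asid):
        """Switch address space; note that nothing is flushed."""
        self.current_asid = asid

    def insert(self, vpn, pfn):
        self.entries[(self.current_asid, vpn)] = pfn

    def lookup(self, vpn):
        """Match only entries tagged with the current address space."""
        return self.entries.get((self.current_asid, vpn))
```

Two address spaces can map the same virtual page to different frames without interfering: after switching away and back, the original entry is still present and still resolves to its own frame.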

TLB Shootdown

A TLB shootdown is the mechanism used by the operating system to invalidate TLB entries across multiple CPU cores or threads. If a page mapping changes (for example, you munmap a region or change permissions with mprotect), other cores may have cached the old translation in their TLB. The OS must send an Inter-Processor Interrupt (IPI) to those cores, telling them to invalidate (or “shoot down”) the relevant TLB entries. This ensures that all processors see the same correct memory mapping.
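
The sequence can be modeled with one private TLB per core. This is a toy sketch under simplifying assumptions: `Core` and `unmap_page` are hypothetical names, the interrupt is modeled as a direct method call, and every other core is notified, whereas a real kernel typically targets only the cores that may hold the stale entry.

```python
class Core:
    """Toy CPU core with a private TLB."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}                   # VPN -> PFN

    def invalidate(self, vpn):
        """Drop a cached translation, if present."""
        self.tlb.pop(vpn, None)


def unmap_page(page_table, cores, initiator, vpn):
    """Remove a mapping and shoot down stale TLB entries on the other cores.

    Returns the number of simulated inter-processor interrupts sent.
    """
    page_table.pop(vpn, None)           # change the mapping first
    initiator.invalidate(vpn)           # local invalidation needs no interrupt
    ipis_sent = 0
    for core in cores:
        if core is not initiator:
            core.invalidate(vpn)        # stands in for the remote IPI handler
            ipis_sent += 1
    return ipis_sent
```

In this model, unmapping a page on a four-core system costs three simulated interrupts, which hints at why the cost grows with the number of cores involved.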

Because shootdowns involve cross-core communication, they can be expensive in high-throughput systems (like databases doing frequent remappings), which is part of why some papers discourage heavy use of mmap for certain data-intensive workloads.