PostgreSQL stores table rows in heap files by default. Rows are placed in pages without any inherent ordering (CLUSTER reorders a table once, but the order is not maintained afterward), and the database relies on indexes to locate specific rows efficiently. Each row version is identified by a Tuple Identifier (TID), consisting of the block (page) number and the index of its line pointer within that page.

A heap page consists of:

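  • Page Header: A fixed 24-byte structure holding metadata such as the LSN of the last WAL record that touched the page, a checksum, flags, the page size and layout version, and the offsets that mark the start (pd_lower) and end (pd_upper) of free space. The tuple count is not stored directly; it follows from the length of the line pointer array.
  • Line Pointer Array: Each entry corresponds to a tuple and stores the offset and length of its data in the page body, plus status flags.
  • Tuple Data: Rows are stored here, filled from the end of the page backward, so their physical order need not match the order of their line pointers.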
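
The layout above can be illustrated with a small Python sketch that builds a synthetic page and reads it back. It assumes a little-endian machine, the 24-byte header and 4-byte line pointer layout of current PostgreSQL versions, and it ignores tuple headers and alignment padding, so it is a toy model rather than a reader for real data files. The helper names (build_page, parse_page) are made up for this example.

```python
import struct

PAGE_SIZE = 8192             # default block size
HEADER_FMT = "<IIHHHHHHI"    # LSN (2 x uint32), checksum, flags, lower, upper,
                             # special, pagesize_version, prune_xid
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 24 bytes
ITEMID_SIZE = 4              # one line pointer
LP_NORMAL = 1                # line pointer flag: slot is in use


def pack_item_id(offset, flags, length):
    # 15 bits offset, 2 bits flags, 15 bits length packed into one 32-bit word
    word = (offset & 0x7FFF) | ((flags & 0x3) << 15) | ((length & 0x7FFF) << 17)
    return struct.pack("<I", word)


def unpack_item_id(raw):
    (word,) = struct.unpack("<I", raw)
    return word & 0x7FFF, (word >> 15) & 0x3, (word >> 17) & 0x7FFF


def build_page(tuples):
    """Build a toy page: line pointers grow forward, tuple data grows backward."""
    page = bytearray(PAGE_SIZE)
    lower, upper = HEADER_SIZE, PAGE_SIZE
    for data in tuples:
        upper -= len(data)
        page[upper:upper + len(data)] = data
        page[lower:lower + ITEMID_SIZE] = pack_item_id(upper, LP_NORMAL, len(data))
        lower += ITEMID_SIZE
    # pagesize_version combines the page size with layout version 4
    struct.pack_into(HEADER_FMT, page, 0,
                     0, 0, 0, 0, lower, upper, PAGE_SIZE, PAGE_SIZE | 4, 0)
    return bytes(page)


def parse_page(page):
    """Return free-space bounds and (line_pointer_no, offset, length, data) items."""
    fields = struct.unpack_from(HEADER_FMT, page, 0)
    lower, upper = fields[4], fields[5]
    count = (lower - HEADER_SIZE) // ITEMID_SIZE
    items = []
    for i in range(count):
        start = HEADER_SIZE + i * ITEMID_SIZE
        off, _flags, length = unpack_item_id(page[start:start + ITEMID_SIZE])
        items.append((i + 1, off, length, page[off:off + length]))
    return {"lower": lower, "upper": upper, "free": upper - lower, "items": items}


info = parse_page(build_page([b"alpha", b"bravo!"]))
for item in info["items"]:
    print(item)
print("free bytes:", info["free"])
```

Line pointer numbers start at 1, which matches the second component of a TID such as (0,1).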

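Inserts use the Free Space Map (FSM) to find a page with enough room, while updates create a new version of the row rather than overwriting it, to maintain MVCC (multiversion concurrency control) semantics. Deleting or updating a row does not immediately free space: the old version stays in place and becomes dead once no transaction can still see it, and VACUUM (or page-level pruning) later reclaims the space.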
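
The versioning behaviour can be sketched with a toy model in Python. This is a deliberate simplification, not PostgreSQL's visibility logic: it assumes every transaction commits, that a snapshot with id N sees all transactions numbered N or lower, and it uses made-up names (ToyHeap, TupleVersion). The xmin and xmax fields mirror the creating and deleting transaction ids kept in real tuple headers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # transaction that created this version
    xmax: Optional[int] = None   # transaction that deleted or superseded it


class ToyHeap:
    def __init__(self) -> None:
        self.slots: List[Optional[TupleVersion]] = []

    def insert(self, xid: int, value: str) -> int:
        self.slots.append(TupleVersion(value, xmin=xid))
        return len(self.slots) - 1

    def delete(self, xid: int, slot: int) -> None:
        # Nothing is removed; the version is only stamped with its deleter.
        self.slots[slot].xmax = xid

    def update(self, xid: int, slot: int, value: str) -> int:
        # An update is a delete of the old version plus an insert of a new one.
        self.delete(xid, slot)
        return self.insert(xid, value)

    def visible(self, snapshot_xid: int) -> List[str]:
        return [v.value for v in self.slots
                if v is not None
                and v.xmin <= snapshot_xid
                and (v.xmax is None or v.xmax > snapshot_xid)]

    def vacuum(self, oldest_snapshot_xid: int) -> int:
        """Free versions that no remaining snapshot can see; return the count."""
        freed = 0
        for i, v in enumerate(self.slots):
            if v is not None and v.xmax is not None and v.xmax <= oldest_snapshot_xid:
                self.slots[i] = None
                freed += 1
        return freed


heap = ToyHeap()
old = heap.insert(xid=1, value="a")
heap.update(xid=2, slot=old, value="b")
print(heap.visible(1))   # the old snapshot still sees the original version
print(heap.visible(2))   # a newer snapshot sees the updated version
print(heap.vacuum(2))    # number of dead versions reclaimed
```

Until vacuum runs, both versions occupy space in the heap, which is why heavily updated tables grow when cleanup falls behind.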

Segments in PostgreSQL

PostgreSQL splits large tables into segment files of 1GB by default. When a table’s size exceeds this threshold, the database creates additional files with .1, .2, etc. suffixes appended to the base file name. Segmentation is purely a file-level detail: block numbers run continuously across segments, and the FSM, which lives in its own fork file, tracks free space for the whole table as a single logical structure.
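
As a rough illustration of how a block number maps to a segment file, the following Python sketch assumes the default 8KB block size and 1GB segment size. The function name locate_block and the base file name "16384" are hypothetical and chosen only for the example.

```python
BLOCK_SIZE = 8192                                  # default page size in bytes
SEGMENT_BYTES = 1 << 30                            # default segment size (1GB)
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE   # 131072 blocks per segment


def locate_block(base_name: str, block_number: int):
    """Return (file name, byte offset in that file) for a table block number."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones get .1, .2, ...
    file_name = base_name if segment == 0 else f"{base_name}.{segment}"
    return file_name, block_in_segment * BLOCK_SIZE


for blk in (0, 131071, 131072, 262145):
    print(blk, locate_block("16384", blk))
```

Because the mapping is plain integer division, a TID's block number is enough to find the right file without any per-segment metadata.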

I/O Characteristics and Page Writes

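PostgreSQL uses fixed-size pages (default 8KB) for heap storage, and the page is the unit of I/O: it is read into and written out of the buffer cache as a whole. A page write is not guaranteed to be atomic, however. An 8KB page typically spans several filesystem blocks or disk sectors, so a crash in the middle of a write can leave a torn page containing a mix of old and new data. Within a page, tuples are reached through the line pointer array, which adds one level of indirection: tuple data can be moved or compacted inside the page by updating line pointers in place, avoiding complex pointer-chasing strategies like linked lists.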

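Heap line pointers are not kept in sorted order and are not searched with binary search; a tuple is found directly through the line pointer index in its TID. A new tuple reuses a free line pointer or takes a new one appended to the end of the array, so existing entries never shift and existing TIDs stay valid. (Sorted item arrays that shift entries on insert are characteristic of B-tree index pages, not heap pages.)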

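The write-ahead log (WAL) ensures durability by recording changes before the modified pages are written to disk. After a crash, PostgreSQL replays the WAL from the last checkpoint to restore changes that had not yet reached the data files. With full_page_writes enabled (the default), the first modification of a page after a checkpoint logs the entire page image, which lets recovery repair torn (partially written) pages.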
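
The log-before-data rule can be sketched with a toy key-value store in Python. This is a minimal model under strong assumptions: the log and the "disk" are in-memory structures, there are no LSNs or full-page images, and the class and method names (ToyStore, checkpoint, recover) are invented for illustration.

```python
class ToyStore:
    def __init__(self):
        self.wal = []       # stands in for the durable log
        self.disk = {}      # stands in for durable data pages: page_no -> dict
        self.buffers = {}   # volatile buffer cache, lost on crash

    def write(self, page_no, key, value):
        # Rule: append the change to the log before touching the page.
        self.wal.append((page_no, key, value))
        page = self.buffers.setdefault(page_no, dict(self.disk.get(page_no, {})))
        page[key] = value

    def checkpoint(self):
        # Flush dirty pages; log records before this point are no longer needed.
        for page_no, contents in self.buffers.items():
            self.disk[page_no] = dict(contents)
        self.wal.clear()

    def crash(self):
        self.buffers = {}   # everything that was only in memory is gone

    def recover(self):
        # Replay logged changes since the last checkpoint onto the data pages.
        for page_no, key, value in self.wal:
            self.disk.setdefault(page_no, {})[key] = value


store = ToyStore()
store.write(0, "a", 1)
store.checkpoint()
store.write(0, "b", 2)    # logged, but the page is never flushed
store.crash()
print(store.disk)         # the second write is missing from the data pages
store.recover()
print(store.disk)         # replay restores it
```

The second write survives the crash only because its log record was made durable first; the data page itself could be written at any later time.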