PostgreSQL uses different types of scans to retrieve data efficiently, choosing the optimal method based on table size, indexes, and query conditions. Understanding these scans is essential for performance tuning and query optimization.
Sequential Scan
A sequential scan reads the entire table from disk, page by page. PostgreSQL chooses this method when:
- There is no usable index.
- The query needs to retrieve a large portion of the table, making an index lookup inefficient.
- The cost of fetching individual rows via an index outweighs a full table scan.
A sequential scan is performed in blocks, taking advantage of PostgreSQL’s buffer management to optimize disk I/O. It is efficient when dealing with large datasets that must be fully processed.
Index Scan
An index scan allows PostgreSQL to retrieve specific rows efficiently by looking up values in an index. The process involves two steps:
- The index lookup retrieves the tuple ID (TID) of matching rows.
- The heap fetch retrieves the actual data from the table storage (heap).
While an index scan reduces the number of rows processed, it can be costly when many rows match because each lookup requires a separate heap fetch. This makes index scans inefficient for queries returning a high percentage of rows in large tables.
Bitmap Heap Scan
A bitmap heap scan improves performance when an index scan would be too expensive due to many matching rows. Instead of fetching each row separately, PostgreSQL:
- Performs a bitmap index scan, collecting all matching tuple IDs (TIDs).
- Stores these TIDs in a bitmap, marking relevant heap pages.
- Reads only the necessary pages in a batch operation, reducing random disk I/O.
Bitmaps also enable combining multiple index conditions using bitwise operations. If a query uses multiple indexed filters, PostgreSQL can create separate bitmaps and merge them with logical AND or OR operations before fetching the heap pages.
Example: Merging Bitmaps from Multiple Indexes
Consider a query with two indexed conditions:
SELECT * FROM flights WHERE status = 'Scheduled' AND departure > '2024-02-15';Step 1: Bitmap Index Scan for status = 'Scheduled'
The index scan finds matches on tuple IDs {101, 204, 307, 409, 512}. The corresponding bitmap:
0000100100010000 (marked by page offsets)
Step 2: Bitmap Index Scan for departure > '2024-02-15'
The index scan finds matches on tuple IDs {101, 206, 307, 512, 615}. The corresponding bitmap:
0000110000010001
Step 3: Bitwise AND Operation (for AND Condition)
PostgreSQL performs a bitwise AND to get pages that satisfy both conditions:
0000100000010000
This results in pages {101, 307, 512} as the final set of pages that will be fetched in a batch operation, improving efficiency.
Lossy vs Precise Bitmaps
PostgreSQL limits the memory used for bitmaps to prevent excessive memory consumption when scanning large tables. Instead of keeping a full bitmap in memory, it dynamically switches between precise (tuple-level) storage and lossy (page-level) storage:
- In precise mode, each bit corresponds to a specific tuple.
- In lossy mode, each bit represents an entire page, meaning PostgreSQL may fetch some extra tuples that will later be filtered out.
Lossy storage is necessary because maintaining a full tuple-level bitmap for large tables is infeasible—a 1 billion-row table would require 125MB of memory just for the bitmap. By grouping multiple tuples into a single bit, PostgreSQL significantly reduces memory overhead while still keeping scans efficient. If too many pages enter the bitmap, precision is reduced, potentially increasing heap fetches.
Index-Only Scan and the Visibility Map
An index-only scan retrieves all required data directly from the index without fetching rows from the heap. This is only possible if:
- The query selects columns that are all present in the index.
- The Visibility Map confirms that the corresponding heap pages contain only visible tuples.
-- Step 1: Drop the existing primary key constraint
ALTER TABLE my_table DROP CONSTRAINT my_table_pkey;
-- Step 2: Create a new unique index including an extra column
CREATE UNIQUE INDEX my_table_pkey_new ON my_table (id) INCLUDE (extra_column);
-- Step 3: Set the new index as the primary key
ALTER TABLE my_table ADD CONSTRAINT my_table_pkey PRIMARY KEY USING INDEX my_table_pkey_new;