MySQL’s pluggable storage engine architecture separates the SQL layer from how data is physically stored. The engine you choose determines which index types, locking behaviors, and durability guarantees are available.
Storage Engines
InnoDB (default since MySQL 5.5)
The only engine that matters for production OLTP (Online Transaction Processing — high-concurrency workloads of short reads and writes). Provides ACID transactions (Atomicity, Consistency, Isolation, Durability), row-level locking, MVCC (Multi-Version Concurrency Control — readers see a consistent snapshot without blocking writers), crash recovery via redo log, and foreign keys. Every table is stored as a clustered B+tree keyed by the PRIMARY KEY — leaf nodes contain the actual row data.
See InnoDB Architecture for buffer pool, undo log, doublewrite buffer, and change buffer internals.
MEMORY (HEAP)
All data lives in RAM. No disk persistence — the table is empty after a restart. Supports both B-tree and hash indexes (the only engine that does).
CREATE TABLE session_cache ( token VARCHAR(100) NOT NULL, user_id INT NOT NULL, KEY idx_token (token) USING HASH) ENGINE=MEMORY;
Use case: ephemeral lookup tables, temporary caches, materialized intermediate results. In practice, most teams use Redis (an in-memory data store with persistence options) or memcached (a pure in-memory cache with no persistence) instead.
Warning
MEMORY tables use table-level locking, not row-level. Under concurrent writes they serialize. Also, VARCHAR columns are stored at their maximum declared length (no variable-length optimization), so memory usage can balloon.
MyISAM (legacy)
Table-level locking, no transactions, no crash recovery. Was the default before MySQL 5.5. The only reason you’d encounter it today is legacy tables that haven’t been migrated.
One remaining niche: MyISAM FULLTEXT indexes supported features (like boolean proximity search) that InnoDB FULLTEXT didn’t match until MySQL 8.0. That gap is now closed.
Other engines
Engine
Purpose
ARCHIVE
Compressed, insert-only. No indexes except auto-increment PK. Good for audit logs you’ll never query by anything other than range.
CSV
Stores data as literal CSV files. No indexes. Useful for exporting/importing.
BLACKHOLE
Accepts writes but discards data. Used for replication relay topologies.
NDB Cluster
Distributed, in-memory, shared-nothing. MySQL Cluster product.
FEDERATED
Queries a remote MySQL table. Essentially a foreign data wrapper.
Checking engine per table
SELECT TABLE_NAME, ENGINE FROM information_schema.TABLESWHERE TABLE_SCHEMA = 'mydb';
InnoDB B+Tree Indexes
The PRIMARY KEY defines the clustered index: the physical storage order of rows. This single design choice shapes every other index behavior. See BTrees for the underlying data structure.
Clustered index vs heap (Postgres comparison)
In Postgres, rows live in a heap (unordered), and every index (including the primary key index) stores a tuple identifier (TID) pointing into the heap. In InnoDB, the primary key B+tree is the data. There is no separate heap.
This means choosing a bad PK causes real damage. A random UUID as PK leads to page splits and fragmentation — every insert lands at a random position in the B+tree. Auto-increment integers insert sequentially at the end, keeping pages full and ordered.
Warning
Random UUIDs as PRIMARY KEY in InnoDB cause write amplification through page splits. If you need UUIDs, consider uuid_to_bin(uuid, 1) (swap time-high bits to front) or use UUIDv7 which is time-ordered.
Secondary indexes and the double lookup
Secondary indexes store the primary key value (not a row pointer). A lookup through a secondary index requires two B+tree traversals:
Walk the secondary index B+tree to find the PK value
Walk the clustered index B+tree to find the actual row
This “double lookup” is unique to clustered-index engines. It also means a wider PK (e.g., VARCHAR(255)) makes every secondary index larger because the PK is duplicated in every secondary index leaf.
Covering indexes
If all columns a query needs are in the secondary index, InnoDB answers from that index alone — no clustered index lookup. EXPLAIN shows “Using index” in the Extra column.
-- Composite index on (customer_id, status)-- This query is "covered":SELECT customer_id, status FROM orders WHERE customer_id = 42;
No INCLUDE clause like Postgres — in InnoDB, the PK columns are always implicitly appended to secondary indexes, so they’re available for free.
Prefix indexes
Index only the first N characters of a column:
KEY idx_name (column(20))
Useful for long varchar or text columns. Tradeoff: prefix indexes cannot be used for ORDER BY or as covering indexes — only for equality and range filtering on the prefix.
Multi-column indexes and the leftmost prefix rule
A composite index (a, b, c) satisfies queries on (a), (a, b), or (a, b, c) — but not(b) or (b, c) alone. The tree is sorted by a first, then b within each a value. Same behavior as Postgres B-tree.
FULLTEXT Indexes
MySQL’s built-in full-text search uses an inverted index — a data structure that maps each word to the list of rows containing it, the same structure used by search engines — stored in auxiliary InnoDB tables (since MySQL 5.6).
ALTER TABLE articles ADD FULLTEXT INDEX ft_body (body);-- Natural language mode (default): ranks by TF-IDF variantSELECT * FROM articlesWHERE MATCH(body) AGAINST('distributed systems');-- Boolean mode: explicit operatorsSELECT * FROM articlesWHERE MATCH(body) AGAINST('+required -excluded "exact phrase" term*' IN BOOLEAN MODE);
Boolean operators:+term (must be present), -term (must not be present), "exact phrase", term* (prefix wildcard).
Warning
Default minimum word length is 3 characters (innodb_ft_min_token_size). Shorter words are silently ignored. There is also a built-in stopword list. No BM25 scoring — MySQL uses a TF-IDF (Term Frequency–Inverse Document Frequency) variant without document length normalization. TF-IDF scores a term by how often it appears in this document (TF) weighted by how rare it is across all documents (IDF).
Compared to Postgres: no GIN flexibility, no custom parsers, no tsvector type for pre-computed search vectors. But simpler to set up for basic search.
Spatial Indexes
R-tree based indexes for geometry types. An R-tree (Rectangle tree) is a spatial index that groups nearby geometries into nested bounding rectangles, enabling fast spatial lookups. MySQL’s implementation is significantly less mature than PostGIS (the Postgres spatial extension — the industry standard for geospatial SQL, offering hundreds of spatial functions, multiple coordinate reference systems, and geography-vs-geometry distinction) — no equivalent to PostGIS’s rich function library, geography type, or spatial reference system support.
CREATE TABLE places ( id INT NOT NULL AUTO_INCREMENT, location GEOMETRY NOT NULL SRID 4326, PRIMARY KEY (id), SPATIAL KEY idx_location (location));
Hash Indexes
Only available in the MEMORY engine. USING HASH on an InnoDB table is silently ignored.
InnoDB does build hash indexes automatically via the adaptive hash index — it monitors B+tree access patterns and builds in-memory hash tables for frequently accessed pages, turning O(log n) lookups into O(1). This is not configurable per table; it’s a global optimization (innodb_adaptive_hash_index, on by default). Under high concurrency it can become a contention point and is sometimes disabled.
JSON Columns and Generated Columns
Generated columns
A generated column is a column whose value is computed from an expression over other columns. MySQL supports two flavors:
ALTER TABLE orders ADD COLUMN year_created INT GENERATED ALWAYS AS (YEAR(created_at)) STORED, ADD COLUMN full_name VARCHAR(200) GENERATED ALWAYS AS ( CONCAT(first_name, ' ', last_name) ) VIRTUAL;
Virtual
Stored
Computed
On every read (not materialized)
On write (materialized to disk)
Storage
None
Same as a regular column
Indexable
Yes (InnoDB only)
Yes (any engine)
Use case
Creating indexes on expressions without disk cost
Frequently read computed values you don’t want to recompute
Virtual generated columns are how MySQL achieves expression indexes — Postgres has native CREATE INDEX ON t ((expression)), but MySQL requires the intermediate generated column.
Indexing JSON data
MySQL cannot directly index inside a JSON column. Two approaches:
1. Functional index (MySQL 8.0.13+):
ALTER TABLE t ADD INDEX idx_key ((CAST(data->>'$.key' AS UNSIGNED)));
Under the hood, this creates a hidden virtual generated column. The functional index syntax is sugar for the explicit generated column approach below.
2. Generated column + index:
ALTER TABLE t ADD COLUMN key_val VARCHAR(100) GENERATED ALWAYS AS (data->>'$.key') VIRTUAL, ADD INDEX idx_key_val (key_val);
This is equivalent but makes the generated column visible and queryable directly.
json type internals
MySQL’s json type stores validated, binary-encoded JSON (not plain text like Postgres’s json type). Equivalent to Postgres’s jsonb. The -> operator returns JSON, ->> returns text.
Partitioning
Each partition is a separate InnoDB tablespace with its own B+tree — logically one table, physically multiple independent storage units.
Partition types
-- Range: common for time-seriesPARTITION BY RANGE (YEAR(created_at)) ( PARTITION p2023 VALUES LESS THAN (2024), PARTITION p2024 VALUES LESS THAN (2025), PARTITION p2025 VALUES LESS THAN (2026), PARTITION pmax VALUES LESS THAN MAXVALUE);-- List: discrete categoriesPARTITION BY LIST (region_id) ( PARTITION p_us VALUES IN (1, 2, 3), PARTITION p_eu VALUES IN (4, 5, 6));-- Hash: even distribution across N partitionsPARTITION BY HASH(id) PARTITIONS 8;-- Key: like Hash but uses MySQL's internal hashingPARTITION BY KEY(id) PARTITIONS 8;
Partition pruning
Queries with a WHERE clause on the partition key only scan relevant partitions. Without the partition key in the query, MySQL scans all partitions:
-- Pruned: only scans p2025SELECT * FROM events WHERE created_at >= '2025-01-01' AND created_at < '2026-01-01';-- Not pruned: scans every partitionSELECT * FROM events WHERE user_id = 42;
EXPLAIN PARTITIONS (or EXPLAIN in MySQL 8.0+) shows which partitions are accessed.
Drop partitions instead of DELETE
ALTER TABLE t DROP PARTITION p2023 is nearly instant — it removes the tablespace file. A DELETE WHERE created_at < '2024-01-01' does slow row-by-row deletion with undo log overhead.
Cross-partition queries
A query without the partition key in the WHERE clause scans all partitions. MySQL opens each partition’s B+tree, executes the query against each one, and merges the results. This is transparent to the application — the SQL is identical to a non-partitioned table. The cost is proportional to the number of partitions scanned.
ORDER BY and LIMIT work across partitions: MySQL sorts the merged result set. Aggregate functions (COUNT, SUM, MAX) also merge across partitions.
Cross-partition transactions
InnoDB transactions work across partitions within the same table and across different partitioned tables. A single BEGIN ... COMMIT block can insert into partition p2024, update a row in partition p2025, and delete from a different table’s partition — all atomically. Partitions are a storage optimization; they do not break transactional semantics.
BEGIN;INSERT INTO events (created_at, ...) VALUES ('2024-06-15', ...); -- hits p2024INSERT INTO events (created_at, ...) VALUES ('2025-11-20', ...); -- hits p2025UPDATE users SET last_active = NOW() WHERE id = 42; -- different tableCOMMIT; -- all-or-nothing across partitions and tables
Partition key must be part of unique indexes
Every UNIQUE index (including the PRIMARY KEY) must include the partition key. This is a MySQL requirement — without it, MySQL cannot guarantee uniqueness across partitions without a global lock. This often forces composite primary keys on partitioned tables.
InnoDB Internals That Affect Indexing
Change buffer
When a secondary index page is not in the buffer pool, InnoDB doesn’t fetch it from disk immediately. For non-unique secondary indexes, changes are recorded in the change buffer and merged when the page is eventually read.
Does this make reads inconsistent?
No. When a query needs to read a page that has buffered changes, InnoDB merges the buffered changes into the page before returning results. Reads are always consistent. The change buffer only defers the physical I/O of fetching the page at write time — it does not defer visibility. Unique indexes cannot use the change buffer because they must read the page immediately to check for duplicates.
Page size
InnoDB default page size is 16KB (vs Postgres 8KB). Larger pages mean fewer B+tree levels (higher fan-out), fewer I/O operations for lookups, but more wasted space for small rows. Configurable at instance level: innodb_page_size (4K, 8K, 16K, 32K, 64K).