Materialization
- Early (all data)
- Late(record ID only)
Algorithms
Important
The costs of different join algorithms is thought in terms of IO operations since they dominate.
Nested Loop
- Simple Nested Loop: Traverse all pages of the inner table for each tuple of the outer table
- Block Nested Loop: For each page in the outer table, fetch a page in the inner table and compare
- Index Nested Loop: Lookup by index in the inner table for each element of the outer table
If is the number of pages on the outerr table , is the number of tuples in the outer table, is the number of pages in the inner table with tuples,
- Nested loops require potentially scanning all pages for each of the tuples.
- Block nested loop: . This cost can further be decreased if we have more buffers in the buffer pool since we can hold pages in memory so we need since we don’t need to load the N pages M time, but we can reuse
- Index Nested Loop, assuming a fixed cost for the index lookup,
Sort Merge Join
Useful if one of the tables is already sorted (i.e. clustered index) or if the output need to be sorted anyways. The cost is the as defined in 10 - Sorting and Aggregation
Hash Join
Basic Hash Join: build outer relation hashtable, probe on loop on inner Grace Hash Join / Partitioned Hash Join: hash inner table into partitions to write them to disk
Comparison
This assumes
| Algorithm | I/O Cost | Example |
|---|---|---|
| Simple Nested Loop Join | 1.4 hours | |
| Block Nested Loop Join | 50 seconds | |
| Index Nested Loop Join | varies | |
| Sort Merge Join | 0.75 seconds | |
| Hash Join | 0.45 seconds |