ASOF Join — Implementation Strategies
Four distinct strategies exist for implementing ASOF joins, each optimized for different data characteristics. All fuse the inequality condition with the top-1-per-group selection to avoid the intermediate blowup of the naive approach.
Prerequisites
- ASOF Join — what an ASOF join is and why it can’t be expressed efficiently in standard SQL
- 11 - Join algorithms — equi-join fundamentals (hash join, sort-merge join)
Sort-Merge ASOF Join
The canonical algorithm, used by DuckDB (v0.9.0), Polars, and pandas.
Mechanism
The algorithm has two phases: group + sort, then two-pointer merge.
Phase 1 — Group and sort. Group both tables by the equality key (e.g., ticker). Within each group, sort both sides by the timestamp column. After this step, you have pairs of sorted arrays — one pair per ticker.
Phase 2 — Two-pointer merge (per group). For each group, walk through the left (trades) and right (quotes) sorted arrays simultaneously using two pointers:
For each group (e.g., ticker = "AAPL"):
Let L = trades for this ticker, sorted by timestamp
Let R = quotes for this ticker, sorted by timestamp
Initialize pointer j = 0 into R
For each row L[i] (walking left to right through trades):
Advance j forward while R[j].ts <= L[i].ts
Now j points to the first quote AFTER L[i].ts
So the match is R[j-1] — the last quote at or before L[i].ts
Emit (L[i], R[j-1]), or (L[i], NULL) if j == 0
The critical property: j never moves backward. Each trade advances j forward to the right position, and the next trade starts from wherever j already is. This means each element in both L and R is visited at most once — the merge is a single linear pass.
The grouping step is a logical grouping (like SQL
GROUP BY), not a hash join. No hash table is built. The entire algorithm is sort-based — hence the name "sort-merge."
Complexity
| Phase | Cost |
|---|---|
| Sorting | |
| Merge | |
| Total | where |
| Space |
When It Wins
Works well when both tables are large and similarly sized. The sort is the dominant cost, and the merge is a single linear pass. If either table is already sorted (e.g., by a clustered index or ingestion order), the sort phase is free for that side.
When It Loses
When the tables are highly skewed in size — e.g., 64 probe rows against 1 billion build rows. You must read, partition, and sort the entire right table even though most of it is irrelevant. See the Loop Join Optimization below for DuckDB’s solution.
Hash + Binary Search
Used by ClickHouse.
Mechanism
Both this algorithm and sort-merge use a hash table to group rows by the equality key (e.g., ticker). That grouping step is identical. The difference is what happens within each group and what each algorithm requires from the left (probe) side.
Build phase (right side only): Insert each right-side row into a hash table keyed by the equality column. Within each bucket, sort the rows by timestamp. After the build, each bucket contains a sorted array of (timestamp, payload) pairs from the right table.
Probe phase (left side, any order): For each left row — in whatever order it arrives — look up the equality key in the hash table (O(1)), then binary search within that bucket for the largest r.ts <= l.ts (O(log k), where k is the number of right-side rows in that bucket).
Each probe is independent — no state carries between probes. This is the key difference from sort-merge, where the two-pointer state (j) carries between rows and requires the left side to be sorted.
What’s different from sort-merge
| Sort-merge | Hash + binary search | |
|---|---|---|
| Left side must be sorted? | Yes | No — any order works |
| Probe within a group | Advance pointer forward (amortized O(1)) | Binary search (O(log k)) |
| State between probes | Yes — pointer carries forward | No — each probe is independent |
| Total merge cost |
Complexity
| Phase | Cost |
|---|---|
| Build | amortized hashing + sort per bucket |
| Probe | — one binary search per left row |
| Total | |
| Space | for the hash table |
When It Wins
When the left side arrives in unpredictable order (streaming probes from multiple sources, random-access lookups) and sorting it would be wasteful. Each probe is independent — highly parallelizable. Also good when the right-side buckets are small (log k is tiny).
When It Loses
If both sides are large and already sortable, sort-merge’s O(n + m) merge beats O(n · log k) binary searches. Also, binary search has worse cache locality than a linear scan on sorted data. If the right table doesn’t fit in memory, the hash table spills.
ClickHouse Constraints
- Exactly one inequality condition allowed
- Right table should be sorted by the inequality column for best performance
- Supports
INNERandLEFT OUTERvariants only
SELECT t.*, q.price
FROM trades t
ASOF LEFT JOIN quotes q
ON t.symbol = q.symbol AND t.time >= q.time;Streaming Merge
Used by QuestDB.
Mechanism
QuestDB is a purpose-built time-series database that guarantees data is ordered by its designated timestamp at ingestion time. This means:
- Both tables are already sorted — no sort phase needed
- The join is a pure two-pointer merge: with zero preprocessing
- Memory usage is in streaming mode — no need to materialize either table
This is the theoretically optimal implementation, but it requires the database to enforce timestamp ordering as an invariant.
Complexity
| Phase | Cost |
|---|---|
| Sort | Free — ingestion-time guarantee |
| Merge | |
| Total | |
| Space | streaming |
QuestDB Syntax
SELECT m.timestamp, m.symbol, m.price AS trade_price,
p.bid_price, p.ask_price
FROM market_data m
ASOF JOIN core_price p ON m.symbol = p.symbol;The timestamps come from each table’s designated timestamp column implicitly. QuestDB also supports a TOLERANCE clause to bound the lookback window.
When It Wins
Unbeatable for time-series workloads where data arrives in order. Zero overhead beyond reading the data.
When It Loses
Only works when both sides are pre-sorted by the join key. General-purpose databases can’t assume this.
Loop Join Optimization (DuckDB, Feb 2025)
Described in DuckDB’s “AsOf Plans” blog post. Addresses the skewed-size problem that defeats sort-merge.
The Problem with Sort-Merge for Skewed Sizes
When the left (probe) table has 64 rows and the right (build) table has 1 billion rows, sort-merge is wasteful:
- You must read, partition, and sort all 1 billion right rows
- This may require spilling to disk
- 99.99% of the right table data is irrelevant to the 64 probe rows
Mechanism
The optimization swaps sides: treat the small left table as the “build” side.
- Assign unique row IDs to each left row (via
row_number()window function) - Stream the large right table through
- For each right row, check against all left rows using
arg_maxaggregation to track the best match per left row ID - Group by left row ID to produce the final result
Complexity
| Phase | Cost |
|---|---|
| Build | — the small table |
| Streaming scan | — single pass, no sort |
| Total | assuming |
| Space | — proportional to the small table |
Key Properties
- Memory footprint proportional to the small table, not the large one
- No disk spill for the common case (small probe against large historical data)
- Highly parallelizable — the streaming scan distributes naturally
- DuckDB makes this configurable:
PRAGMA asof_loop_join_threshold = N
When It Wins
Dramatically faster for skewed sizes. DuckDB’s benchmarks showed the loop join plan avoiding disk spills entirely for 64 rows × 1 billion rows, where sort-merge ground to a halt.
When It Loses
When both tables are large and similarly sized — the loop join degrades because it compares each right row against all left rows (effectively ). The sort-merge approach is better for balanced sizes.
Summary: Choosing a Strategy
| Strategy | Best For | Time | Space | Requires Pre-sort? |
|---|---|---|---|---|
| Sort-Merge | Balanced table sizes | No | ||
| Hash + Binary Search | Right table fits in memory | No | ||
| Streaming Merge | Pre-sorted time-series | Yes | ||
| Loop Join | Small probe, huge build | No |
The sort-merge algorithm is explored interactively in the ASOF join notebook, where you can step through the two-pointer merge on synthetic trades/quotes data and compare it against the naive inequality join.
See also
- ASOF Join — concept overview, taxonomy, system support matrix
- IEJoin Algorithm — the general inequality join algorithm that ASOF specializes
- AsOfJoin in Spark — workarounds needed when native ASOF isn’t available
- 11 - Join algorithms — equi-join cost models for comparison