ASOF Join — Implementation Strategies

Four distinct strategies exist for implementing ASOF joins, each optimized for different data characteristics. All fuse the inequality condition with the top-1-per-group selection to avoid the intermediate blowup of the naive approach.

Prerequisites

  • ASOF Join — what an ASOF join is and why it can’t be expressed efficiently in standard SQL
  • 11 - Join algorithms — equi-join fundamentals (hash join, sort-merge join)

Sort-Merge ASOF Join

The canonical algorithm, used by DuckDB (v0.9.0), Polars, and pandas.

Mechanism

The algorithm has two phases: group + sort, then two-pointer merge.

Phase 1 — Group and sort. Group both tables by the equality key (e.g., ticker). Within each group, sort both sides by the timestamp column. After this step, you have pairs of sorted arrays — one pair per ticker.

Phase 2 — Two-pointer merge (per group). For each group, walk through the left (trades) and right (quotes) sorted arrays simultaneously using two pointers:

For each group (e.g., ticker = "AAPL"):
  Let L = trades for this ticker, sorted by timestamp
  Let R = quotes for this ticker, sorted by timestamp

  Initialize pointer j = 0 into R

  For each row L[i] (walking left to right through trades):
    Advance j forward while R[j].ts <= L[i].ts
    Now j points to the first quote AFTER L[i].ts
    So the match is R[j-1] — the last quote at or before L[i].ts
    Emit (L[i], R[j-1]), or (L[i], NULL) if j == 0

The critical property: j never moves backward. Each trade advances j forward to the right position, and the next trade starts from wherever j already is. This means each element in both L and R is visited at most once — the merge is a single linear pass.

The grouping step is a logical grouping (like SQL GROUP BY), not a hash join. No hash table is built. The entire algorithm is sort-based — hence the name "sort-merge."

Complexity

PhaseCost
Sorting
Merge
Total where
Space

When It Wins

Works well when both tables are large and similarly sized. The sort is the dominant cost, and the merge is a single linear pass. If either table is already sorted (e.g., by a clustered index or ingestion order), the sort phase is free for that side.

When It Loses

When the tables are highly skewed in size — e.g., 64 probe rows against 1 billion build rows. You must read, partition, and sort the entire right table even though most of it is irrelevant. See the Loop Join Optimization below for DuckDB’s solution.

Used by ClickHouse.

Mechanism

Both this algorithm and sort-merge use a hash table to group rows by the equality key (e.g., ticker). That grouping step is identical. The difference is what happens within each group and what each algorithm requires from the left (probe) side.

Build phase (right side only): Insert each right-side row into a hash table keyed by the equality column. Within each bucket, sort the rows by timestamp. After the build, each bucket contains a sorted array of (timestamp, payload) pairs from the right table.

Probe phase (left side, any order): For each left row — in whatever order it arrives — look up the equality key in the hash table (O(1)), then binary search within that bucket for the largest r.ts <= l.ts (O(log k), where k is the number of right-side rows in that bucket).

Each probe is independent — no state carries between probes. This is the key difference from sort-merge, where the two-pointer state (j) carries between rows and requires the left side to be sorted.

What’s different from sort-merge

Sort-mergeHash + binary search
Left side must be sorted?YesNo — any order works
Probe within a groupAdvance pointer forward (amortized O(1))Binary search (O(log k))
State between probesYes — pointer carries forwardNo — each probe is independent
Total merge cost

Complexity

PhaseCost
Build amortized hashing + sort per bucket
Probe — one binary search per left row
Total
Space for the hash table

When It Wins

When the left side arrives in unpredictable order (streaming probes from multiple sources, random-access lookups) and sorting it would be wasteful. Each probe is independent — highly parallelizable. Also good when the right-side buckets are small (log k is tiny).

When It Loses

If both sides are large and already sortable, sort-merge’s O(n + m) merge beats O(n · log k) binary searches. Also, binary search has worse cache locality than a linear scan on sorted data. If the right table doesn’t fit in memory, the hash table spills.

ClickHouse Constraints

  • Exactly one inequality condition allowed
  • Right table should be sorted by the inequality column for best performance
  • Supports INNER and LEFT OUTER variants only
SELECT t.*, q.price
FROM trades t
ASOF LEFT JOIN quotes q
ON t.symbol = q.symbol AND t.time >= q.time;

Streaming Merge

Used by QuestDB.

Mechanism

QuestDB is a purpose-built time-series database that guarantees data is ordered by its designated timestamp at ingestion time. This means:

  1. Both tables are already sorted — no sort phase needed
  2. The join is a pure two-pointer merge: with zero preprocessing
  3. Memory usage is in streaming mode — no need to materialize either table

This is the theoretically optimal implementation, but it requires the database to enforce timestamp ordering as an invariant.

Complexity

PhaseCost
SortFree — ingestion-time guarantee
Merge
Total
Space streaming

QuestDB Syntax

SELECT m.timestamp, m.symbol, m.price AS trade_price,
       p.bid_price, p.ask_price
FROM market_data m
ASOF JOIN core_price p ON m.symbol = p.symbol;

The timestamps come from each table’s designated timestamp column implicitly. QuestDB also supports a TOLERANCE clause to bound the lookback window.

When It Wins

Unbeatable for time-series workloads where data arrives in order. Zero overhead beyond reading the data.

When It Loses

Only works when both sides are pre-sorted by the join key. General-purpose databases can’t assume this.

Loop Join Optimization (DuckDB, Feb 2025)

Described in DuckDB’s “AsOf Plans” blog post. Addresses the skewed-size problem that defeats sort-merge.

The Problem with Sort-Merge for Skewed Sizes

When the left (probe) table has 64 rows and the right (build) table has 1 billion rows, sort-merge is wasteful:

  • You must read, partition, and sort all 1 billion right rows
  • This may require spilling to disk
  • 99.99% of the right table data is irrelevant to the 64 probe rows

Mechanism

The optimization swaps sides: treat the small left table as the “build” side.

  1. Assign unique row IDs to each left row (via row_number() window function)
  2. Stream the large right table through
  3. For each right row, check against all left rows using arg_max aggregation to track the best match per left row ID
  4. Group by left row ID to produce the final result

Complexity

PhaseCost
Build — the small table
Streaming scan — single pass, no sort
Total assuming
Space — proportional to the small table

Key Properties

  • Memory footprint proportional to the small table, not the large one
  • No disk spill for the common case (small probe against large historical data)
  • Highly parallelizable — the streaming scan distributes naturally
  • DuckDB makes this configurable: PRAGMA asof_loop_join_threshold = N

When It Wins

Dramatically faster for skewed sizes. DuckDB’s benchmarks showed the loop join plan avoiding disk spills entirely for 64 rows × 1 billion rows, where sort-merge ground to a halt.

When It Loses

When both tables are large and similarly sized — the loop join degrades because it compares each right row against all left rows (effectively ). The sort-merge approach is better for balanced sizes.

Summary: Choosing a Strategy

StrategyBest ForTimeSpaceRequires Pre-sort?
Sort-MergeBalanced table sizesNo
Hash + Binary SearchRight table fits in memoryNo
Streaming MergePre-sorted time-seriesYes
Loop JoinSmall probe, huge buildNo

The sort-merge algorithm is explored interactively in the ASOF join notebook, where you can step through the two-pointer merge on synthetic trades/quotes data and compare it against the naive inequality join.

See also