ASOF Join — Implementation Strategies

Four distinct strategies exist for implementing ASOF joins, each optimized for different data characteristics. All fuse the inequality condition with the top-1-per-group selection to avoid the intermediate blowup of the naive approach.

Prerequisites

ASOF Join — what an ASOF join is and why it can’t be expressed efficiently in standard SQL

11 - Join algorithms — equi-join fundamentals (hash join, sort-merge join)

Sort-Merge ASOF Join

The canonical algorithm, used by DuckDB (v0.9.0), Polars, and pandas.

Mechanism

The algorithm has two phases: group + sort, then two-pointer merge.

Phase 1 — Group and sort. Group both tables by the equality key (e.g., ticker). Within each group, sort both sides by the timestamp column. After this step, you have pairs of sorted arrays — one pair per ticker.

Phase 2 — Two-pointer merge (per group). For each group, walk through the left (trades) and right (quotes) sorted arrays simultaneously using two pointers:

For each group (e.g., ticker = "AAPL"):
  Let L = trades for this ticker, sorted by timestamp
  Let R = quotes for this ticker, sorted by timestamp

  Initialize pointer j = 0 into R

  For each row L[i] (walking left to right through trades):
    Advance j forward while R[j].ts <= L[i].ts
    Now j points to the first quote AFTER L[i].ts
    So the match is R[j-1] — the last quote at or before L[i].ts
    Emit (L[i], R[j-1]), or (L[i], NULL) if j == 0

The critical property: j never moves backward. Each trade advances j forward to the right position, and the next trade starts from wherever j already is. This means each element in both L and R is visited at most once — the merge is a single linear pass.

The grouping step is a logical grouping (like SQL GROUP BY), not a hash join. No hash table is built. The entire algorithm is sort-based — hence the name "sort-merge."

Complexity

Phase	Cost
Sorting	$O (∣ L ∣ lo g ∣ L ∣ + ∣ R ∣ lo g ∣ R ∣)$
Merge	$O (∣ L ∣ + ∣ R ∣)$
Total	$O (n lo g n)$ where $n = max (∣ L ∣, ∣ R ∣)$
Space	$O (∣ L ∣ + ∣ R ∣)$

When It Wins

Works well when both tables are large and similarly sized. The sort is the dominant cost, and the merge is a single linear pass. If either table is already sorted (e.g., by a clustered index or ingestion order), the sort phase is free for that side.

When It Loses

When the tables are highly skewed in size — e.g., 64 probe rows against 1 billion build rows. You must read, partition, and sort the entire right table even though most of it is irrelevant. See the Loop Join Optimization below for DuckDB’s solution.

Hash + Binary Search

Used by ClickHouse.

Mechanism

Both this algorithm and sort-merge use a hash table to group rows by the equality key (e.g., ticker). That grouping step is identical. The difference is what happens within each group and what each algorithm requires from the left (probe) side.

Build phase (right side only): Insert each right-side row into a hash table keyed by the equality column. Within each bucket, sort the rows by timestamp. After the build, each bucket contains a sorted array of (timestamp, payload) pairs from the right table.

Probe phase (left side, any order): For each left row — in whatever order it arrives — look up the equality key in the hash table (O(1)), then binary search within that bucket for the largest r.ts <= l.ts (O(log k), where k is the number of right-side rows in that bucket).

Each probe is independent — no state carries between probes. This is the key difference from sort-merge, where the two-pointer state (j) carries between rows and requires the left side to be sorted.

What’s different from sort-merge

	Sort-merge	Hash + binary search
Left side must be sorted?	Yes	No — any order works
Probe within a group	Advance pointer forward (amortized O(1))	Binary search (O(log k))
State between probes	Yes — pointer carries forward	No — each probe is independent
Total merge cost	$O (n + m)$	$O (n \cdot lo g k)$

Complexity

Phase	Cost
Build	$O (∣ R ∣)$ amortized hashing + $O (k lo g k)$ sort per bucket
Probe	$O (∣ L ∣ \cdot lo g k)$ — one binary search per left row
Total	$O (∣ R ∣ lo g k + ∣ L ∣ lo g k)$
Space	$O (∣ R ∣)$ for the hash table

When It Wins

When the left side arrives in unpredictable order (streaming probes from multiple sources, random-access lookups) and sorting it would be wasteful. Each probe is independent — highly parallelizable. Also good when the right-side buckets are small (log k is tiny).

When It Loses

If both sides are large and already sortable, sort-merge’s O(n + m) merge beats O(n · log k) binary searches. Also, binary search has worse cache locality than a linear scan on sorted data. If the right table doesn’t fit in memory, the hash table spills.

ClickHouse Constraints

Exactly one inequality condition allowed
Right table should be sorted by the inequality column for best performance
Supports INNER and LEFT OUTER variants only

SELECT t.*, q.price
FROM trades t
ASOF LEFT JOIN quotes q
ON t.symbol = q.symbol AND t.time >= q.time;

Streaming Merge

Used by QuestDB.

Mechanism

QuestDB is a purpose-built time-series database that guarantees data is ordered by its designated timestamp at ingestion time. This means:

Both tables are already sorted — no sort phase needed
The join is a pure two-pointer merge: $O (∣ L ∣ + ∣ R ∣)$ with zero preprocessing
Memory usage is $O (1)$ in streaming mode — no need to materialize either table

This is the theoretically optimal implementation, but it requires the database to enforce timestamp ordering as an invariant.

Complexity

Phase	Cost
Sort	Free — ingestion-time guarantee
Merge	$O (∣ L ∣ + ∣ R ∣)$
Total	$O (∣ L ∣ + ∣ R ∣)$
Space	$O (1)$ streaming

QuestDB Syntax

SELECT m.timestamp, m.symbol, m.price AS trade_price,
       p.bid_price, p.ask_price
FROM market_data m
ASOF JOIN core_price p ON m.symbol = p.symbol;

The timestamps come from each table’s designated timestamp column implicitly. QuestDB also supports a TOLERANCE clause to bound the lookback window.

When It Wins

Unbeatable for time-series workloads where data arrives in order. Zero overhead beyond reading the data.

When It Loses

Only works when both sides are pre-sorted by the join key. General-purpose databases can’t assume this.

Loop Join Optimization (DuckDB, Feb 2025)

Described in DuckDB’s “AsOf Plans” blog post. Addresses the skewed-size problem that defeats sort-merge.

The Problem with Sort-Merge for Skewed Sizes

When the left (probe) table has 64 rows and the right (build) table has 1 billion rows, sort-merge is wasteful:

You must read, partition, and sort all 1 billion right rows
This may require spilling to disk
99.99% of the right table data is irrelevant to the 64 probe rows

Mechanism

The optimization swaps sides: treat the small left table as the “build” side.

Assign unique row IDs to each left row (via row_number() window function)
Stream the large right table through
For each right row, check against all left rows using arg_max aggregation to track the best match per left row ID
Group by left row ID to produce the final result

Complexity

Phase	Cost
Build	$O (∣ L ∣)$ — the small table
Streaming scan	$O (∣ R ∣)$ — single pass, no sort
Total	$O (∣ R ∣)$ assuming $∣ L ∣ ≪ ∣ R ∣$
Space	$O (∣ L ∣)$ — proportional to the small table

Key Properties

Memory footprint proportional to the small table, not the large one
No disk spill for the common case (small probe against large historical data)
Highly parallelizable — the streaming scan distributes naturally
DuckDB makes this configurable: PRAGMA asof_loop_join_threshold = N

When It Wins

Dramatically faster for skewed sizes. DuckDB’s benchmarks showed the loop join plan avoiding disk spills entirely for 64 rows × 1 billion rows, where sort-merge ground to a halt.

When It Loses

When both tables are large and similarly sized — the loop join degrades because it compares each right row against all left rows (effectively $O (∣ L ∣ \times ∣ R ∣)$ ). The sort-merge approach is better for balanced sizes.

Summary: Choosing a Strategy

Strategy	Best For	Time	Space	Requires Pre-sort?
Sort-Merge	Balanced table sizes	$O (n lo g n)$	$O (n + m)$	No
Hash + Binary Search	Right table fits in memory	$O (n lo g k)$	$O (n)$	No
Streaming Merge	Pre-sorted time-series	$O (n + m)$	$O (1)$	Yes
Loop Join	Small probe, huge build	$O (m)$	$O (probe)$	No

The sort-merge algorithm is explored interactively in the ASOF join notebook, where you can step through the two-pointer merge on synthetic trades/quotes data and compare it against the naive inequality join.

Edmondo's Vault

Explorer

ASOF Join — Implementation Strategies

ASOF Join — Implementation Strategies

Sort-Merge ASOF Join

Mechanism

Complexity

When It Wins

When It Loses

Hash + Binary Search

Mechanism

What’s different from sort-merge

Complexity

When It Wins

When It Loses

ClickHouse Constraints

Streaming Merge

Mechanism

Complexity

QuestDB Syntax

When It Wins

When It Loses

Loop Join Optimization (DuckDB, Feb 2025)

The Problem with Sort-Merge for Skewed Sizes

Mechanism

Complexity

Key Properties

When It Wins

When It Loses

Summary: Choosing a Strategy

See also

Graph View

Table of Contents

Backlinks