Reciprocal Rank Fusion

The problem

You have two ranked lists from different retrieval systems: say, pgvector cosine similarity and ParadeDB BM25 keyword search. BM25 (Best Matching 25) is the standard keyword relevance scoring algorithm used by Elasticsearch, Solr, and ParadeDB. Each system returns documents ordered by relevance, but their scores live on completely different scales:

Cosine similarity: $\in [-1, 1]$ (bounded; often non-negative in practice for text embeddings)
BM25: $\in [0, \infty)$

You cannot simply compare or combine these scores. You need a principled way to merge the two ranked lists into a single ranking.

Naive approaches and why they fail

Raw score addition. Just sum the scores from each system. This fails because the scales are incompatible. A BM25 score of 12.7 dwarfs a cosine score of 0.93, so keyword search would dominate regardless of actual relevance.

Score normalization (min-max). Rescale each system’s scores to $[0, 1]$ before adding. This requires knowing the global min and max for each system, is sensitive to outliers (one extreme score compresses all others), and the distribution shapes may still differ (one uniform, one exponential).

Rank averaging. Average each document’s rank across lists. This treats every rank gap as equal. The step from rank 1 to rank 2 might represent a massive relevance drop, while ranks 50 and 51 are nearly identical in relevance, yet rank averaging gives both gaps the same weight.
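The outlier sensitivity of min-max normalization can be seen in a few lines. This is a minimal sketch, assuming the min and max are taken over the returned batch rather than over the whole corpus; the function name `min_max_normalize` is illustrative and not from any library.

```rust
/// Min-max rescale a batch of scores to [0, 1].
/// Returns all zeros when every score is identical (max == min).
fn min_max_normalize(scores: &[f64]) -> Vec<f64> {
    let min = scores.iter().copied().fold(f64::INFINITY, f64::min);
    let max = scores.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    scores
        .iter()
        .map(|&s| if range > 0.0 { (s - min) / range } else { 0.0 })
        .collect()
}
```

With scores such as `[2.0, 3.0, 4.0, 40.0]`, the single outlier maps to 1.0 and the other three are squeezed below 0.06, so their relative order barely influences any sum taken afterwards.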

The RRF formula

Reciprocal Rank Fusion (Cormack, Clarke & Büttcher, 2009) sidesteps score comparison entirely by working only with ranks:

$\mathrm{RRF}(d) = \sum_{r \in R} \frac{1}{k + \mathrm{rank}_{r}(d)}$

Where:

$R$ = set of ranked lists (e.g., vector search and BM25)
$\mathrm{rank}_{r}(d)$ = rank of document $d$ in list $r$ (1-indexed)
$k$ = smoothing constant (typically 60)

A document that does not appear in list $r$ contributes nothing for that list.
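The formula translates almost directly into code. The following is a small sketch, assuming the caller already knows the document's 1-indexed rank in each list where it appears; `rrf_score` is an illustrative name.

```rust
/// RRF score of one document, given its 1-indexed rank in each list
/// where it appears. Lists that did not return the document contribute
/// nothing, so they are simply left out of `ranks`.
fn rrf_score(ranks: &[usize], k: f64) -> f64 {
    ranks.iter().map(|&rank| 1.0 / (k + rank as f64)).sum()
}
```

For example, `rrf_score(&[1, 3], 60.0)` evaluates $1/61 + 1/63$ for a document ranked 1st in one list and 3rd in another.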

0-indexed variant
Some implementations use 0-indexed ranks with the denominator $k + \mathrm{rank} + 1$. This is algebraically equivalent to the 1-indexed formula, because a 0-indexed rank is exactly one less than the corresponding 1-indexed rank. Example:
let rrf_score = 1.0 / (k as f64 + rank as f64 + 1.0);
where rank comes from enumerate() (0-indexed).
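To make the equivalence concrete, here is a sketch that computes the per-position contributions both ways. The function names are hypothetical; only `enumerate()` and the `k + rank + 1` denominator come from the note above.

```rust
/// Per-position RRF contributions for one ranked list, using the
/// 0-indexed position from `enumerate()` and the `k + rank + 1` denominator.
fn contributions_zero_indexed(ids: &[&str], k: f64) -> Vec<(String, f64)> {
    ids.iter()
        .enumerate()
        .map(|(rank, id)| (id.to_string(), 1.0 / (k + rank as f64 + 1.0)))
        .collect()
}

/// The same contributions written with an explicit 1-indexed rank.
fn contributions_one_indexed(ids: &[&str], k: f64) -> Vec<(String, f64)> {
    ids.iter()
        .zip(1usize..)
        .map(|(id, rank)| (id.to_string(), 1.0 / (k + rank as f64)))
        .collect()
}
```

Both functions return the same values for any input. The pitfall to watch for is mixing the conventions, for example using `enumerate()` with the 1-indexed denominator, which silently shifts every rank by one.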

Why k = 60

The constant $k$ dampens the advantage of top ranks.

Without $k$:

$\frac{1}{1}$ vs $\frac{1}{2}$: a $2\times$ difference

With $k = 60$:

$\frac{1}{61}$ vs $\frac{1}{62}$: a difference of about $1.6\%$

This prevents a single retrieval system from dominating the fused result just because it assigned rank 1. The value 60 was empirically chosen in the original paper and has held up well across diverse benchmarks.
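The damping effect can be checked numerically for any $k$. This is a minimal sketch; `rank1_over_rank2` is an illustrative helper, not part of any library.

```rust
/// Ratio between the rank-1 and rank-2 contributions for a given `k`:
/// (1 / (k + 1)) / (1 / (k + 2)).
fn rank1_over_rank2(k: f64) -> f64 {
    (1.0 / (k + 1.0)) / (1.0 / (k + 2.0))
}
```

`rank1_over_rank2(0.0)` gives 2.0, while `rank1_over_rank2(60.0)` gives $62/61 \approx 1.016$. Larger values of $k$ flatten the curve further, so the choice of $k$ trades off how much a top position counts against how much agreement between lists counts.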

The additive property

Documents appearing in multiple lists accumulate scores from each list. This is the key insight: agreement across retrieval methods is a strong relevance signal.

A document ranked 5th in both lists scores:

$\frac{1}{60 + 5} + \frac{1}{60 + 5} = \frac{2}{65} \approx 0.0308$

A document ranked 1st in only one list scores:

$\frac{1}{60 + 1} = \frac{1}{61} \approx 0.0164$

The document that both systems agree on wins, even though it was never the top result in either system individually.
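The same arithmetic gives a general condition. As a sketch, assume one document sits at the same rank $r$ in both lists and a competitor appears at rank 1 in exactly one list. The first document wins when:

$\frac{2}{k + r} > \frac{1}{k + 1} \iff 2(k + 1) > k + r \iff r < k + 2$

With $k = 60$, any document ranked 61st or better in both lists outscores a document ranked 1st in only one list; at $r = 62$ the two scores tie at $\frac{1}{61}$.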

Worked example

Two retrieval systems, five results each

Vector search results:

Rank	Document
1	Doc A
2	Doc B
3	Doc C
4	Doc D
5	Doc E

BM25 results:

Rank	Document
1	Doc C
2	Doc F
3	Doc A
4	Doc G
5	Doc B

RRF scores (using $k = 60$, 1-indexed):

Document	Vector contribution	BM25 contribution	Total RRF score
Doc C	$1/63 \approx 0.01587$	$1/61 \approx 0.01639$	0.03227
Doc A	$1/61 \approx 0.01639$	$1/63 \approx 0.01587$	0.03227
Doc B	$1/62 \approx 0.01613$	$1/65 \approx 0.01538$	0.03151
Doc F	-	$1/62 \approx 0.01613$	0.01613
Doc D	$1/64 \approx 0.01563$	-	0.01563
Doc G	-	$1/64 \approx 0.01563$	0.01563
Doc E	$1/65 \approx 0.01538$	-	0.01538

Final ranking: C/A (tie), B, F, D/G (tie), E

Documents appearing in both lists (C, A, B) dominate the top, even though no document was ranked first by both systems.

Implementation pattern

The standard implementation uses a HashMap to accumulate scores:

1. For each ranked list, iterate over results with their rank index
2. For each document, add $1/(k + \mathrm{rank})$ to its accumulated score in the map
3. Sort all documents by accumulated score descending
4. Truncate to the desired limit
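The steps above can be sketched as follows. This is an illustrative version that works on plain string IDs and assumes each ID appears at most once per list; the signature is an assumption, not the one used in any particular codebase.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.
/// Each inner list is ordered best-first. Returns (id, score) pairs sorted
/// by score descending and truncated to `limit`.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64, limit: usize) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();

    for list in lists {
        // `enumerate()` is 0-indexed, so add 1 to get the 1-indexed rank.
        for (index, id) in list.iter().enumerate() {
            let rank = (index + 1) as f64;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }

    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by score descending; break ties by ID so the output is
    // deterministic (HashMap iteration order is not).
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

Feeding in the two lists from the worked example (`["A", "B", "C", "D", "E"]` and `["C", "F", "A", "G", "B"]`) with `k = 60.0` yields the order A, C, B, F, D, G, E, where A/C and D/G are exact ties resolved alphabetically by the tie-break. The tie-break rule is a design choice of this sketch; without one, tied documents could swap places between runs.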

Implementation note

A typical reciprocal_rank_fusion() function takes the raw row results from both the vector search and BM25 queries, accumulates RRF scores in a HashMap<(Uuid, i32), (f64, String, String)> keyed by (content_id, chunk_index), then sorts and truncates. The BM25 query uses ParadeDB’s @@@ operator and gracefully falls back to empty results if ParadeDB is not installed, making the system degrade to vector-only ranking.
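A rough sketch of that shape is shown below. Several things here are assumptions made so the example compiles on its own: `Uuid` is a `u128` stand-in for the `uuid` crate's type, the two `String` fields are guessed to be a title and a text snippet, the BM25 query outcome is modelled as a `Result`, and `fuse_rows` is a hypothetical name.

```rust
use std::collections::HashMap;

// Stand-in for `uuid::Uuid` so the sketch needs no external crates.
type Uuid = u128;

/// Hypothetical row shape: (content_id, chunk_index, title, text).
type Row = (Uuid, i32, String, String);

fn fuse_rows(
    vector_rows: Vec<Row>,
    bm25_rows: Result<Vec<Row>, String>,
    k: f64,
    limit: usize,
) -> Vec<((Uuid, i32), f64, String, String)> {
    // If the BM25 query failed (for example, ParadeDB is not installed),
    // treat it as an empty list so fusion degrades to vector-only ranking.
    let bm25_rows = bm25_rows.unwrap_or_default();

    let mut acc: HashMap<(Uuid, i32), (f64, String, String)> = HashMap::new();
    for rows in [vector_rows, bm25_rows] {
        for (index, (content_id, chunk_index, title, text)) in rows.into_iter().enumerate() {
            // 0-indexed position from `enumerate()`, hence the `+ 1.0`.
            let contribution = 1.0 / (k + index as f64 + 1.0);
            acc.entry((content_id, chunk_index))
                .or_insert((0.0, title, text))
                .0 += contribution;
        }
    }

    let mut fused: Vec<_> = acc
        .into_iter()
        .map(|(key, (score, title, text))| (key, score, title, text))
        .collect();
    // Score descending, then key ascending for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused.truncate(limit);
    fused
}
```

Keying on `(content_id, chunk_index)` means two chunks of the same document are fused independently, which matters when retrieval operates on chunks rather than whole documents.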

Alternatives

Method	Pros	Cons
Convex combination $\alpha \cdot s_{1} + (1 - \alpha) \cdot s_{2}$ (a weighted average where $\alpha$ and $1 - \alpha$ are non-negative and sum to 1)	Simple, tunable	Requires score normalization and tuning $\alpha$ per domain
Learned ranking (LambdaMART, a gradient-boosted decision tree algorithm trained to rank documents by optimising a ranking loss function on labelled query-document pairs)	Can capture complex interactions	Needs training data, expensive to maintain
Cross-encoder reranking (a model that takes the full (query, document) pair as joint input and produces a relevance score; more accurate than bi-encoders that embed query and document independently)	Highest quality	Slow (runs full model per doc), impractical for large candidate sets
RRF	No hyperparameters beyond $k$, no training data, no score normalization	Cannot weight one system over another without modification
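For comparison with RRF, the convex combination from the first row can be written as a one-liner. This sketch assumes both input scores have already been normalized to $[0, 1]$, which is the hard part in practice; `convex_combination` is an illustrative name.

```rust
/// Convex combination of two already-normalized scores in [0, 1].
/// `alpha` is the weight on the first system and must lie in [0, 1].
fn convex_combination(s1: f64, s2: f64, alpha: f64) -> f64 {
    assert!((0.0..=1.0).contains(&alpha), "alpha must be in [0, 1]");
    alpha * s1 + (1.0 - alpha) * s2
}
```

With `alpha = 0.7`, scores of 0.9 and 0.5 combine to 0.78. Unlike RRF, the result depends directly on the score values, so any distortion introduced by normalization carries through.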

When to reach for something else

RRF treats all retrieval systems equally. If you know that one system is significantly more reliable than another for your domain, a weighted variant or convex combination may outperform RRF. But as a zero-configuration baseline, RRF is hard to beat.
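One possible weighted variant scales each list's contribution by a per-list weight. The sketch below is an assumption about how such a variant could look, not a formula from the original paper; `weighted_rrf` and the `(weight, list)` input shape are illustrative.

```rust
use std::collections::HashMap;

/// Weighted RRF: each list's contribution is scaled by a per-list weight.
/// With all weights equal to 1.0 this reduces to plain RRF.
fn weighted_rrf(lists: &[(f64, Vec<&str>)], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (weight, list) in lists {
        for (index, id) in list.iter().enumerate() {
            // 0-indexed position from `enumerate()`, hence the `+ 1.0`.
            *scores.entry(id.to_string()).or_insert(0.0) += *weight / (k + index as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Score descending, then ID ascending for a deterministic order.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Given the lists `["A", "B"]` and `["B", "A"]`, weighting the first list by 2.0 puts A on top, while weighting the second list by 2.0 puts B on top. The weights become hyperparameters that need tuning, which gives up part of RRF's zero-configuration appeal.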

Edmondo's Vault
