Lateral Joins and Views

Core Concept

A LATERAL JOIN allows a subquery to reference columns from preceding tables in the FROM clause. Unlike a regular join, which operates on independent datasets, a LATERAL JOIN enables correlated subqueries, where the inner query depends on values from the outer query. This makes it particularly useful for nested computations and cases where results need to be filtered dynamically.

Traditional SQL only permits correlated subqueries in the WHERE and SELECT clauses — the FROM clause items must be evaluated independently. The LATERAL keyword explicitly opts out of this restriction.

Lateral Joins in PostgreSQL

PostgreSQL fully supports LATERAL JOIN, providing a mechanism to filter or limit rows per group dynamically. The syntax involves the LATERAL keyword within a JOIN statement:

SELECT d.name AS department, e.name AS employee, e.salary
FROM Department d
JOIN LATERAL (
    SELECT name, salary
    FROM Employee e
    WHERE e.departmentId = d.Id
    ORDER BY salary DESC
    LIMIT 3
) AS e ON TRUE;

The inner query references d.Id, which comes from the outer Department table. The ON TRUE condition is used because a lateral join does not require an explicit join condition like traditional joins. PostgreSQL evaluates the inner query once per row in Department.

A lateral join is particularly useful when selecting the top N rows per group, dynamically filtering arrays, or applying aggregations that require access to prior query results. Since it allows row-by-row evaluation, it can outperform window functions like RANK() and DENSE_RANK() when retrieving only a few records per group.

Lateral View vs Lateral Join

Despite sharing the LATERAL keyword, LATERAL VIEW and LATERAL JOIN serve fundamentally different purposes.

LATERAL VIEW (Hive, Spark, Databricks) is syntactic sugar for table-generating functions (UDTFs) like explode(). It flattens nested structures (arrays, maps) into rows. There is no correlated subquery — the operation is purely structural transformation.

SELECT employee_id, skill
FROM employees
LATERAL VIEW explode(skills) AS skill;

LATERAL JOIN (PostgreSQL, Spark SQL, Presto) enables correlated subqueries in the FROM clause, where the inner query can reference and filter based on outer query columns.

SELECT e.id, latest.salary
FROM employees e,
LATERAL (
    SELECT salary FROM salaries s
    WHERE s.employee_id = e.id
    ORDER BY s.date DESC
    LIMIT 1
) AS latest;

The confusion arises because UNNEST (the SQL-standard equivalent of explode) requires lateral semantics — it references a column from the outer row. Presto and Spark infer this automatically for CROSS JOIN UNNEST, so no explicit LATERAL keyword is needed.

Feature	LATERAL JOIN	LATERAL VIEW
Primary purpose	Correlated subqueries	Array/map expansion
Supports filtering, ORDER BY, LIMIT	✅ Yes	❌ No
References outer columns	✅ Yes (arbitrary expressions)	✅ Yes (single column to expand)
Engines	PostgreSQL, Spark SQL, Presto	Hive, Spark, Databricks

Decorrelation

Query optimizers traditionally decorrelate correlated subqueries by rewriting them as joins. For example:

-- Correlated subquery
SELECT * FROM orders o
WHERE o.total > (SELECT AVG(total) FROM orders WHERE customer_id = o.customer_id)
 
-- Decorrelated equivalent
SELECT o.*
FROM orders o
JOIN (
    SELECT customer_id, AVG(total) AS avg_total
    FROM orders GROUP BY customer_id
) agg ON o.customer_id = agg.customer_id
WHERE o.total > agg.avg_total

Decorrelation converts per-row evaluation into a single bulk operation followed by a join, which is significantly more efficient for large datasets.

Limits of Decorrelation

LATERAL subqueries with LIMIT, ORDER BY on outer references, or row-dependent control flow often cannot be decorrelated. The canonical example is “top N per group”:

SELECT d.name, e.*
FROM departments d
JOIN LATERAL (
    SELECT * FROM employees
    WHERE department_id = d.id
    ORDER BY salary DESC
    LIMIT 3
) e ON TRUE;

The LIMIT 3 must apply per department — this semantic cannot be expressed as a single decorrelated join. The optimizer must either execute the subquery per row or rewrite using window functions:

SELECT d.name, e.*
FROM departments d
JOIN (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rn
    FROM employees
) e ON e.department_id = d.id
WHERE e.rn <= 3;

This is not decorrelation in the traditional sense — it is a semantic-preserving rewrite that changes the execution strategy. Recent research (Neumann & Kemper’s work on the HyPer engine) has extended decorrelation techniques to handle more complex patterns, but LIMIT-per-group remains fundamentally resistant to classic decorrelation because the cardinality constraint is inherently row-dependent.

Important

When a LATERAL subquery contains LIMIT or OFFSET dependent on outer values, the optimizer cannot decorrelate. Expect nested loop execution with per-row subquery evaluation, which scales poorly on large outer tables.

Performance Considerations

Lateral joins can be expensive because they execute subqueries multiple times — once per outer row. PostgreSQL optimizes lateral joins with indexing and efficient query planning, but excessive use can lead to performance bottlenecks. In contrast, Spark’s approach using distributed processing and broadcast joins often provides better performance for large-scale queries.

Choosing between LATERAL JOIN, window functions, and alternative approaches depends on the use case. For “top N per group” problems, window functions with ROW_NUMBER() are generally preferred in distributed engines like Spark, while PostgreSQL’s lateral joins excel when the outer table is small and inner queries benefit from index lookups.

Edmondo's Vault

Explorer

Lateral Joins and Views

Core Concept

Lateral Joins in PostgreSQL

Lateral View vs Lateral Join

Decorrelation

Limits of Decorrelation

Performance Considerations

Graph View

Table of Contents

Backlinks