π 1οΈβ£ Window Functions (OVER())
β Best for:
- Running totals, moving averages
- Ranking, row number, percentiles
- Identifying first/last occurrences in partitions
π¨ Fails when:
- You need time-based windows instead of row-based windows.
- You must ensure a minimum number of distinct entries (like the 7-day constraint in your LeetCode problem).
π‘ Example Failure Case: If a question asks:
βFind the highest daily sales per product for the last N days.β You might instinctively try:
SELECT product_id, sale_date,
MAX(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_sales
FROM sales;π¨ Problem: If some days have no sales, the window will include fewer than 7 actual days!
π‘ Fix: Use a self-join instead, ensuring exactly N distinct days.
π 2οΈβ£ Self-Joins
β Best for:
- Time-based windows (e.g., βlast N daysβ)
- Comparing records to earlier versions
- Finding sequential patterns (e.g., consecutive logins)
π¨ Fails when:
- Your dataset is too large, causing performance issues.
π‘ Example Failure Case:
βFind the number of consecutive logins for each user.β You might start with:
SELECT a.user_id, a.login_date,
COUNT(b.user_id) AS consecutive_logins
FROM logins a
JOIN logins b ON b.login_date BETWEEN a.login_date - INTERVAL '6 days' AND a.login_date
GROUP BY a.user_id, a.login_date;π¨ Problem: This doesnβt properly account for gaps in logins.
π‘ Fix: Use window functions with LAG() to detect gaps.
π 3οΈβ£ Recursive Common Table Expressions (CTEs)
β Best for:
- Hierarchical queries (tree-like structures)
- Finding longest paths or connected components
- Graph traversal (e.g., finding shortest paths in an adjacency list)
π¨ Fails when:
- The recursion runs too deep (PostgreSQL has a recursion depth limit).
- You donβt actually have hierarchical data.
π‘ Example Failure Case:
βFind all employees who report (directly or indirectly) to a given manager.β You might start with:
WITH RECURSIVE hierarchy AS (
SELECT employee_id, manager_id, 1 AS depth
FROM employees WHERE manager_id = 100 -- Starting manager
UNION ALL
SELECT e.employee_id, e.manager_id, h.depth + 1
FROM employees e
JOIN hierarchy h ON e.manager_id = h.employee_id
)
SELECT * FROM hierarchy;π¨ Problem: If the data contains cycles, this will run forever.
π‘ Fix: Use a depth limit (WHERE depth < X) or cycle detection.
π 4οΈβ£ Aggregation (GROUP BY)
β Best for:
- Summarizing data
- Finding counts, averages, and sums across groups
- Deduplication using
COUNT(DISTINCT β¦)
π¨ Fails when:
- You need row-by-row operations instead of group summaries.
- You need ranking within groups (use window functions instead).
π‘ Example Failure Case:
βFind the most recent purchase date for each customer.β You might try:
SELECT customer_id, MAX(purchase_date)
FROM purchases
GROUP BY customer_id;π¨ Problem: This gives only the date, not the actual row with full details.
π‘ Fix: Use DISTINCT ON (PostgreSQL) or window functions.
π 6οΈβ£ EXISTS vs. JOINs
β Best for:
- Checking existence efficiently
- Eliminating duplicates before joining large datasets
- Improving performance vs.
IN ()for large subqueries
π¨ Fails when:
- The dataset is small (a
JOINmight be better). - You need actual row data, not just a boolean check.
π‘ Example Failure Case:
βFind all users who have made at least one purchase.β You might try:
SELECT user_id FROM users
WHERE EXISTS (
SELECT 1 FROM purchases WHERE purchases.user_id = users.user_id
);π Why is this better than a JOIN?
- Stops scanning early (as soon as one match is found).
- Prevents duplicate rows, unlike a
JOIN.
π Summary: SQL Problem-Solving Mental Model
| Situation | Best Approach |
|---|---|
| Moving averages over a row count | WINDOW FUNCTION (ROWS BETWEEN) |
| Moving averages over a time range | SELF-JOIN with BETWEEN INTERVAL |
| Aggregation by groups | GROUP BY |
| Finding hierarchical relationships | Recursive CTE |
| Checking for existence efficiently | EXISTS |
| Ranking within groups | DENSE_RANK(), RANK(), ROW_NUMBER() |