πŸ“Œ 1️⃣ Window Functions (OVER())

βœ… Best for:

  • Running totals, moving averages
  • Ranking, row number, percentiles
  • Identifying first/last occurrences in partitions

🚨 Fails when:

  • You need time-based windows instead of row-based windows.
  • You must ensure a minimum number of distinct entries (like the 7-day constraint in your LeetCode problem).

πŸ’‘ Example Failure Case: If a question asks:

β€œFind the highest daily sales per product for the last N days.” You might instinctively try:

SELECT product_id, sale_date,
       MAX(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS max_sales
FROM sales;

🚨 Problem: If some days have no sales, the window will include fewer than 7 actual days!

πŸ’‘ Fix: Use a self-join instead, ensuring exactly N distinct days.


πŸ“Œ 2️⃣ Self-Joins

βœ… Best for:

  • Time-based windows (e.g., β€œlast N days”)
  • Comparing records to earlier versions
  • Finding sequential patterns (e.g., consecutive logins)

🚨 Fails when:

  • Your dataset is too large, causing performance issues.

πŸ’‘ Example Failure Case:

β€œFind the number of consecutive logins for each user.” You might start with:

SELECT a.user_id, a.login_date,
       COUNT(b.user_id) AS consecutive_logins
FROM logins a
JOIN logins b ON b.login_date BETWEEN a.login_date - INTERVAL '6 days' AND a.login_date
GROUP BY a.user_id, a.login_date;

🚨 Problem: This doesn’t properly account for gaps in logins.

πŸ’‘ Fix: Use window functions with LAG() to detect gaps.


πŸ“Œ 3️⃣ Recursive Common Table Expressions (CTEs)

βœ… Best for:

  • Hierarchical queries (tree-like structures)
  • Finding longest paths or connected components
  • Graph traversal (e.g., finding shortest paths in an adjacency list)

🚨 Fails when:

  • The recursion runs too deep (PostgreSQL has a recursion depth limit).
  • You don’t actually have hierarchical data.

πŸ’‘ Example Failure Case:

β€œFind all employees who report (directly or indirectly) to a given manager.” You might start with:

WITH RECURSIVE hierarchy AS (
    SELECT employee_id, manager_id, 1 AS depth
    FROM employees WHERE manager_id = 100  -- Starting manager
 
    UNION ALL
 
    SELECT e.employee_id, e.manager_id, h.depth + 1
    FROM employees e
    JOIN hierarchy h ON e.manager_id = h.employee_id
)
SELECT * FROM hierarchy;

🚨 Problem: If the data contains cycles, this will run forever.

πŸ’‘ Fix: Use a depth limit (WHERE depth < X) or cycle detection.


πŸ“Œ 4️⃣ Aggregation (GROUP BY)

βœ… Best for:

  • Summarizing data
  • Finding counts, averages, and sums across groups
  • Deduplication using COUNT(DISTINCT …)

🚨 Fails when:

  • You need row-by-row operations instead of group summaries.
  • You need ranking within groups (use window functions instead).

πŸ’‘ Example Failure Case:

β€œFind the most recent purchase date for each customer.” You might try:

SELECT customer_id, MAX(purchase_date)
FROM purchases
GROUP BY customer_id;

🚨 Problem: This gives only the date, not the actual row with full details.

πŸ’‘ Fix: Use DISTINCT ON (PostgreSQL) or window functions.



πŸ“Œ 6️⃣ EXISTS vs. JOINs

βœ… Best for:

  • Checking existence efficiently
  • Eliminating duplicates before joining large datasets
  • Improving performance vs. IN () for large subqueries

🚨 Fails when:

  • The dataset is small (a JOIN might be better).
  • You need actual row data, not just a boolean check.

πŸ’‘ Example Failure Case:

β€œFind all users who have made at least one purchase.” You might try:

SELECT user_id FROM users
WHERE EXISTS (
    SELECT 1 FROM purchases WHERE purchases.user_id = users.user_id
);

πŸš€ Why is this better than a JOIN?

  • Stops scanning early (as soon as one match is found).
  • Prevents duplicate rows, unlike a JOIN.

πŸš€ Summary: SQL Problem-Solving Mental Model

SituationBest Approach
Moving averages over a row countWINDOW FUNCTION (ROWS BETWEEN)
Moving averages over a time rangeSELF-JOIN with BETWEEN INTERVAL
Aggregation by groupsGROUP BY
Finding hierarchical relationshipsRecursive CTE
Checking for existence efficientlyEXISTS
Ranking within groupsDENSE_RANK(), RANK(), ROW_NUMBER()