TODO:

  • Reviews types in Postgres and Parquet

1. SQL Performance Tuning & Optimization

  • Indexing (Clustered vs. Non-clustered, covering, composite) ✅ 2025-02-14
  • Query Execution Plans (EXPLAIN/EXPLAIN ANALYZE) ✅ 2025-02-14
  • Common performance bottlenecks (e.g., full table scans, Cartesian joins, correlated subqueries) ✅ 2025-02-21
  • Partitioning & Clustering (horizontal vs. vertical partitioning) ✅ 2025-02-15

2. Window Functions & Advanced SQL

  • ROW_NUMBER(): Assigns a unique number to each row starting from 1. ✅ 2025-02-16
  • RANK(): Assigns the same rank to identical values, with gaps in subsequent ranks. ✅ 2025-02-16
  • DENSE_RANK(): Similar to RANK(), but without gaps. ✅ 2025-02-16
  • LEAD() and LAG(): Access subsequent or preceding row values. ✅ 2025-02-17
  • FIRST_VALUE() and LAST_VALUE(): Get the first or last value in a partition.
  • NTH_VALUE(): Fetches the Nth value in a window.
  • NTILE(): Divides the partition into a specified number of ranked groups.

3. Data Warehousing Principles (Kimball & Inmon)

  • Star Schema vs. Snowflake Schema (pros/cons) ✅ 2025-02-15
  • Fact vs. Dimension Tables ✅ 2025-02-15
  • Slowly Changing Dimensions (SCD) (Types 1, 2, 3) ✅ 2025-02-27
  • Denormalization vs. Normalization trade-offs ✅ 2025-02-15
  • Data Lake vs. Data Warehouse vs. Data Mart ✅ 2025-02-15

4. Data Partitioning & Sharding

  • Logical vs. Physical Partitioning ✅ 2025-02-15
  • Shard keys selection (impact on query performance) ✅ 2025-02-15
  • Distributed query processing (e.g., how Trino/Presto optimize queries across partitions) ✅ 2025-02-15

5. Data Modeling for Analytics

  • Best practices in OLAP vs. OLTP modeling
  • Schema evolution strategies (how to change table structures without breaking things) ✅ 2025-02-22
  • Handling event-based data (e.g., logs, clickstream data) ✅ 2025-02-22
  • Postgres and Parquet data Types ✅ 2025-02-22

6. ETL/ELT Best Practices

  • Batch vs. Streaming ETL (Airflow vs. Flink) ✅ 2025-02-15
  • Data ingestion pipelines (e.g., CDC, S3-based ingestion) ✅ 2025-02-15
  • Handling late-arriving and duplicate data ✅ 2025-02-22

7. Time-Series & Incremental Data Processing

  • Best practices for time-based partitioning (e.g., hourly/daily tables)
  • Retention strategies (e.g., hot/cold storage)
  • Incremental vs. full refresh strategies in data pipelines

8. Common SQL Anti-Patterns

  • N+1 query problem (e.g., how JOIN performance can degrade) ✅ 2025-02-21
  • Unnecessary DISTINCTs & GROUP BYs ✅ 2025-02-21
  • Self-joins & excessive subqueries ✅ 2025-02-21

9. Query Engines & Data Infra Basics

  • Trino/Presto vs. Spark SQL vs. BigQuery vs. Redshift
  • Columnar vs. Row-based storage (Parquet vs. ORC vs. Avro) ✅ 2025-02-1
  • Z-Ordering & File Layout Strategies (Databricks, Iceberg, Delta Lake) ✅ 2025-02-2 -
  • Materialized Views & Caching Strategies

10. Data Governance & Quality

  • Schema evolution (managing breaking changes) ✅ 2025-02-22
  • Data consistency strategies (e.g., eventual vs. strong consistency in a data warehouse)
  • Auditing, lineage tracking (e.g., OpenLineage, DBT)

Leetcode questions

Recursive CTE

  • LC 1336. Number of Transactions per Visit ✅ 2025-02-16

  • LC 1384. Total Sales Amount by Year ✅ 2025-02-17

  • LC 1635. Hopper Company Queries I ✅ 2025-02-17

  • LC 1645. Hopper Company Queries II ✅ 2025-02-17

  • LC 1651. Hopper Company Queries III ✅ 2025-02-17

  • LC 1767. Find the Subtasks that Did Not Execute

  • LC 2153. The Number of Passengers in Each Bus II *** ✅ 2025-02-17 Window Functions

  • LC 185. Department Top 3 Salaries ✅ 2025-02-16

  • LC 1159. Market Analysis II ✅ 2025-02-16

  • LC 1369. Get the Second Most Recent Activity ✅ 2025-02-16

  • LC 1412. Find the Quiet Students in All Exams ✅ 2025-02-16

  • LC 1972. First & Last Call on the Same Day ✅ 2025-02-16

  • LC 2010. The Number of Seniors & Juniors to Join the Company II ✅ 2025-02-16

  • LC 2362. Generate the Invoice ✅ 2025-02-16

  • LC 2720. Popularity Percentage ✅ 2025-02-16

  • LC 2793. Status of Flight Tickets ✅ 2025-02-16

  • LC 2995. Viewers Turned Streamers ✅ 2025-02-16

  • LC 1194. Tournament Winners *** ✅ 2025-02-16

  • LC 579. Find Cumulative Salary of an Employee ✅ 2025-02-16

  • LC 601. Human Traffic of Stadium ✅ 2025-02-16

  • LC 1651. Hopper Company Queries III ✅ 2025-02-17

  • LC 2494. Merge Overlapping Events in the Same Hall *** ✅ 2025-02-27 Min-max for pivoting

  • LC 618. Students Report by Geography

  • LC 2991. Top Three Wineries

Special tricks - Gaps and Island

  • LC 601. Human Traffic of Stadium ✅ 2025-02-17
  • LC 1225. Report Contiguous Dates ✅ 2025-02-27
  • LC 2173. Longest Winning Streak ✅ 2025-02-17
  • LC 2701. Consecutive Transactions with Increasing Amounts ✅ 2025-02-27
  • LC 2752. Customers with Maximum Number of Transactions on Consecutive Days ✅ 2025-02-27

Group concat

  • LC 2118. Build the Equation
  • LC 2199. Finding the Topic of Each Post
  • LC 2991. Top Three Wineries