TODO:
- Reviews types in Postgres and Parquet
1. SQL Performance Tuning & Optimization
- Indexing (Clustered vs. Non-clustered, covering, composite) ✅ 2025-02-14
- Query Execution Plans (EXPLAIN/EXPLAIN ANALYZE) ✅ 2025-02-14
- Common performance bottlenecks (e.g., full table scans, Cartesian joins, correlated subqueries) ✅ 2025-02-21
- Partitioning & Clustering (horizontal vs. vertical partitioning) ✅ 2025-02-15
2. Window Functions & Advanced SQL
- ROW_NUMBER(): Assigns a unique number to each row starting from 1. ✅ 2025-02-16
- RANK(): Assigns the same rank to identical values, with gaps in subsequent ranks. ✅ 2025-02-16
- DENSE_RANK(): Similar to RANK(), but without gaps. ✅ 2025-02-16
- LEAD() and LAG(): Access subsequent or preceding row values. ✅ 2025-02-17
- FIRST_VALUE() and LAST_VALUE(): Get the first or last value in a partition.
- NTH_VALUE(): Fetches the Nth value in a window.
- NTILE(): Divides the partition into a specified number of ranked groups.
3. Data Warehousing Principles (Kimball & Inmon)
- Star Schema vs. Snowflake Schema (pros/cons) ✅ 2025-02-15
- Fact vs. Dimension Tables ✅ 2025-02-15
- Slowly Changing Dimensions (SCD) (Types 1, 2, 3) ✅ 2025-02-27
- Denormalization vs. Normalization trade-offs ✅ 2025-02-15
- Data Lake vs. Data Warehouse vs. Data Mart ✅ 2025-02-15
4. Data Partitioning & Sharding
- Logical vs. Physical Partitioning ✅ 2025-02-15
- Shard keys selection (impact on query performance) ✅ 2025-02-15
- Distributed query processing (e.g., how Trino/Presto optimize queries across partitions) ✅ 2025-02-15
5. Data Modeling for Analytics
- Best practices in OLAP vs. OLTP modeling
- Schema evolution strategies (how to change table structures without breaking things) ✅ 2025-02-22
- Handling event-based data (e.g., logs, clickstream data) ✅ 2025-02-22
- Postgres and Parquet data Types ✅ 2025-02-22
6. ETL/ELT Best Practices
- Batch vs. Streaming ETL (Airflow vs. Flink) ✅ 2025-02-15
- Data ingestion pipelines (e.g., CDC, S3-based ingestion) ✅ 2025-02-15
- Handling late-arriving and duplicate data ✅ 2025-02-22
7. Time-Series & Incremental Data Processing
- Best practices for time-based partitioning (e.g., hourly/daily tables)
- Retention strategies (e.g., hot/cold storage)
- Incremental vs. full refresh strategies in data pipelines
8. Common SQL Anti-Patterns
- N+1 query problem (e.g., how JOIN performance can degrade) ✅ 2025-02-21
- Unnecessary DISTINCTs & GROUP BYs ✅ 2025-02-21
- Self-joins & excessive subqueries ✅ 2025-02-21
9. Query Engines & Data Infra Basics
- Trino/Presto vs. Spark SQL vs. BigQuery vs. Redshift
- Columnar vs. Row-based storage (Parquet vs. ORC vs. Avro) ✅ 2025-02-1
- Z-Ordering & File Layout Strategies (Databricks, Iceberg, Delta Lake) ✅ 2025-02-2 -
- Materialized Views & Caching Strategies
10. Data Governance & Quality
- Schema evolution (managing breaking changes) ✅ 2025-02-22
- Data consistency strategies (e.g., eventual vs. strong consistency in a data warehouse)
- Auditing, lineage tracking (e.g., OpenLineage, DBT)
Leetcode questions
Recursive CTE
-
LC 1336. Number of Transactions per Visit ✅ 2025-02-16
-
LC 1384. Total Sales Amount by Year ✅ 2025-02-17
-
LC 1635. Hopper Company Queries I ✅ 2025-02-17
-
LC 1645. Hopper Company Queries II ✅ 2025-02-17
-
LC 1651. Hopper Company Queries III ✅ 2025-02-17
-
LC 1767. Find the Subtasks that Did Not Execute
-
LC 2153. The Number of Passengers in Each Bus II *** ✅ 2025-02-17 Window Functions
-
LC 185. Department Top 3 Salaries ✅ 2025-02-16
-
LC 1159. Market Analysis II ✅ 2025-02-16
-
LC 1369. Get the Second Most Recent Activity ✅ 2025-02-16
-
LC 1412. Find the Quiet Students in All Exams ✅ 2025-02-16
-
LC 1972. First & Last Call on the Same Day ✅ 2025-02-16
-
LC 2010. The Number of Seniors & Juniors to Join the Company II ✅ 2025-02-16
-
LC 2362. Generate the Invoice ✅ 2025-02-16
-
LC 2720. Popularity Percentage ✅ 2025-02-16
-
LC 2793. Status of Flight Tickets ✅ 2025-02-16
-
LC 2995. Viewers Turned Streamers ✅ 2025-02-16
-
LC 1194. Tournament Winners *** ✅ 2025-02-16
-
LC 579. Find Cumulative Salary of an Employee ✅ 2025-02-16
-
LC 601. Human Traffic of Stadium ✅ 2025-02-16
-
LC 1651. Hopper Company Queries III ✅ 2025-02-17
-
LC 2494. Merge Overlapping Events in the Same Hall *** ✅ 2025-02-27 Min-max for pivoting
-
LC 618. Students Report by Geography
-
LC 2991. Top Three Wineries
Special tricks - Gaps and Island
- LC 601. Human Traffic of Stadium ✅ 2025-02-17
- LC 1225. Report Contiguous Dates ✅ 2025-02-27
- LC 2173. Longest Winning Streak ✅ 2025-02-17
- LC 2701. Consecutive Transactions with Increasing Amounts ✅ 2025-02-27
- LC 2752. Customers with Maximum Number of Transactions on Consecutive Days ✅ 2025-02-27
Group concat
- LC 2118. Build the Equation
- LC 2199. Finding the Topic of Each Post
- LC 2991. Top Three Wineries