Introduction to databases
-
Review the difference between disk blocks and OS pages ✅ 2025-03-02
-
Review futexes and spinlocks ✅ 2025-02-23
-
LRU, LRU-K and clock ✅ 2025-03-02
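A minimal sketch of CLOCK (second-chance) replacement, for comparison with LRU. Instead of reordering a list on every hit, a hit only sets a reference bit, and on eviction the hand sweeps the frames, clearing bits until it finds one that is already clear. Class and method names are made up for illustration. Setting the bit to 1 when a page is loaded is one of several variants (Postgres, for example, uses a usage counter rather than a single bit).

```python
class ClockCache:
    """Minimal CLOCK (second-chance) page replacement sketch."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []   # each frame is [page_id, ref_bit]
        self.index = {}    # page_id -> frame slot
        self.hand = 0      # next frame the sweep will inspect

    def access(self, page_id):
        """Touch page_id; return the evicted page id, or None if nothing was evicted."""
        if page_id in self.index:
            self.frames[self.index[page_id]][1] = 1  # hit: just set the reference bit
            return None
        if len(self.frames) < self.capacity:         # free frame available
            self.index[page_id] = len(self.frames)
            self.frames.append([page_id, 1])
            return None
        # Sweep: give referenced frames a second chance by clearing their bit.
        while self.frames[self.hand][1] == 1:
            self.frames[self.hand][1] = 0
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand][0]
        del self.index[victim]
        self.frames[self.hand] = [page_id, 1]
        self.index[page_id] = self.hand
        self.hand = (self.hand + 1) % self.capacity
        return victim
```

With capacity 3 and accesses 1, 2, 3, 4, the first sweep clears every bit and evicts page 1. If page 2 is then touched again, the next miss skips it and evicts page 3 instead.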
-
Zheap and Zedstore
-
relallvisible and MVCC: transaction visibility information ✅ 2025-03-02
-
How do join algorithms work when the data is too large to fit in memory?
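One standard answer is the grace hash join: hash-partition both inputs on the join key so each partition pair fits in memory, then build and probe one pair at a time. In this in-memory sketch, Python lists stand in for the partitions a real engine would spill to disk, and the function names are hypothetical. A partition that is still too large can be re-partitioned recursively with a different hash function.

```python
from collections import defaultdict


def partition(rows, key, num_partitions):
    """Phase 1: hash-partition rows (each list stands in for a spill file)."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key]) % num_partitions].append(row)
    return parts


def grace_hash_join(left, right, left_key, right_key, num_partitions=4):
    """Return (left_row, right_row) pairs with equal join keys."""
    left_parts = partition(left, left_key, num_partitions)
    right_parts = partition(right, right_key, num_partitions)
    out = []
    # Phase 2: equal keys always hash to the same partition index,
    # so each partition pair can be joined independently.
    for lp, rp in zip(left_parts, right_parts):
        table = defaultdict(list)
        for row in lp:                       # build on the left partition
            table[row[left_key]].append(row)
        for row in rp:                       # probe with the right partition
            for match in table.get(row[right_key], []):
                out.append((match, row))
    return out
```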
-
Review what makes RocksDB unique compared to LevelDB and other sorted string table (SSTable) systems
-
How to implement a circular buffer? ✅ 2025-03-02
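A minimal fixed-capacity circular buffer sketch (names are illustrative). Tracking `head` plus `size`, rather than `head` plus `tail`, avoids the classic ambiguity where `head == tail` could mean either empty or full.

```python
class RingBuffer:
    """Fixed-capacity FIFO circular buffer; push is rejected when full."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0   # index of the oldest element
        self.size = 0   # number of stored elements

    def push(self, item):
        """Append item at the tail; return False if the buffer is full."""
        if self.size == self.capacity:
            return False
        tail = (self.head + self.size) % self.capacity  # wrap around the array
        self.buf[tail] = item
        self.size += 1
        return True

    def pop(self):
        """Remove and return the oldest element."""
        if self.size == 0:
            raise IndexError("pop from empty ring buffer")
        item = self.buf[self.head]
        self.buf[self.head] = None  # drop the reference
        self.head = (self.head + 1) % self.capacity
        self.size -= 1
        return item
```

An overwrite-oldest variant (common for logs) would advance `head` instead of returning False when full.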
-
Print Modern B-Tree Techniques: https://github.com/amilajack/reading/blob/master/Computer_Science/Modern%20B-Tree%20Techniques.pdf
-
Lesson 13, minute 39: what are the levels of buckets, and why do we call it a parallel grace hash join?
-
Lesson 14: unclear why we want the smaller table to be the outer table
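Assuming this refers to the textbook block nested loop join cost model (M pages in the outer table, N in the inner, B buffer pages, one reserved for the inner input and one for output), the cost is M + ceil(M / (B - 2)) * N page I/Os. The outer table is read once, and the inner table is rescanned once per chunk of the outer, so a smaller outer shrinks both terms. The function name is hypothetical.

```python
import math


def block_nested_loop_cost(outer_pages, inner_pages, buffers):
    """Page I/Os for block nested loop join (assumes buffers - 2 pages hold outer chunks)."""
    chunks = math.ceil(outer_pages / (buffers - 2))  # passes over the inner table
    return outer_pages + chunks * inner_pages
```

For example, with 1000 and 500 pages and 102 buffers, the larger table as outer costs 1000 + 10 * 500 = 6000 I/Os, while the smaller one as outer costs 500 + 5 * 1000 = 5500.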
-
Everything You Always Wanted to Know About Compiled and Vectorized Queries But Were Afraid to Ask https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf ✅ 2025-03-08
-
godbolt.org (Compiler Explorer)
-
Two phase parallel partition grouping ✅ 2025-03-15
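Assuming this refers to two-phase parallel hash aggregation: in phase 1 each worker pre-aggregates its own chunk into per-partition hash tables, and in phase 2 each partition is merged by a single worker, so no shared hash table or locking is needed. This single-threaded sketch only shows the data flow (the loops marked per-worker would run in parallel in a real engine), and the function name is hypothetical.

```python
from collections import Counter


def two_phase_group_count(chunks, num_partitions):
    """COUNT(*) GROUP BY key over chunks of keys, via partitioned pre-aggregation."""
    # Phase 1 (per worker): local pre-aggregation, split by hash partition.
    local = []
    for chunk in chunks:
        parts = [Counter() for _ in range(num_partitions)]
        for key in chunk:
            parts[hash(key) % num_partitions][key] += 1
        local.append(parts)
    # Phase 2 (per partition): one owner merges all workers' partials.
    result = {}
    for p in range(num_partitions):
        merged = Counter()
        for parts in local:
            merged.update(parts[p])  # Counter.update adds counts
        result.update(merged)        # partitions hold disjoint keys
    return result
```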
-
https://blog.sofwancoder.com/try-confirm-cancel-tcc-protocol ✅ 2025-03-09
-
db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf ✅ 2025-03-08
-
Pratt parsing: Simple but Powerful Pratt Parsing
-
Un-nesting queries https://15799.courses.cs.cmu.edu/spring2025/papers/11-unnesting/neumann-btw2025.pdf
OLTP
- Understand the fastpath in Postgres and quick balance (balance_quick) in SQLite (https://databass.dev/links/24): postgres/src/backend/access/nbtree/nbtinsert.c at bf491a9073e12ce1fc3e6facd0ae1308534df570 (postgres/postgres on GitHub)
- Page 60
- SSD behavior (pages, blocks, etc.)
- Squash compression benchmark (link 29)
- Caffeine (Java) and TinyLFU
- Linux Clock Page Sweep (link 36, link 37)
- Postgres and MySQL WAL (links 39 and 40)
- View serializability vs. conflict serializability
- Half-split in B-link trees
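For the serializability item: a schedule is conflict serializable iff its precedence graph (an edge Ti → Tj whenever an operation of Ti conflicts with, and precedes, an operation of Tj) is acyclic. The sketch below builds that graph and checks for a cycle; the `(txn, op, item)` schedule encoding is an assumption. View serializability is not checked: every conflict-serializable schedule is view serializable, but testing view serializability is NP-complete in general.

```python
from itertools import combinations


def is_conflict_serializable(schedule):
    """schedule: ordered list of (txn, op, item) tuples, op is "R" or "W"."""
    edges = {txn: set() for txn, _, _ in schedule}
    # combinations() keeps list order, so `first` always precedes `second`.
    for first, second in combinations(schedule, 2):
        t1, op1, x1 = first
        t2, op2, x2 = second
        # Conflict: different txns, same item, at least one write.
        if t1 != t2 and x1 == x2 and "W" in (op1, op2):
            edges[t1].add(t2)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on DFS stack, finished
    color = {txn: WHITE for txn in edges}

    def has_cycle(txn):
        color[txn] = GRAY
        for nxt in edges[txn]:
            if color[nxt] == GRAY or (color[nxt] == WHITE and has_cycle(nxt)):
                return True
        color[txn] = BLACK
        return False

    # Acyclic precedence graph <=> conflict serializable.
    return not any(color[txn] == WHITE and has_cycle(txn) for txn in edges)
```

For example, R1(A) W2(A) W1(A) yields both T1 → T2 and T2 → T1, so it is not conflict serializable.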
Advanced databases CMU
- Roaring Bitmaps Review
- BitWeaving
- Gandiva
- CodecDB
- Apache CarbonData
- Arrow Feather
- SBoost Parquet
- Review of Parquet official website ✅ 2025-03-15
Follow-up on the columnar object store paper
- Review why repetition/definition (R/D) levels are more expensive than list offsets ✅ 2025-03-07
- CodecDB paper: encoding selection when dictionary encoding fails
- Column indexes and range filters
- Zarr
Open stuff
- https://lakesail.com/blog/sql-parser-in-one-week/
- Computability theory
AWS re:Invent talk
- Pick DSQL and don't focus on scale
- Almost all your Postgres code will work
- Controversial decisions
- MVCC
- ACID is a bad pun from an old paper, but you didn't say anything about OLTP
- The log is the database
- But querying from a log isn’t fun or efficient
- O(n) operation across all transactions (why is it O(n^2) across them?) https://youtu.be/huGmR_mi5dQ?si=-EyuG4wO2ItsSLDl&t=905
- The storage layer is not responsible for durability or concurrency control (concurrency control is the adjudicator's job)
- Storage can therefore be much simpler, since it only does part of the work
- Internal log service built at Amazon (looks like the Meta one)
- Initial version: ADI, not ACID.
- The adjudicator manages conflicts between transactions
- Adjudicators scale out, and a distributed commit protocol runs between them
- The adjudicators' key space is partitioned differently from the storage key space, to reduce how often they need to coordinate
- Distributed commit protocol: a variant of two-phase commit that is much more fault tolerant
- Architecture:
- Transaction and session router
- Query processors run inside Firecracker (up to 10,000s, on different machines in the fleet)
- PgBouncer
- Using Firecracker is important because SQL is effectively an arbitrary language
- Custom time-distribution network in the US, exposed via the Amazon Time Sync Service
- Used in DSQL, but available on EC2 instances
- Based on atomic clocks
- Check out Firecracker, the open-source micro-VM hypervisor
- Add notes on Bitcask and WiscKey
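A toy model of the adjudicator idea from the talk, under loudly stated assumptions. Each adjudicator owns a hash partition of the key space and applies a first-committer-wins check on write sets (as in snapshot isolation). A transaction touching several partitions commits only if every involved adjudicator agrees. All class and method names are hypothetical, this is not DSQL's actual design, and the single-process "check all, then record" step only stands in for the fault-tolerant distributed commit protocol, which is not modeled.

```python
class Adjudicator:
    """Hypothetical per-partition conflict checker (first-committer-wins)."""

    def __init__(self):
        self.last_commit = {}  # key -> timestamp of the latest committed write

    def can_commit(self, keys, start_ts):
        # Conflict if another txn committed a write to any key after our snapshot.
        return all(self.last_commit.get(k, -1) <= start_ts for k in keys)

    def record(self, keys, commit_ts):
        for k in keys:
            self.last_commit[k] = commit_ts


class ConflictService:
    """Routes write-set keys to adjudicators by hash (hypothetical)."""

    def __init__(self, num_adjudicators):
        self.adjudicators = [Adjudicator() for _ in range(num_adjudicators)]
        self.clock = 0  # logical clock standing in for real time

    def begin(self):
        return self.clock  # snapshot timestamp

    def commit(self, start_ts, write_set):
        """Return a commit timestamp, or None if the transaction must abort."""
        groups = {}
        for k in write_set:
            groups.setdefault(hash(k) % len(self.adjudicators), []).append(k)
        # Every involved adjudicator must approve (stand-in for the distributed protocol).
        if not all(self.adjudicators[a].can_commit(ks, start_ts)
                   for a, ks in groups.items()):
            return None
        self.clock += 1
        for a, ks in groups.items():
            self.adjudicators[a].record(ks, self.clock)
        return self.clock
```

Two transactions that start at the same snapshot and both write key `y` cannot both commit: the second one is rejected, while a write to an untouched key still goes through.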