- Parallel and distributed databases
- Parallel: single node / specialized hardware
- Distributed
- Process models
- Process per worker (old databases, Postgres)
- Thread per worker (new databases, Oracle, MySQL)
- Scheduling:
- OS (process per worker)
- DB (thread per worker)
- Application (embedded database, i.e rocksdb)
- Paralelism:
- Inter-query: multiple queries execute concurrently
- Intra-query: Execute pieces of the same query in parallel
- Intra-operator (horizontal): query plans operators are decomposed into fragments that operate on disjoint subset of data
- Exchange operators:
- Gather: combine result from workers in single output stream
- Distributed: Split single input stream into multiple output stream
- Repartition: Reorganized multiple input streams into multiple output stream
- Inter-operator parallelism (Vertical): data is pipelined without materialization, common in stream processing system
- IO parallelism:
- Multi-disk parallelism: transparent to the DBMS, using RAID configuration
- Database partitioning:
- Vertical partitioning: table attributes are stored in a separate location, like a column store
- Horizontal partitioning: subsets of the table are split in multiple files