• Parallel and distributed databases
    • Parallel: single node / specialized hardware
    • Distributed
  • Process models
    • Process per worker (old databases, Postgres)
    • Thread per worker (new databases, Oracle, MySQL)
  • Scheduling:
    • OS (process per worker)
    • DB (thread per worker)
    • Application (embedded database, e.g. RocksDB)
  • Parallelism:
    • Inter-query: multiple queries execute concurrently
    • Intra-query: Execute pieces of the same query in parallel
      • Intra-operator (horizontal): query plan operators are decomposed into fragments that operate on disjoint subsets of the data
        • Exchange operators:
          • Gather: combines results from multiple workers into a single output stream
          • Distribute: splits a single input stream into multiple output streams
          • Repartition: reorganizes multiple input streams into multiple output streams
      • Inter-operator (vertical): operators run concurrently and data is pipelined between them without materialization; common in stream processing systems
  • IO parallelism:
    • Multi-disk parallelism: data is spread across disks (e.g. a RAID configuration), transparently to the DBMS
    • Database partitioning:
      • Vertical partitioning: a table's attributes are stored in separate locations, like a column store
      • Horizontal partitioning: a table's tuples are split into disjoint subsets stored in multiple files
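
The Python sketch below ties together horizontal partitioning, intra-operator parallelism, and the three exchange operators. It is illustrative only and not taken from any real DBMS: the function names (`distribute`, `repartition`, `gather`, `sum_by_dept`), the sample table, and the choice of round-robin and hash partitioning are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def distribute(stream, n):
    """Distribute: split one input stream into n output streams (round-robin)."""
    outputs = [[] for _ in range(n)]
    for i, row in enumerate(stream):
        outputs[i % n].append(row)
    return outputs

def repartition(streams, n, key):
    """Repartition: reorganize many input streams into n output streams by hash(key)."""
    outputs = [[] for _ in range(n)]
    for stream in streams:
        for row in stream:
            outputs[hash(row[key]) % n].append(row)
    return outputs

def sum_by_dept(partition):
    """Plan fragment run by one worker: SUM(salary) GROUP BY dept on its partition."""
    totals = {}
    for row in partition:
        totals[row["dept"]] = totals.get(row["dept"], 0) + row["salary"]
    return totals

def gather(results):
    """Gather: combine the per-worker results into a single output."""
    combined = {}
    for partial in results:
        # Safe to overwrite: after repartitioning, each dept lives on exactly one worker.
        combined.update(partial)
    return combined

def parallel_sum_by_dept(table, n_workers=3):
    partitions = distribute(table, n_workers)             # horizontal partitions
    by_dept = repartition(partitions, n_workers, "dept")  # co-locate each group
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_by_dept, by_dept))   # one fragment per worker
    return gather(partials)

# Hypothetical sample data.
table = [
    {"id": 1, "dept": "eng", "salary": 100},
    {"id": 2, "dept": "ops", "salary": 80},
    {"id": 3, "dept": "eng", "salary": 120},
    {"id": 4, "dept": "hr", "salary": 70},
    {"id": 5, "dept": "ops", "salary": 90},
]
print(sorted(parallel_sum_by_dept(table).items()))
# → [('eng', 220), ('hr', 70), ('ops', 170)]
```

Threads are used here only to show the shape of the plan; in CPython the interpreter lock keeps CPU-bound fragments from actually running in parallel. A real system would run the fragments in separate worker processes or native threads.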