Low-level infra

  • Rate Limiter
  • Distributed Message Queue
  • Consistent Hashing
  • Distributed Unique ID Generator
  • Google Drive

Rate limiter

  • Token Bucket: Allows bursty traffic with steady refill rates.
  • Leaky Bucket: Enforces a constant request rate, dropping excess traffic.
  • Fixed Window Counter: Simple but allows bursts at window boundaries.
  • Sliding Window Counter: Smooths out boundary issues with partial window counts.
  • Sliding Window Log: Precise but memory-intensive for high traffic.

Distributed Message Queue like Kafka

  • Storage
    • Data storage: WAL and leader election
    • State storage (Consumer info)
    • Metadata storage (topics configuration)
  • Coordination service with Zookeper for service discovery
  • Producer routing in the SDK

Storage

Key-Value Store

  • Durability: Uses Write-Ahead Logs (WAL) and SSTables (Sorted String Tables) for crash recovery. Examples: RocksDB, Cassandra.

  • Replication:

    • Leader-Follower: Redis, Cassandra (default).
    • Multi-Leader: Couchbase, DynamoDB Global Tables.
    • Synchronous: etcd, Spanner (ensures strong consistency).
    • Asynchronous: Cassandra, DynamoDB (eventual consistency, better performance).
  • Quorum: Reads and writes require a majority of replicas to agree. Ensures fault tolerance while balancing consistency and availability. (No Dynamodb)

  • Temporary Inconsistency in DynamoDB: DynamoDB’s eventual consistency allows replicas to diverge briefly. A read might return stale data until all replicas sync. This improves latency and availability.

S3-like Object Storage

  • Data is stored in data nodes using a WAL and optimized files with multiple objects
  • The storage service is composed by a data routing service, a placement service, and data nodes
  • The data routing service generates UUID for new objects
  • The placement service collect heartbeats and data from data nodes and maintain a virtual cluster map (5 or 7 nodes, using Raft or Paxos for consensus)
  • The placement service uses consistent hashing to return the address of a node for get and query operation
  • Two possible ways to handle the durability:
    • Replication:
      • worse for data storage (multiple copy of the data)
      • worse durability
      • better computing power: no need to compute parities before writing to disk
      • better write performance
      • Useful for S3 frequent access
    • Erasure coding
      • Better for data storage (we break the data in pieces), and we store 4 parity fragments (1 every 2 fragments)
      • Better for durability
      • Useful for Glacier
      • Worse read: need to read all

User Engagement problems

  • Real-time Gaming Leaderboard
  • Youtube (L6+)
  • News Feed System

Financial Services

  • Digital Wallet

Stock exchange

Key ideas:

  • A Client Gateway talk to an order manager, which does risk checking, talks to a wallet, and sends via the sequencer order for matching to the matching engine
  • The sequence ID it’s important for replayability
  • The matching engine uses internally roducibilityorder books, that leverage linked lists of orders for matching, deletion and insertions
  • To maximize performance, each component such as the Order Manager is pinned to a single CPU and runs a single event loop
  • Order manager, matching engine and market data publisher run on a single server and use mmap to share portion of the ram
  • Move to event sourcing and treat order manager as a library

Payment System

Key ideas:

  • Implement pay-in flow (payment system receives money from customers on behalf of merchant)
  • Implement pay-out flow, money is transferred to sellers once delivery is confirmed
  • The payment service performs fraud checks before accepting the transaction
  • Use a payment executor to talk with PSP (stripe, paypal)
  • Use a ledger for debit and credit entries, wallet to save the current state
  • Store payment events in a database
  • Store state machine (ledger_updated, wallet_updated) in the database payment_orders
  • Hosted payment page
  • Integrating with PSP via queue means having a dead letter queue for retries / manage idempotency
  • Global consistency is not a big problem since at 10tps we can afford sync replica

Location-based Services

  • Proximity service (geo hashing, quad tree)
  • Nearby Friends
  • Google Maps (L6+)

Others

  • Web Crawler
  • Metrics Monitoring and Alerting System
  • Adclick Event Aggregation
  • Hotel Reservation System

Search autocomplete

  • Cache Common Prefixes:
    • Store precomputed results for the most frequent or likely prefixes in Redis.
    • Example: Cache results for prefixes like “c”, “ca”, “cat”.
  • Fallback to Trie:
    • If a prefix is not cached, traverse the trie to find matching terms dynamically. This evolves traversing multiple pointers (c a - >t )
  • Dynamic Cache Updates:
    • Use trie lookups to populate cache on a cache miss (lazy loading).
  • Shard by prefix
    • Different instances of the trie in different nodes

URL shortener

Biggest complexity is around determining the ID / the code for the URL. Solutions:

  • Incremental global counter, scalability problems
  • Hashing need to salt to avoid collisions
  • Random could potentially generate much longer id than required, ensure uniqueness checking in database (optimistic locking, insert if not exist)
  • Snowflake ID designed by twitter: timestamp millis, machine id, sequence number (to ensure uniqueness for uuid generated at the same millisecond on the same machine)
  • UUID v4: 128-bit identifier (one could encode it in base64)

Notification service

  • Biggest complexity is how to handle assigning a subscriber to a gateway.
  • Use consistent hashing, then let them talk websocket. If the client detects a failure, it will go again to the central server, and reconnect to a new gateway instance.
  • Store the table of subscriptions in a highly available, globally replicated database