Low-level infra
- Rate Limiter
- Distributed Message Queue
- Consistent Hashing
- Distributed Unique ID Generator
- Google Drive
Rate limiter
- Token Bucket: Allows bursty traffic with steady refill rates.
- Leaky Bucket: Enforces a constant request rate, dropping excess traffic.
- Fixed Window Counter: Simple but allows bursts at window boundaries.
- Sliding Window Counter: Smooths out boundary issues with partial window counts.
- Sliding Window Log: Precise but memory-intensive for high traffic.
Distributed Message Queue like Kafka
- Storage
- Data storage: WAL and leader election
- State storage (Consumer info)
- Metadata storage (topics configuration)
- Coordination service with Zookeper for service discovery
- Producer routing in the SDK
Storage
Key-Value Store
-
Durability: Uses Write-Ahead Logs (WAL) and SSTables (Sorted String Tables) for crash recovery. Examples: RocksDB, Cassandra.
-
Replication:
- Leader-Follower: Redis, Cassandra (default).
- Multi-Leader: Couchbase, DynamoDB Global Tables.
- Synchronous: etcd, Spanner (ensures strong consistency).
- Asynchronous: Cassandra, DynamoDB (eventual consistency, better performance).
-
Quorum: Reads and writes require a majority of replicas to agree. Ensures fault tolerance while balancing consistency and availability. (No Dynamodb)
-
Temporary Inconsistency in DynamoDB: DynamoDB’s eventual consistency allows replicas to diverge briefly. A read might return stale data until all replicas sync. This improves latency and availability.
S3-like Object Storage
- Data is stored in data nodes using a WAL and optimized files with multiple objects
- The storage service is composed by a data routing service, a placement service, and data nodes
- The data routing service generates UUID for new objects
- The placement service collect heartbeats and data from data nodes and maintain a virtual cluster map (5 or 7 nodes, using Raft or Paxos for consensus)
- The placement service uses consistent hashing to return the address of a node for get and query operation
- Two possible ways to handle the durability:
- Replication:
- worse for data storage (multiple copy of the data)
- worse durability
- better computing power: no need to compute parities before writing to disk
- better write performance
- Useful for S3 frequent access
- Erasure coding
- Better for data storage (we break the data in pieces), and we store 4 parity fragments (1 every 2 fragments)
- Better for durability
- Useful for Glacier
- Worse read: need to read all
- Replication:
User Engagement problems
- Real-time Gaming Leaderboard
- Youtube (L6+)
- News Feed System
Financial Services
- Digital Wallet
Stock exchange
Key ideas:
- A Client Gateway talk to an order manager, which does risk checking, talks to a wallet, and sends via the sequencer order for matching to the matching engine
- The sequence ID it’s important for replayability
- The matching engine uses internally roducibilityorder books, that leverage linked lists of orders for matching, deletion and insertions
- To maximize performance, each component such as the Order Manager is pinned to a single CPU and runs a single event loop
- Order manager, matching engine and market data publisher run on a single server and use mmap to share portion of the ram
- Move to event sourcing and treat order manager as a library
Payment System
Key ideas:
- Implement pay-in flow (payment system receives money from customers on behalf of merchant)
- Implement pay-out flow, money is transferred to sellers once delivery is confirmed
- The payment service performs fraud checks before accepting the transaction
- Use a payment executor to talk with PSP (stripe, paypal)
- Use a ledger for debit and credit entries, wallet to save the current state
- Store payment events in a database
- Store state machine (ledger_updated, wallet_updated) in the database payment_orders
- Hosted payment page
- Integrating with PSP via queue means having a dead letter queue for retries / manage idempotency
- Global consistency is not a big problem since at 10tps we can afford sync replica
Location-based Services
- Proximity service (geo hashing, quad tree)
- Nearby Friends
- Google Maps (L6+)
Others
- Web Crawler
- Metrics Monitoring and Alerting System
- Adclick Event Aggregation
- Hotel Reservation System
Search autocomplete
- Cache Common Prefixes:
- Store precomputed results for the most frequent or likely prefixes in Redis.
- Example: Cache results for prefixes like “c”, “ca”, “cat”.
- Fallback to Trie:
- If a prefix is not cached, traverse the trie to find matching terms dynamically. This evolves traversing multiple pointers (c → a - >t )
- Dynamic Cache Updates:
- Use trie lookups to populate cache on a cache miss (lazy loading).
- Shard by prefix
- Different instances of the trie in different nodes
URL shortener
Biggest complexity is around determining the ID / the code for the URL. Solutions:
- Incremental global counter, scalability problems
- Hashing → need to salt to avoid collisions
- Random → could potentially generate much longer id than required, ensure uniqueness checking in database (optimistic locking, insert if not exist)
- Snowflake ID designed by twitter: timestamp millis, machine id, sequence number (to ensure uniqueness for uuid generated at the same millisecond on the same machine)
- UUID v4: 128-bit identifier (one could encode it in base64)
Notification service
- Biggest complexity is how to handle assigning a subscriber to a gateway.
- Use consistent hashing, then let them talk websocket. If the client detects a failure, it will go again to the central server, and reconnect to a new gateway instance.
- Store the table of subscriptions in a highly available, globally replicated database