Interview list with tips

Low-level infra

Rate Limiter
Distributed Message Queue
Consistent Hashing
Distributed Unique ID Generator
Google Drive

Rate limiter

Token Bucket: Allows bursty traffic with steady refill rates.
Leaky Bucket: Enforces a constant request rate, dropping excess traffic.
Fixed Window Counter: Simple but allows bursts at window boundaries.
Sliding Window Counter: Smooths out boundary issues with partial window counts.
Sliding Window Log: Precise but memory-intensive for high traffic.

Distributed Message Queue like Kafka

Storage
- Data storage: WAL and leader election
- State storage (Consumer info)
- Metadata storage (topics configuration)
Coordination service with Zookeper for service discovery
Producer routing in the SDK

Storage

Key-Value Store

Durability: Uses Write-Ahead Logs (WAL) and SSTables (Sorted String Tables) for crash recovery. Examples: RocksDB, Cassandra.
Replication:
- Leader-Follower: Redis, Cassandra (default).
- Multi-Leader: Couchbase, DynamoDB Global Tables.
- Synchronous: etcd, Spanner (ensures strong consistency).
- Asynchronous: Cassandra, DynamoDB (eventual consistency, better performance).
Quorum: Reads and writes require a majority of replicas to agree. Ensures fault tolerance while balancing consistency and availability. (No Dynamodb)
Temporary Inconsistency in DynamoDB: DynamoDB’s eventual consistency allows replicas to diverge briefly. A read might return stale data until all replicas sync. This improves latency and availability.

S3-like Object Storage

Data is stored in data nodes using a WAL and optimized files with multiple objects
The storage service is composed by a data routing service, a placement service, and data nodes
The data routing service generates UUID for new objects
The placement service collect heartbeats and data from data nodes and maintain a virtual cluster map (5 or 7 nodes, using Raft or Paxos for consensus)
The placement service uses consistent hashing to return the address of a node for get and query operation
Two possible ways to handle the durability:
- Replication:
  - worse for data storage (multiple copy of the data)
  - worse durability
  - better computing power: no need to compute parities before writing to disk
  - better write performance
  - Useful for S3 frequent access
- Erasure coding
  - Better for data storage (we break the data in pieces), and we store 4 parity fragments (1 every 2 fragments)
  - Better for durability
  - Useful for Glacier
  - Worse read: need to read all

User Engagement problems

Real-time Gaming Leaderboard
Youtube (L6+)
News Feed System

Financial Services

Digital Wallet

Stock exchange

Key ideas:

A Client Gateway talk to an order manager, which does risk checking, talks to a wallet, and sends via the sequencer order for matching to the matching engine
The sequence ID it’s important for replayability
The matching engine uses internally roducibilityorder books, that leverage linked lists of orders for matching, deletion and insertions
To maximize performance, each component such as the Order Manager is pinned to a single CPU and runs a single event loop
Order manager, matching engine and market data publisher run on a single server and use mmap to share portion of the ram
Move to event sourcing and treat order manager as a library

Payment System

Key ideas:

Implement pay-in flow (payment system receives money from customers on behalf of merchant)
Implement pay-out flow, money is transferred to sellers once delivery is confirmed
The payment service performs fraud checks before accepting the transaction
Use a payment executor to talk with PSP (stripe, paypal)
Use a ledger for debit and credit entries, wallet to save the current state
Store payment events in a database
Store state machine (ledger_updated, wallet_updated) in the database payment_orders
Hosted payment page
Integrating with PSP via queue means having a dead letter queue for retries / manage idempotency
Global consistency is not a big problem since at 10tps we can afford sync replica

Location-based Services

Proximity service (geo hashing, quad tree)
Nearby Friends
Google Maps (L6+)

Others

Web Crawler
Metrics Monitoring and Alerting System
Adclick Event Aggregation
Hotel Reservation System

Search autocomplete

Cache Common Prefixes:
- Store precomputed results for the most frequent or likely prefixes in Redis.
- Example: Cache results for prefixes like “c”, “ca”, “cat”.
Fallback to Trie:
- If a prefix is not cached, traverse the trie to find matching terms dynamically. This evolves traversing multiple pointers (c → a - >t )
Dynamic Cache Updates:
- Use trie lookups to populate cache on a cache miss (lazy loading).
Shard by prefix
- Different instances of the trie in different nodes

URL shortener

Biggest complexity is around determining the ID / the code for the URL. Solutions:

Incremental global counter, scalability problems
Hashing → need to salt to avoid collisions
Random → could potentially generate much longer id than required, ensure uniqueness checking in database (optimistic locking, insert if not exist)
Snowflake ID designed by twitter: timestamp millis, machine id, sequence number (to ensure uniqueness for uuid generated at the same millisecond on the same machine)
UUID v4: 128-bit identifier (one could encode it in base64)

Notification service

Biggest complexity is how to handle assigning a subscriber to a gateway.
Use consistent hashing, then let them talk websocket. If the client detects a failure, it will go again to the central server, and reconnect to a new gateway instance.
Store the table of subscriptions in a highly available, globally replicated database

Edmondo's Vault

Explorer

Interview list with tips

Low-level infra

Rate limiter

Distributed Message Queue like Kafka

Storage

Key-Value Store

S3-like Object Storage

User Engagement problems

Financial Services

Stock exchange

Payment System

Location-based Services

Others

Search autocomplete

URL shortener

Notification service

Graph View

Table of Contents

Backlinks