🏗️

System Design

A 20-module, staff-level curriculum: from back-of-the-envelope estimation to consensus, consistency models, and real-world case studies.

21 posts · Engineering

  1.

    System Design — Complete Learning Curriculum

    This is a 20-lesson study plan for software developers who want to get very good at designing large, reliable computer systems — the kind that handle millions…

  2.

    Foundations & Back-of-the-Envelope Estimation

    How engineers estimate whether a system can handle a given number of users — before they build anything.

  3.

    Networking & Protocols

    How data travels between a user's browser and a server — and why some websites feel instant while others feel slow.

  4.

    API Design & Service Communication

    How different parts of a software system talk to each other — and how to design those conversations so they are reliable, easy to change, and hard to break.

  5.

    Database Internals: Storage Engines, Indexes, Transactions

    How databases physically store data, why some databases are better for reading versus writing, how indexes speed up (and sometimes slow down) queries, and how…

  6.

    05 — Data Modeling: SQL vs NoSQL & Polyglot Persistence

    How to decide where and how to store data in a software system — and when a traditional database is the right choice versus a modern "NoSQL" database.

  7.

    06 — Caching Deep Dive

    Caching — a technique where your software saves a copy of information in a very fast place so it does not have to look it up from the slow database every…

  8.

    07 · Load Balancing, Scaling & Stateless Design

    How a website that runs on one computer grows into a fleet of many computers that can handle big crowds without breaking.

  9.

    08 — Replication & Partitioning (Sharding)

    Two big ideas that every large website relies on: keeping copies of your data on multiple computers (so the site stays up if one breaks), and splitting a huge…

  10.

    09 — CAP, PACELC & Consistency Models

    A fundamental problem every database faces: when computers storing the same data are spread across different locations, what happens when the connection…

  11.

    10 — Consensus & Distributed Coordination

    How a group of computers can agree on one correct answer — even when some of them crash, slow down, or lose messages.

  12.

    Module 11 — Distributed Transactions, Sagas & Idempotency

    What happens when a software system needs to save data in two different places at the same time — and why that is much harder than it sounds.

  13.

    12 · Messaging, Queues & Event-Driven Architecture

    How software systems pass work between their parts without making everything wait for everything else.

  14.

    13 — Event Sourcing & CQRS

    Two related ideas for storing data in software: "Event Sourcing" (saving a history of everything that happened instead of just the current state) and "CQRS"…

  15.

    14 · Stream Processing & Real-Time Systems

    How computer systems can react to information the moment it arrives — instead of waiting until the end of the day to run a report.

  16.

    15 — Probabilistic Data Structures & Algorithms at Scale

    Clever shortcuts that let big software systems work fast without using too much memory. Instead of tracking every exact detail, these tools keep a tiny, rough…

  17.

    16 — Rate Limiting, Resiliency & Fault Tolerance

    Two things every online system needs: a way to control how much traffic comes in, and a way to stay standing when something inside the system breaks.

  18.

    17 · Observability, SRE & Operating Systems at Scale

    How software teams know when their systems are healthy, how they find problems when something goes wrong, and how they release updates without breaking things…

  19.

    Module 18 — Architecture Patterns: Monolith → Microservices → Cells

    How to decide whether to split a software system into smaller, separate pieces — and how to do it safely if you do.

  20.

    Module 19 — Specialized Systems: Search, Geospatial, Time-Series & Analytics

    Why a standard database like PostgreSQL cannot handle every type of question efficiently, and which specialist tool to use instead.

  21.

    20 — Case Studies & the System Design Interview Framework

    This document is a study guide for answering "design a big system" questions in a software engineering interview.