How to Use This Guide

By Pritesh Yadav · 3 min read

Consensus is the problem of getting a group of separate computers to agree on one value (one decision, one order of events) even though some of them may crash, restart, or fall out of touch at the worst possible moment. It sounds small. It is one of the hardest problems in distributed systems, because the network can lose, delay, or reorder messages, and from the outside a machine that has crashed looks exactly like one that is merely slow. Solve consensus reliably and you can build a system that stays correct, and stays available as long as a majority of its machines can still talk to each other, while individual parts fail underneath you. That is why consensus is the crown jewel of the field: databases, lock services, configuration stores, and schedulers all rest on it.

This is Series 2: Consensus. It assumes you have read Series 1 (Foundations) and are comfortable with the basics: nodes, messages, latency, replication, failure models, and why "just use one server" eventually breaks. We build on that — we do not re-teach it.

The six sections are deliberately ordered so each one earns the next:

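  • 1. The Consensus Problem — what agreement actually requires (agreement, validity, termination), why a majority is the magic number, and why the FLP impossibility result says no deterministic algorithm can guarantee termination in a fully asynchronous network if even one node may crash.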
  • 2. Replicated State Machines & the Log — the universal trick: if every node applies the same commands in the same order, they stay identical. Consensus reduces to agreeing on a log.
  • 3. Raft — Leader Election — Raft's first job: pick one leader per term so there is a single source of truth for the log.
  • 4. Raft — Log Replication, Safety & Membership — how the leader copies entries, commits them by majority, guarantees safety, and changes the cluster's membership without breaking.
  • 5. Paxos — The Original Consensus Algorithm — the foundational algorithm Raft was reacting to: proposers, acceptors, and the two-phase prepare/accept dance.
  • 6. Multi-Paxos, Raft vs Paxos & the Real World — running Paxos for a stream of decisions, an honest comparison, and the systems (etcd, ZooKeeper, Spanner) that ship this for real.

Read in order on a first pass. The narrative is "problem → the log abstraction → Raft (taught as election then replication) → Paxos → the real world." If you only want a working mental model, read 1–4 (Raft) and skim 5–6. Use the Glossary as you go, the FAQ when something feels contradictory, and the Cheat Sheet to revise before an interview or a design review.
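
As a small preview of the two ideas that carry the rest of the series, here is a minimal Python sketch of the replicated-state-machine trick from section 2 and the majority arithmetic from section 1. It is an illustration under stated assumptions, not how any real system implements it: the `KVReplica` class, the tuple-shaped commands, and the `replay` and `majority` helpers are all hypothetical names invented for this example, and there is no network or failure handling at all.

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """A toy deterministic state machine: an in-memory key-value store."""
    state: dict = field(default_factory=dict)

    def apply(self, command):
        # Hypothetical command shape: (operation, key, value).
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)


def replay(log):
    """Apply every command in the log, in order, to a fresh replica."""
    replica = KVReplica()
    for command in log:
        replica.apply(command)
    return replica.state


def majority(cluster_size):
    """Smallest group that must overlap with any other group of the same size."""
    return cluster_size // 2 + 1


log = [("set", "x", 1), ("set", "y", 2), ("set", "x", 3), ("delete", "y", None)]

# Same commands, same order: the replicas end up identical.
replica_a = replay(log)
replica_b = replay(log)

# Same commands, different order: the final state can differ.
reordered = replay([log[2], log[0], log[1], log[3]])

print(replica_a, replica_b, reordered)
print(majority(5))
```

The sketch shows why order matters: the two replicas that replay the same log agree, while the one fed the same commands in a different order ends with a different value for `x`. The `majority` helper shows the arithmetic behind the "magic number": any two majorities of the same cluster must share at least one node, because twice the majority size always exceeds the cluster size.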

Throughout, look for key boxes (the one idea you must keep), tips (practical guidance), and warnings (the traps that bite real engineers).
