Revision Cheat Sheet

By Pritesh Yadav June 21, 2026 4 min read —

One-line everything: Consensus = agree on one ordered log of commands, using a majority quorum, so identical replicated state machines stay in lockstep despite crashes and partitions.

The Consensus Requirements

Agreement — no two correct nodes decide differently (safety).
Validity — the decided value was actually proposed by someone (safety).
Termination — every correct node eventually decides (liveness).
FLP — in a fully async network with crashes, no deterministic algorithm guarantees all three; real systems use timeouts + randomness to get practical liveness, never sacrificing safety.

Quorum / Majority Rule

Quorum = strict majority = ⌊n/2⌋ + 1.
Tolerate f failures → need 2f+1 nodes (3 tolerate 1, 5 tolerate 2).
Safety comes from overlap: any two majorities share ≥1 node, so two conflicting decisions can't both win.
Use odd node counts — an even extra node raises the threshold without adding fault tolerance.

Raft — Server States

State	Role
Follower	Passive; responds to RPCs; starts election if heartbeat times out.
Candidate	Bumps term, votes for self, requests votes; wins on majority.
Leader	Handles all client writes; sends heartbeats; replicates the log.

Raft — The Two RPCs

RPC	Sent by	Purpose	Granted/accepted when
RequestVote	Candidate	Win an election	Voter hasn't voted this term AND candidate's log is ≥ as up-to-date.
AppendEntries	Leader	Replicate entries / heartbeat (empty)	Term is current AND prev entry's index+term match (log-matching check).

Raft — Key Properties

Election safety — at most one leader per term.
Log matching — same index+term ⇒ logs identical up to that point.
Leader completeness — a committed entry exists in every future leader's log.
Committed = replicated to a majority by the current leader ⇒ durable, never lost.
Membership changes via joint consensus (old+new majorities both required) to avoid split-brain; snapshots compact the log and catch up laggards.

Paxos — Two Phases (Single-Decree)

Phase	Proposer asks	Acceptor does
1: Prepare / Promise	"Promise to ignore ballots < N"	Promises; returns any value already accepted.
2: Accept / Accepted	"Accept value V at ballot N"	Accepts if it hasn't promised a higher N. Majority accept ⇒ chosen.

Paxos safety rule: if any acceptor already accepted a value, a new proposer must re-propose that same value (the highest-ballot one it learns about in Phase 1) — this is what prevents two different values being chosen.

Roles: Proposer (drives), Acceptor (votes), Learner (finds out the result).
Multi-Paxos: elect a stable leader once, skip Phase 1 for later slots ⇒ steady-state commit = one round trip (just like Raft).

Raft vs Paxos — One-Liners

Same problem, same majority foundation; both reduce to agreeing on a log.
Raft = prescriptive, single strong leader, structured log, built for understandability.
Paxos = minimal, general, leaderless in theory; leaves leader election + multi-decision as DIY.
Multi-Paxos and Raft converge on "stable leader + replicated log" in practice.
Term (Raft) = Epoch = Ballot (Paxos) — the increasing number that exposes stale leaders.
Both are crash-fault tolerant only — neither handles Byzantine/malicious nodes (use PBFT for that).

Real-World Systems

System	Algorithm	Used for
etcd	Raft	Kubernetes' config/state store.
Consul	Raft	Service discovery, config, locks.
CockroachDB / TiKV	Raft (per shard)	Distributed SQL/KV replication.
ZooKeeper	ZAB (Raft-like)	Coordination, locks, leader election.
Google Spanner / Chubby	(Multi-)Paxos	Globally-distributed DB; lock service.
PBFT / blockchains	BFT consensus	Agreement among untrusted/malicious nodes.

Rules of Thumb

Use 3 nodes for "survive one failure," 5 for "survive two / maintenance + a failure." Beyond 7, write throughput suffers — shard into multiple groups instead.

Don't roll your own consensus. Use a proven Raft library (etcd/Hashicorp). Subtle bugs in election or commit logic cause silent data loss that tests rarely catch.

Every decision costs at least one majority round trip — budget that latency. Cross-datacenter clusters pay WAN latency on every write; keep the quorum geographically tight or accept the cost.

Partition ⇒ only the majority side makes progress; the minority pauses (consistency over availability).
Scale reads with follower reads + leader leases; scale writes by sharding into many consensus groups.
Cap the log with snapshots; bootstrap new nodes from a snapshot, not full replay.

Continue reading

🌐 Distributed Systems

How to Use This Guide

🌐 Distributed Systems

The Consensus Problem

🌐 Distributed Systems

Replicated State Machines & the Log