Paxos — The Original Consensus Algorithm

By Pritesh Yadav 18 min read

This is the algorithm that started it all. Paxos is a protocol that lets a group of unreliable machines agree on a single value, even when some of them crash, messages are lost or delayed, and several machines try to propose different values at the same time. "Consensus" here means exactly that: getting a set of nodes to agree on one value and never change their minds.

Paxos is famous for two opposite things at once: it is provably correct, and it is notoriously hard to understand. We are going to teach it slowly, one piece at a time. By the end you will see that the core idea is actually small — it is just packed very tightly.

5.1 History and why Paxos still matters

Leslie Lamport (the same person who gave us logical clocks and happens-before) first described Paxos in a 1990 paper called "The Part-Time Parliament." He wrote it as a joke story about a fictional Greek island called Paxos, whose parliament passed laws even though legislators kept wandering in and out of the chamber (the "parliamentarians" are the unreliable machines). The joke backfired: people found the story so confusing that almost nobody understood the algorithm hidden inside it. The paper sat largely ignored for years.

In 2001 Lamport gave up on the joke and published "Paxos Made Simple," a short, plain-English re-explanation. Its famous opening line is roughly: "The Paxos algorithm, when presented in plain English, is very simple." Many readers disagreed about the "simple" part — which is exactly why the Raft paper (the next section) was written, with the explicit goal of being understandable. But Paxos still matters enormously, because it is the theoretical bedrock. Google's Chubby lock service and Spanner database, Microsoft's Azure storage, Amazon's internal systems, and many others are built on Paxos or close relatives of it. Even systems that use Raft or ZAB instead are using ideas that Paxos established and proved correct first.

Key takeaway: Paxos is the original, mathematically-proven solution to consensus. You may never implement raw Paxos yourself, but every modern consensus system is a descendant of it. Understanding Paxos is understanding why consensus is possible at all.

5.2 The setup: roles and the goal

Before any steps, we need the cast of characters. Paxos splits the work into three roles. A "role" is just a job a machine performs — and importantly, one physical machine can play several roles at once. In real deployments every server is usually a proposer, an acceptor, and a learner all together. We separate the roles only to explain the logic cleanly.

RolePlain-English jobWhat it does
ProposerThe one who suggests a valueTries to get a value chosen. Drives the whole protocol by sending requests.
AcceptorThe fault-tolerant memory / voterVotes on proposals and remembers what it has promised and accepted, even across crashes. The acceptors are the system's persistent memory.
LearnerThe one who finds out the resultLearns which value was finally chosen, so it can act on it (e.g. apply it to a database).

The goal of single-decree Paxos ("single-decree" = deciding one value, one single decision) is precise:

  • Safety: Only a value that was actually proposed can be chosen. Only one value is ever chosen. Once a value is chosen, it is chosen forever — nobody can un-choose it or choose a different one. A learner never learns a value unless it really was chosen.
  • Liveness (best-effort): If things stabilize, some proposed value eventually does get chosen and learners find out. (As we will see, Paxos cannot guarantee this in a fully asynchronous world — that is the FLP result from the foundations guide — so it settles for "eventually, once the network behaves.")

The mechanism that makes a value "chosen" is a majority of acceptors. A majority means more than half (for 5 acceptors, that is 3). The magic property of majorities is that any two majorities must overlap in at least one acceptor. Two groups of 3 chosen from 5 must share at least one member. That single overlapping acceptor is the secret that makes Paxos safe — keep it in mind, we will use it.

Analogy: Think of the acceptors as a row of notaries. A proposer is a lawyer trying to get a contract notarized. The contract counts as "official" only when a majority of notaries have stamped the same contract. Because any two majorities of notaries share at least one person, and a notary refuses to contradict a stamp they already gave, you can never end up with two different "official" contracts.

5.3 Proposal numbers (ballots): giving every attempt a unique, ordered ID

Several proposers may be active at once, and a single proposer may retry after a failure. To keep order out of this chaos, every proposal attempt is tagged with a proposal number (also called a ballot number). Two rules govern these numbers:

  1. They are totally ordered — given any two, you can always say which is bigger. Bigger means "newer / takes priority."
  2. They are globally unique — no two proposers ever pick the same number. This is essential: if two proposals shared a number, the tie-breaking that keeps Paxos safe would fall apart.

How do you make numbers unique across machines with no shared clock? The standard trick is to build each number out of two parts: a locally-increasing counter combined with the server's own unique ID — for example the pair (counter, serverId), compared counter-first and breaking ties by serverId. Server A's numbers might be 1.A, 2.A, 3.A while server B's are 1.B, 2.B, 3.B; they can never collide because the serverId differs. A simpler-to-picture version: give proposer A only odd numbers and proposer B only even numbers. Either way, each proposer can always pick a number higher than any it has seen, and no two numbers are ever equal.

Common mistake: Thinking the proposal number is the value being agreed on. It is not. The number is just a priority tag on an attempt. The value is separate data (e.g. "set x = 7"). One value can be carried by many proposals with different numbers; the number decides whose attempt wins, not what gets stored.

5.4 Phase 1 — Prepare / Promise

Paxos runs in two phases. Phase 1 is about claiming priority and discovering history. The proposer is essentially asking, "Can I have a turn at number n, and by the way, has anything already been agreed that I need to honor?"

Phase 1a — Prepare (proposer → acceptors). The proposer picks a brand-new proposal number n (higher than any it has used) and sends a Prepare(n) message to at least a majority of acceptors. Note: at this point it sends no value yet — only the number.

Phase 1b — Promise (acceptor → proposer). When an acceptor receives Prepare(n):

  • If n is greater than every proposal number this acceptor has already responded to, it replies with a Promise. The Promise is a binding commitment: "I promise I will never again accept any proposal numbered less than n." It is effectively raising the floor.
  • Crucially, along with the promise the acceptor reports back any value it has already accepted — specifically the value, tagged with the proposal number under which it accepted it: it returns (accepted_number, accepted_value), or "nothing" if it has accepted no value yet. This is how already-agreed history travels forward.
  • If n is not higher than something it has already promised, the acceptor ignores the request (or sends a "no" / NACK so the proposer can give up faster).
  • The acceptor must write this promise to stable storage (disk) before replying, so that if it crashes and restarts it still remembers the floor it raised. The acceptors are the durable memory of the system.
PHASE 1: Prepare / Promise   (proposer P, acceptors A1 A2 A3)

   P                 A1            A2            A3
   |---Prepare(n)--->|             |             |
   |---Prepare(n)------------------>|            |
   |---Prepare(n)------------------------------->|
   |                 |             |             |
   |   (each acceptor: is n > my highest seen number?)
   |   (A1,A2,A3 had accepted nothing yet -> promise, report "none")
   |                 |             |             |
   |<--Promise(n, none)|           |             |
   |<--Promise(n, none)-------------|             |
   |<--Promise(n, none)---------------------------|
   |
   |  P heard from a MAJORITY (3 of 3). It may proceed to Phase 2.
   |  No prior value reported -> P is free to use its OWN value.

5.5 Phase 2 — Accept / Accepted

Phase 2 is where an actual value gets pinned down. The proposer now knows it has the floor (a majority promised) and knows the relevant history (whatever values came back in the promises).

Phase 2a — Accept (proposer → acceptors). If the proposer received Promises from a majority of acceptors, it now chooses which value to push, using one strict rule:

  • Look at all the Promises that came back. If any of them reported an already-accepted value, the proposer must use the value attached to the highest accepted_number among them. It does not get to use its own value — it is forced to carry forward the most recent already-accepted value.
  • Only if none of the majority reported any accepted value is the proposer free to propose its own value.

Call the chosen value v. The proposer sends Accept(n, v) to the acceptors.

Phase 2b — Accepted (acceptor → proposer/learners). When an acceptor receives Accept(n, v), it accepts it unless it has, in the meantime, already promised some higher number than n. In plain terms: "I'll accept this as long as nobody newer than you came along while you were gone." On accepting, it records (n, v) durably and notifies the learners. When a majority of acceptors have accepted (n, v), the value v is chosen — permanently. Learners that hear a majority of Accepted messages know the decision and can act on it.

PHASE 2: Accept / Accepted   (continuing from Phase 1)

   P                 A1            A2            A3
   |  P picks v = its own value (no prior value was reported).
   |---Accept(n, v)->|             |             |
   |---Accept(n, v)---------------->|            |
   |---Accept(n, v)------------------------------>|
   |                 |             |             |
   |   (each: have I promised anything > n? No -> accept (n,v), save to disk)
   |                 |             |             |
   |<--Accepted(n,v)-|             |             |
   |<--Accepted(n,v)----------------|             |
   |<--Accepted(n,v)------------------------------|
   |
   |  MAJORITY accepted (n,v)  ==>  v is CHOSEN, forever.
   |  Learners that see a majority of Accepted(n,v) now know: the answer is v.
Example: Five acceptors A1–A5. Proposer P (a config server) wants to set the cluster leader to "node-7".
(1) P sends Prepare(5.P) to all five. A1, A2, A3 reply Promise(5.P, none) — that is a majority of 3, none has accepted anything.
(2) Since no prior value came back, P is free to use its own value, so it sends Accept(5.P, "node-7").
(3) A1, A2, A3 had promised nothing higher than 5.P, so they accept and persist (5.P, "node-7") and tell the learners. Three of five accepted — a majority — so "node-7" is now the chosen leader, irrevocably, even if P crashes one millisecond later.

5.6 Why Paxos is SAFE — the heart of the algorithm

This is the part worth reading twice. Everything above exists to make one guarantee true: once a value could possibly have been chosen, every future proposal is forced to propose that same value. So the system can never choose two different values. Here is the chain of reasoning, built from two facts you already have.

Fact 1 — Majorities overlap. Suppose value v was chosen at proposal number n. By definition that means a majority of acceptors — call it set M1 — accepted (n, v).

Fact 2 — A later proposal must talk to a majority too. Now some proposer comes along later with a higher number n' (where n' > n). To get anywhere it must complete Phase 1: it must collect Promises from its own majority — call it set M2. Because any two majorities overlap, M1 and M2 share at least one acceptor. Call that shared acceptor X.

The punchline. Acceptor X is in M1, so it accepted (n, v). When the later proposer asks X in Phase 1, X reports back the value it accepted — (n, v) — inside its Promise. Now the Phase 2a rule kicks in: the later proposer is forced to use the highest-numbered accepted value it heard about. X just told it about v at number n. So the later proposer cannot use a value of its own — it must re-propose v. Therefore proposal n' also proposes v. The same argument applies to n'', n''', and so on forever. Every proposal numbered higher than n proposes v. No second value can ever be chosen.

The subtle bit that closes the last hole: how can the later proposer be sure it learned about a chosen value, if maybe only some of M1 had accepted by then? The answer is the promise. When X promised not to accept anything below n', it also locked the door: any proposal between n and n' can no longer slip a different value past X. So either the later proposer sees v directly, or no competing value could have reached a majority through X in the first place. The "promise" (raise the floor) and the "adopt the highest accepted value" rule work as a pair: the promise blocks the past from changing, and the adopt-rule forces the future to inherit it.

Key takeaway: Paxos safety = majority overlap + acceptors remember + every new proposer must adopt the highest already-accepted value it learns about. Together these guarantee that the very first value that reaches a majority becomes the permanent decision, and no later, higher-numbered proposal can override it.
Common mistake: Believing the proposer with the highest number "wins" by getting its preferred value chosen. A higher number wins the right to drive the protocol, but it usually does not get to install its own value — if a value was already accepted anywhere in its majority, it is forced to carry that existing value forward. Higher number = right to proceed, not right to overwrite.

5.7 The liveness problem — dueling proposers and FLP

Paxos is always safe, but it is not always guaranteed to finish. The classic failure mode is dueling proposers (also called livelock — the system is busy but makes no progress). Picture two proposers, P1 and P2, each retrying with ever-higher numbers, perfectly out of step:

DUELING PROPOSERS (livelock)   P1 and P2 over acceptors A1 A2 A3

P1: Prepare(1) ----------------------> A1 A2 A3  promise(1)
P2:        Prepare(2) ---------------> A1 A2 A3  promise(2)   (raises floor to 2)
P1: Accept(1, x) --------------------> A1 A2 A3  REJECT  (already promised 2 > 1)
P1:               Prepare(3) --------> A1 A2 A3  promise(3)   (raises floor to 3)
P2: Accept(2, y) --------------------> A1 A2 A3  REJECT  (already promised 3 > 2)
P2:                     Prepare(4) --> A1 A2 A3  promise(4)
P1: Accept(3, x) --------------------> A1 A2 A3  REJECT  (already promised 4 > 3)
   ... and so on, forever. Each Prepare invalidates the other's pending Accept.
   No value is ever chosen, even though nothing crashed.

Each proposer's Phase 1 keeps raising the floor and knocking out the other's Phase 2 before it can complete. Nobody crashed, no message was lost — yet no progress is made. This is not a bug in Paxos; it is the FLP impossibility result from the foundations guide showing up in real life: in a fully asynchronous system (no bound on message delays, no perfect failure detection), no consensus algorithm can guarantee it always terminates. Paxos chooses to sacrifice guaranteed liveness to keep absolute safety. Safety is never violated; only progress can stall.

The standard fix has two parts:

  • Elect a single distinguished proposer (a "leader"). Agree — informally, by a weak election that doesn't need to be perfect — that only one proposer normally issues proposals. With just one proposer, there is nobody to duel with, so Phase 1 and Phase 2 complete cleanly. Safety does not depend on the election being correct: if two leaders accidentally exist, Paxos is still safe (two values can never be chosen) — you only risk a temporary stall.
  • Randomized / exponential backoff. When a proposer's attempt is rejected, it waits a random amount of time before retrying (and the wait grows on repeated failures). Randomness desynchronizes the duelists so one of them gets a clean window to finish both phases. This is the same trick Ethernet and Raft elections use to break symmetric ties.

A common optimization, Multi-Paxos, takes the leader idea further: a stable leader runs Phase 1 just once to claim a range of decisions, then only does the cheap Phase 2 for each new value. This makes the steady state one round-trip per decision — and it is essentially the shape Raft formalizes with a strong leader.

Analogy: Two people reach for the same door handle, both pull back politely, then both reach again — repeating forever and never getting through. That's dueling proposers. The fix is the same as in real life: appoint one person to go first (a leader), and have anyone who collides wait a random moment before trying again so they stop colliding in lockstep.

Common mistakes & misconceptions

Common mistake: Confusing the proposal number with the value. The number is a priority tag on an attempt; the value is the separate data being agreed. Many proposals with different numbers can carry the very same value.
Common mistake: Thinking the highest-numbered proposer always installs its own value. If a value was already accepted in its majority, the protocol forces it to re-propose that existing value. The high number only grants the right to proceed, not to overwrite.
Common mistake: Assuming Paxos always terminates. Dueling proposers can livelock forever (FLP in action). Paxos guarantees safety always, but liveness only under partial synchrony plus a leader and backoff.
Common mistake: Believing acceptors can be stateless or in-memory only. Acceptors are the durable memory of the system — every promise and acceptance must be written to stable storage before replying, or a crash-and-restart can break safety.
Common mistake: Thinking basic Paxos decides a whole log of commands. Single-decree Paxos decides exactly one value. Replicating a sequence of commands needs Multi-Paxos (one Paxos instance per log slot), which is a separate, larger construction.
Common mistake: Treating two simultaneous leaders as a correctness bug. Two leaders only threaten progress (a stall), never safety — Paxos can never choose two different values regardless of how many proposers run at once.

Best practices

Best practice: Always run with an odd number of acceptors (3, 5, 7). It maximizes fault tolerance per node — 5 acceptors tolerate 2 failures and still form a majority — and avoids ambiguous even-split majorities.
Best practice: Use a single distinguished leader proposer in steady state (Multi-Paxos style). It eliminates dueling, collapses the steady state to one round-trip per decision, and keeps the system live.
Best practice: Make proposal numbers unique by construction — pair a local counter with the server ID, e.g. (counter, serverId). Never rely on wall-clock time alone for uniqueness or ordering.
Best practice: Persist promises and acceptances to disk before sending the reply (write-ahead). The acceptors are the system's memory; a reply that isn't yet durable can be lost in a crash and violate safety.
Best practice: Add randomized/exponential backoff on every rejected proposal so that colliding proposers desynchronize instead of dueling in lockstep.
Best practice: Unless you have a strong reason, build on a battle-tested implementation (etcd/Raft, ZooKeeper/ZAB, a vetted Paxos library) rather than hand-rolling Paxos — the edge cases around durability, leader change, and message reordering are exactly where subtle safety bugs hide.

Section summary

  • Paxos, from Lamport's Part-Time Parliament (1990) and Paxos Made Simple (2001), is the original, provably-correct consensus algorithm and the ancestor of nearly every modern one.
  • It uses three roles — proposer (suggests), acceptor (votes and remembers durably), learner (finds out the result) — and one machine can play all three.
  • The goal of single-decree Paxos is to get a majority of acceptors to agree on exactly one value, permanently and safely.
  • Every attempt carries a unique, totally-ordered proposal number, typically built as (counter, serverId) so no two ever collide.
  • Phase 1 (Prepare/Promise): proposer sends Prepare(n); an acceptor promises to reject anything below n and reports back any value it already accepted.
  • Phase 2 (Accept/Accepted): with a majority of promises, the proposer sends Accept(n, v), where v is the highest already-accepted value it learned about — or its own value if there was none; acceptors accept unless they've promised something higher. A majority of accepts = chosen forever.
  • Safety rests on three things together: majorities always overlap, acceptors remember, and new proposers must adopt the highest already-accepted value — so the first value to reach a majority can never be overridden.
  • Paxos can livelock with dueling proposers (the FLP impossibility made concrete); it sacrifices guaranteed liveness, never safety.
  • The fix is a single distinguished/leader proposer plus randomized backoff; Multi-Paxos extends this to an efficient one-round-trip-per-decision log, the shape Raft later formalized.

Continue reading