Raft — Leader Election

By Pritesh Yadav June 21, 2026 20 min read —

Raft is an algorithm for keeping a replicated log in agreement across a cluster of servers. A "replicated log" is just an ordered list of commands (like "set x = 3", then "set y = 9") that every server must store in the same order, so that every server ends up in the same state. The hard part is agreeing on that order when machines crash, restart, and the network drops or delays messages. That agreement problem is called consensus.

Raft was created by Diego Ongaro and John Ousterhout (Stanford, 2014) with one explicit, unusual goal: understandability. Earlier algorithms (Paxos) are famously hard to reason about, so Raft deliberately breaks the problem into three smaller, mostly independent pieces:

Leader election — pick exactly one server to be in charge. (This section.)
Log replication — the leader copies its log to everyone. (Later section.)
Safety — guarantee the leader can never overwrite already-agreed history. (Later section.)

Raft's central design choice is the strong leader: at any moment there is one leader, and all changes to the log flow only from the leader outward to the others. Followers never talk to each other about log content. This single rule removes a huge amount of complexity — but it means everything depends on reliably electing a leader, and re-electing one the instant the old leader dies. That is what this section is about.

Analogy: Think of a group writing a shared document by phone. If everyone edits and dictates at once, chaos. So the group agrees: one person is the "scribe" at a time. The scribe is the only one who writes; everyone else just listens and copies. If the scribe goes silent, the group must quickly, and without a referee in the room, pick a new scribe — and crucially, never end up with two scribes writing different things at the same time. Raft is the precise protocol for choosing that scribe.

3.1 The three server states

At every moment, each Raft server is in exactly one of three states (a "state" here just means "the role this server is currently playing"):

State	What it does	How many at once
Follower	Passive. Issues no requests on its own. Only responds to messages from leaders and candidates. Redirects clients to the leader.	Usually all-but-one
Candidate	A follower that has decided to try to become leader. It is asking the others to vote for it.	Zero or more (briefly)
Leader	Handles all client requests, owns the log, and continuously tells followers it is alive.	At most one per term

In normal, healthy operation there is exactly one leader and every other server is a follower. The candidate state exists only during the brief, occasional moment of electing a new leader.

The state machine (transitions)

A server moves between these states based on what it hears (or fails to hear) from the network. Here is the full state diagram — this is the heart of leader election:

                       times out,           times out,
   starts up           starts election      new election
       |                   +-----+            +-----+
       v                   |     |            |     |
   +--------+   no heartbeat|     v            |     v        wins votes from
   |        |  (election    |  +-----------+   |  +-----------+   majority
   | FOLLOWER|--timeout)---->|  | CANDIDATE |---+->| CANDIDATE |------------+
   |        |               |  +-----------+      +-----------+            |
   +--------+                                                              |
       ^   ^                                                               v
       |   |                                                         +--------+
       |   | discovers current leader,                              |        |
       |   | or a server/RPC with a HIGHER term                     | LEADER |
       |   +--------------------------------------------------------|        |
       |                                                            +--------+
       |   discovers a server with a higher term                        |
       +----------------------------------------------------------------+

Read the transitions as plain rules:

Follower → Candidate: a follower hears nothing from any leader for too long (its election timeout fires), so it starts an election.
Candidate → Leader: the candidate collects votes from a majority of the cluster.
Candidate → Follower: the candidate discovers a legitimate current leader (or any message carrying a higher term — defined below), so it gives up and steps down.
Candidate → Candidate: the election timed out with no winner (a split vote), so the candidate starts a fresh election.
Leader → Follower: the leader learns of a higher term than its own, meaning a newer election has happened, so it steps down immediately.

Key takeaway: Every server is a Follower, Candidate, or Leader. Followers are passive; they only become Candidates when they stop hearing from a leader. There is no central coordinator deciding who's leader — the servers self-organize entirely through these transitions.

3.2 Terms — Raft's logical clock

Before we can run an election, we need the concept of a term.

What it is: a term is a numbered period of time. Terms are numbered with consecutive integers: term 1, term 2, term 3, and so on. Each term begins with an election. If the election produces a winner, that server is leader for the rest of that term. If the election fails to produce a winner, the term ends with no leader and the next term (with a new election) begins.

The problem it solves: there is no global clock in a distributed system (you already know this). Servers can't agree on wall-clock time. But they can agree on "which election are we on?" The term number is a logical clock — a counter that only moves forward — that lets every server detect stale information. A message from an old term is obviously out of date.

  term 1        term 2              t3        term 4
+--------+  +-----------------+  +------+  +-------------+
|election|  |election  normal |  |elect.|  |elect. normal|
| -> op  |  |   ->     oper.  |  |(none)|  |  ->   oper. |
+--------+  +-----------------+  +------+  +-------------+
                                    ^
                                    |
                            split vote: term ended
                            with NO emerging leader
   ---------------------------------------------------> time (terms)

Notice term 3 (t3) is short and has no leader — that's an election that failed (a split vote). Raft handles this gracefully: it just rolls into term 4 and tries again.

Two rules make terms powerful:

At most one leader per term (the Election Safety property). A term might have zero leaders (failed election) but never two. We'll prove why below.
Every message carries the sender's term number. Every server stores its own currentTerm. When two servers communicate, they compare terms:
- If a server sees a term larger than its own currentTerm, it updates currentTerm to the larger value and, if it was a candidate or leader, immediately reverts to follower. (It has been left behind by a newer election.)
- If a server receives a request with a stale (smaller) term, it rejects the request.

Analogy: A term is like a "presidential term" with a number. Every official letter is stamped with the term number it was written in. If you receive a letter stamped "Term 4" but you thought it was still Term 2, you immediately realize you missed an election — you accept that Term 4 is current and stand down from any office you held. If someone hands you a letter stamped "Term 2" today, you ignore it as obsolete.

3.3 Heartbeats — how the leader stays in power

How do followers know the leader is still alive? Through heartbeats.

The leader periodically sends an AppendEntries RPC to every follower. (An RPC, "remote procedure call", is just a message that asks another server to run something and reply.) AppendEntries is the same RPC the leader uses to replicate log entries — but when it carries no log entries (an empty entries[] list), it functions purely as a heartbeat: a "I'm still here, stay calm" signal.

Each time a follower receives a valid AppendEntries from the current leader, it resets its election timer. As long as heartbeats keep arriving, no follower ever times out, so no follower ever starts an election, so the leader keeps its job. The leader sends heartbeats even when idle (no client requests), precisely to prevent followers from timing out.

   Leader            Follower A        Follower B        Follower C
     |                   |                 |                 |
     |--AppendEntries--->|                 |                 |   (empty = heartbeat)
     |--AppendEntries--------------------->|                 |
     |--AppendEntries------------------------------------->  |
     |                   | reset timer     | reset timer     | reset timer
     |   ...wait...      |                 |                 |
     |--AppendEntries--->|--------------------->|----------->|   (next heartbeat)
     |                   | reset timer     | reset timer     | reset timer

Key takeaway: A Raft leader holds power by continuously broadcasting that it is alive. Silence — not an explicit "I'm dying" message — is what triggers a new election. This means Raft tolerates a leader that crashes, freezes, or gets network-partitioned away: in all three cases the followers simply stop hearing heartbeats and react identically.

3.4 The election timeout — what kicks off an election

The election timeout is the amount of time a follower waits, hearing nothing from a leader, before it gives up and tries to become leader itself.

If a follower's election timeout elapses without receiving any AppendEntries from a current leader (and without granting a vote to a candidate — more on that soon), it assumes there is no viable leader and begins an election. To begin an election the follower does four things, in order:

Transitions to candidate state.
Increments its currentTerm (e.g. from 2 to 3) — starting a brand-new term.
Votes for itself (sets its votedFor to its own id, and counts one vote).
Resets its election timer and sends RequestVote RPCs, in parallel, to every other server.

The RequestVote RPC — exact fields

RequestVote is the RPC a candidate sends to ask "please vote for me as leader." Here are its exact arguments and results, straight from the Raft specification:

Direction	Field	Meaning (plain words)
Argument	`term`	The candidate's term number (the new, incremented one).
Argument	`candidateId`	Which server is requesting the vote (so the voter can record it).
Argument	`lastLogIndex`	The position (index) of the last entry in the candidate's log — how long its log is.
Argument	`lastLogTerm`	The term of that last log entry — how recent its log is.
Result	`term`	The voter's `currentTerm`, so a behind candidate can update itself.
Result	`voteGranted`	`true` means "yes, you have my vote."

The last two arguments — lastLogIndex and lastLogTerm — exist for the safety restriction we preview in §3.7. For now, just note that the candidate advertises how up-to-date its log is.

3.5 Winning the election — majority and at-most-one-vote

A candidate wins the election if it receives voteGranted = true from a majority of all servers in the cluster (not just a majority of those that replied). "Majority" means more than half. In a 5-server cluster, a majority is 3.

The voting rule on each server is simple and strict:

Each server may vote for at most one candidate per term, on a first-come-first-served basis. It records its choice in votedFor. Once it has voted in term T, it will refuse any other candidate in term T.
A server rejects a RequestVote whose term is smaller than its own currentTerm (stale candidate).
(Plus the up-to-date-log check from §3.7.)

When a candidate gathers a majority, it immediately becomes leader and starts sending heartbeats to assert its authority and stop any other elections from starting.

Why "at most one leader per term" is guaranteed

This is the elegant part. Suppose, for contradiction, two different candidates both won the same term. Each would need votes from a majority of the cluster. But any two majorities of the same set must overlap in at least one server (two groups each bigger than half can't be disjoint). That overlapping server would have had to vote twice in the same term — which the "at most one vote per term" rule forbids. Contradiction. Therefore at most one candidate can win a given term. This is the Election Safety property.

   5 servers:   S1  S2  S3  S4  S5

   Candidate A's majority:  [ S1  S2  S3 ]
   Candidate B's majority:           [ S3  S4  S5 ]
                                       ^^
                                       |
                          S3 is in BOTH — it cannot have
                          voted for both A and B in one term.
                          => one of them cannot reach a majority.

Analogy: Imagine 5 club members electing a president. Each member gets one ballot per election round and can spend it on only one candidate. To win you need 3 ballots. Two candidates can't both collect 3 of the 5 ballots, because 3 + 3 = 6 ballots but only 5 exist — at least one member would have had to vote twice, which isn't allowed. So at most one winner per round. That's exactly the quorum-overlap argument you already know, applied to votes.

A full election timeline

  S1(Leader,term2)   S2(Follower)   S3(Follower)   S4(Follower)   S5(Follower)
        |                 |              |              |              |
        |--heartbeat----->|------------->|------------->|------------->|
        X  <-- S1 CRASHES |              |              |              |
                          |  ...no heartbeats arriving...              |
                          | timer fires (S2 first)                     |
                          | term 2 -> 3, vote self, become CANDIDATE   |
                          |--RequestVote(term=3, id=S2)--------------->|  (to all)
                          |                |              |              |
                          |<--voteGranted--|<--granted----|<--granted----|
                          | got votes: S2(self)+S3+S4 = 3 of 5 = MAJORITY
                          | become LEADER of term 3
                          |--heartbeat(term=3)----------->...---------->|
                          |  everyone resets timers, recognizes S2

Sequence: leader dies → a follower times out → it becomes a candidate in a new term → it sends RequestVote to all → it collects a majority → it becomes the new leader → its heartbeats restore calm. Total time is typically a few hundred milliseconds.

3.6 Split votes and randomized timeouts

There is a third possible outcome of an election: the candidate neither wins nor loses. This is a split vote.

How it happens: if several followers time out at nearly the same instant, they all become candidates in the same new term and all start begging for votes simultaneously. The votes get divided among them, and no single candidate reaches a majority. Each candidate then sits, having voted for itself, unable to win and unwilling to vote for anyone else this term.

  5 servers. S2 and S4 time out at the SAME time -> both candidates, term 3.

     S1        S2(cand)      S3        S4(cand)      S5
      |   <--ReqVote-- |       |    --ReqVote-->|     |
      |               |  --ReqVote--> |         |  <--ReqVote--|... (crossing)

   Votes land:  S2 gets: S2(self) + S1   = 2
                S4 gets: S4(self) + S5   = 2
                S3 already voted for S2? or S4? say it voted S4 -> S4=3? 
   (In the classic split:)  S2 = S1,S2  |  S4 = S4,S5  |  S3 hasn't decided
                -> 2 vs 2, nobody reaches 3.  *** SPLIT VOTE ***
                -> term 3 ends with NO leader. Everyone waits, times out again.

Without a countermeasure, split votes could repeat forever: every candidate times out at the same time again, splits again, term after term. Raft's fix is beautifully simple — randomized election timeouts.

Instead of every server using the same fixed timeout, each server picks its election timeout randomly from a fixed range — the paper uses 150–300 ms as an example. So the timeouts are spread out. In most cases, one server times out clearly before any other, wins the whole election uncontested, and sends heartbeats before anyone else even wakes up. The same randomization is reused after a split: each candidate restarts its timer with a fresh random value before the next attempt, so the next round is very unlikely to tie again.

  Split happened in term 3. Now each picks a NEW random timeout for term 4:

    S2 random timeout = 270ms  |------------------------------|fires
    S4 random timeout = 165ms  |----------------|fires  <-- WINS THE RACE
                                                 |
                          S4 times out first, becomes candidate (term 4),
                          collects a majority, and sends heartbeats
                          BEFORE S2's 270ms is up -> S2 hears S4's heartbeat,
                          resets its timer, stays a follower. No second split.
                          ---------------------------------------------> time

This is a probabilistic guarantee, not a hard one: a tie is possible in any single round, but its probability shrinks rapidly because two independent random picks rarely match closely, and each retry re-rolls the dice. In practice elections resolve in one or two rounds.

Analogy: Two people reaching for the last slice of pizza at the exact same instant freeze, both say "you take it — no, you," and stall (the split). The social fix: each silently counts to a random number before grabbing again. Whoever finishes counting first just takes it, cleanly, while the other is still counting — and seeing it gone, gives up. Randomized waiting breaks the symmetry that caused the deadlock.

Common mistake: Thinking randomized timeouts prevent split votes. They don't prevent them — they make repeated splits exponentially unlikely and make each one resolve fast. Raft tolerates the occasional split (the term just ends leaderless and a new one starts); it only needs splits to be rare and short, not impossible.

3.7 The "up-to-date log" voting restriction (preview)

So far a follower grants its vote to the first candidate that asks in a term. But there is one more condition, and it's the bridge to Raft's safety guarantees. A server will grant its vote only if the candidate's log is at least as up-to-date as its own log.

The problem it solves: the leader is the single source of truth for the log, and Raft never lets a leader overwrite committed history. So we must never elect a leader that is missing entries other servers have already committed — otherwise the new leader's incomplete log would become the official one, and agreed-upon commands would vanish. The vote is the gate that keeps a behind server from ever becoming leader.

How "up-to-date" is compared (this is exactly why lastLogTerm and lastLogIndex are in RequestVote):

Compare the term of the last entry first. The log whose last entry has the higher lastLogTerm is more up-to-date.
If the last terms are equal, then the longer log (higher lastLogIndex) is more up-to-date.

A voter grants its vote only when the candidate's (lastLogTerm, lastLogIndex) is greater than or equal to the voter's own. If the candidate is behind on either measure, the voter says voteGranted = false.

Example: Voter S3's last log entry is index 5, term 4. Two candidates ask for its vote:

Candidate A: last entry index 6, term 4. Same last term (4), longer log (6 ≥ 5) → at least as up-to-date → S3 grants the vote.
Candidate B: last entry index 8, term 3. Longer log (8), but its last term (3) is lower than S3's (4). Term is compared first, so B is behind despite being longer → S3 refuses. B has a longer-but-older log and must not lead.

This shows why term beats length: a long log full of entries from an old term can still be missing newer, already-committed entries.

We only preview this here. Why this exact comparison is sufficient to guarantee that a newly elected leader holds every committed entry — the Leader Completeness property — is the subject of the safety section. For now, hold onto this: elections aren't just popularity contests; a candidate must also prove its log is current before anyone will vote for it.

Key takeaway: Leader election and log safety are linked through one rule — the up-to-date-log check in RequestVote. Majority voting decides who leads; the log comparison decides who is even eligible to lead. Together they ensure the winner never destroys committed data.

Common mistakes & misconceptions

Common mistake: Believing a candidate needs a majority of the servers that respond. It needs votes from a majority of the entire cluster (size is fixed, e.g. 3 of 5). Unreachable servers count as non-votes, not as a smaller denominator. This is what lets a partitioned minority correctly fail to elect a leader.

Common mistake: Thinking a leader announces its own death so the cluster can react. It does not. Detection is by absence of heartbeats. A crashed leader, a frozen leader, and a network-isolated leader all look identical to followers — which is exactly why the same timeout mechanism handles all of them.

Common mistake: Assuming every term has a leader. A term can end with no leader at all if its election split. Raft simply moves to the next term and tries again; a leaderless term is normal and harmless.

Common mistake: Treating randomized timeouts as a guarantee against ties. They are a probabilistic tool. Two candidates can still split a vote in one round; randomization just makes that rare and self-correcting, not impossible.

Common mistake: Thinking the most "popular" or first-asking candidate always wins. A candidate with a stale or behind log will be refused by up-to-date voters because of the lastLogTerm/lastLogIndex check, even if it asked first. Eligibility comes before popularity.

Best practices

Best practice: Set the election timeout range well above the typical round-trip time and heartbeat interval — the rule of thumb is broadcastTime ≪ electionTimeout ≪ MTBF (mean time between failures). The paper's 150–300 ms suits a fast LAN; use larger values for cross-datacenter links so transient slowness doesn't trigger needless elections.

Best practice: Send heartbeats far more frequently than the election timeout — commonly an order of magnitude faster (e.g. heartbeat every ~50 ms against a 150–300 ms timeout) — so one dropped heartbeat never costs you a leader.

Best practice: Keep the timeout interval genuinely randomized per server and per attempt. Re-roll the random value at the start of every election. A fixed or poorly seeded timeout reintroduces the synchronized split votes randomization was meant to eliminate.

Best practice: Persist currentTerm and votedFor to stable storage before responding to any RPC. If a server crashes and restarts within a term, it must remember it already voted, or it could vote a second time and break the at-most-one-leader guarantee.

Best practice: Prefer an odd cluster size (3, 5, 7). Odd sizes give a clean majority and maximize fault tolerance per node — 5 servers tolerate 2 failures, while going to 6 still only tolerates 2 but makes every quorum larger and slower.

Section summary

Every Raft server is a Follower, Candidate, or Leader; normal operation has exactly one leader and the rest followers.
Terms are an ever-increasing logical clock; each term starts with an election and has at most one leader. Every message carries a term, and a higher term forces stale leaders/candidates back to follower.
The leader keeps power by broadcasting heartbeats (empty AppendEntries); receiving one resets a follower's election timer.
A follower whose election timeout elapses becomes a candidate: it increments its term, votes for itself, and sends RequestVote(term, candidateId, lastLogIndex, lastLogTerm) to all servers.
A candidate wins by collecting votes from a majority of the whole cluster; each server votes once per term, first-come, which (via quorum overlap) guarantees at most one winner.
Split votes happen when servers time out together; randomized timeouts (e.g. 150–300 ms) de-synchronize them so one candidate usually wins quickly and repeated splits are rare.
An election can end with no leader; Raft just starts a new term and retries.
The up-to-date-log restriction — comparing lastLogTerm then lastLogIndex — means a candidate whose log is behind cannot win, linking election to safety and setting up the Leader Completeness guarantee.

Continue reading

🌐 Distributed Systems

How to Use This Guide

🌐 Distributed Systems

The Consensus Problem

🌐 Distributed Systems

Replicated State Machines & the Log