Vector Clocks & Causality
In the previous section you met Lamport clocks — simple counters that give every event a number so we can put events in some order. Lamport clocks are useful, but they have a famous blind spot. This section fixes that blind spot with a tool called the vector clock. By the end you will be able to look at two events in a distributed system and say, with certainty, one of exactly three things: "this one definitely came first", "that one definitely came first", or "these two happened at the same time and nobody can break the tie". That third answer — concurrency — is the whole point, and it is the foundation of how real databases like Amazon Dynamo and Riak detect conflicting writes.
Let me define a few words up front, in plain English, before we use them:
- Node (also called a process or replica): one independent computer in the system. It runs on its own and talks to others only by sending messages.
- Event: one thing that happens on a node — doing some local work, sending a message, or receiving a message.
- Causality: a fancy word for cause-and-effect. Event
a"causes" eventbif information fromacould have travelled toband influenced it. If no information could have travelled between them, neither caused the other. - Happened-before (written
a → b): the formal name for "acould have influencedb". This is Lamport's relation from the last section. Two events with no happened-before link in either direction are called concurrent — they are independent.
4.1 Why Lamport clocks are not enough
A Lamport clock gives every event a single number, written C(e). Its one guarantee is this:
If a → b (a happened-before b), then C(a) < C(b).
Read carefully: this guarantee only runs in one direction. It says causality forces the numbers to go up. It does not say the reverse. So if you are handed two numbers and you see C(a) < C(b), you cannot conclude that a caused b. The smaller number might just be a coincidence — two totally independent events on two different nodes can easily end up with C(a) = 3 and C(b) = 7 even though neither knew the other existed.
In logic terms: a → b implies C(a) < C(b), but C(a) < C(b) does not imply a → b. The implication is one-way. We say the Lamport clock condition is necessary but not sufficient for causality.
Why does this matter? Because the most valuable question in a distributed database is exactly the one Lamport clocks cannot answer: "Did these two writes happen one-after-another (so I can safely keep the later one), or did they happen at the same time on different replicas that hadn't yet talked (so I have a genuine conflict)?" To answer that, we need a clock that captures causality in both directions. That clock is the vector clock.
C(a) < C(b) never proves causation. Vector clocks were invented to close exactly this gap.
4.2 What a vector clock is
A vector clock is, instead of a single number per event, a whole list of numbers — one slot for every node in the system. If there are three nodes — call them A, B, and C — then every vector clock has three slots, like [A:2, B:5, C:1] or just [2, 5, 1] if we agree on a fixed order.
Each node owns exactly one slot — its own. The number in node A's own slot counts "how many events have happened on A so far". The numbers in the other slots are A's best knowledge of "how many events have happened on B and on C, as far as A has heard". So a vector clock is really a little summary: "Here is everything I have personally done, plus everything I have heard about that everyone else has done."
Vector clocks were invented independently in 1988 by two researchers: Colin Fidge ("Timestamps in Message-Passing Systems That Preserve the Partial Ordering", 11th Australian Computer Science Conference) and Friedemann Mattern ("Virtual Time and Global States of Distributed Systems", a 1988 workshop in Château de Bonas, France). Their key result is called the strong clock condition: with vector clocks, a → b is true if and only if VC(a) < VC(b). Notice "if and only if" — the implication now runs both ways, which is precisely what Lamport clocks could not do.
4.3 The three update rules
A vector clock changes by exactly three simple rules. Say node i holds a vector V, and V[i] means "node i's own slot".
- On a local event (the node does some internal work): increment your own slot by one.
V[i] = V[i] + 1. Nothing else changes. - On sending a message: first increment your own slot (
V[i] = V[i] + 1), then attach a copy of your entire vector to the outgoing message. The receiver will need it. - On receiving a message that carries a vector
M: first increment your own slot (V[i] = V[i] + 1), then for every slotk, setV[k] = max(V[k], M[k])— take the bigger of your number and the sender's number, slot by slot. This is called the element-wise maximum.
The element-wise max in rule 3 is the magic. It means "merge in everything the sender knew that I didn't". If the sender had heard about 5 events on node C and I had only heard about 2, after the merge I know about 5 — I've absorbed their knowledge.
4.4 A full worked example with three nodes
Let's run three nodes — A, B, C — all starting at [0, 0, 0]. We'll track the vector after each event. Slots are in the order [A, B, C].
Time ──────────────────────────────────────────────────────────────▶
A: [0,0,0] a1 a2 ─────────msg m1──────────┐
[1,0,0] [2,0,0] │
▼
B: [0,0,0] b1 r1(recv m1) b2
[0,1,0] [2,2,0] [2,3,0]
│
msg m2 ───┘
│
C: [0,0,0] c1 ▼
[0,0,1] r2(recv m2)
[2,3,2]
Legend: a1,a2,b1,b2,c1 = local events; m1,m2 = messages; r1,r2 = receives
Step by step:
- a1 on A: local event → A's slot goes from 0 to 1. A =
[1,0,0]. - c1 on C: local event → C =
[0,0,1]. (A and C have never communicated, so they know nothing about each other.) - b1 on B: local event → B =
[0,1,0]. - a2 on A, then A sends message
m1: a2 is the send event, so A increments its own slot to 2: A =[2,0,0]. The messagem1carries the stamp[2,0,0]. - r1 on B receives
m1: B first increments its own slot (1 → 2), giving[0,2,0], then takes the element-wise max with the message's[2,0,0]: max(0,2)=2 for A, max(2,0)=2 for B, max(0,0)=0 for C → B =[2,2,0]. B now knows A has done 2 events. - b2 on B, then B sends
m2: increment B's own slot → B =[2,3,0]. Messagem2carries[2,3,0]. - r2 on C receives
m2: C increments its own slot (1 → 2) →[0,0,2], then element-wise max with[2,3,0]: → C =[2,3,2]. C now knows about A's 2 events and B's 3 events, all from one message chain.
Notice the causality captured in the final vector on C, [2,3,2]: it "contains" A's work and B's work because information flowed A → B → C. That is exactly the happened-before chain, recorded inside the numbers.
4.5 Comparing two vector clocks — the three possible answers
This is the payoff. Given two vector clocks U and V, compare them slot by slot. There are exactly three outcomes:
| Result | Definition (compare every slot) | Meaning |
|---|---|---|
Equal (U = V) | Every slot is equal. | The same point in time / the same event. |
Happened-before (U < V) | Every slot of U is ≤ the matching slot of V, and at least one slot is strictly <. | U causally came first; U → V. V is a descendant of U. |
Concurrent (U ∥ V) | Neither U < V nor V < U holds — each has at least one slot bigger than the other. | Independent. Neither caused the other. Potential conflict. |
The crucial rule to memorise: "less-than-or-equal in every slot, and strictly less in at least one" means happened-before. If you cannot say that about either direction, the events are concurrent.
[1,0,0]vs[2,2,0]: every slot of the first is ≤ the second (1≤2, 0≤2, 0≤0) and at least one is strictly smaller → first happened-before second. The first event causally precedes the second.[2,0,0]vs[0,1,0]: first has A=2 > 0, but second has B=1 > 0. Each beats the other in some slot → concurrent. (These are events a2 and b1 from our diagram — and indeed they happened before A and B ever talked.)[2,3,0]vs[2,3,2]: 2≤2, 3≤3, 0≤2, one strictly smaller → first happened-before second.
[2,0,0] (sum 2) and [0,1,0] (sum 1) have different sums but are actually concurrent. Summing throws away exactly the per-node detail that detects concurrency. You must compare slot-by-slot.
4.6 Detecting concurrent updates — the basis of "siblings"
Now connect this to a real database. Imagine a key-value store that keeps copies of your data on several nodes for availability (so the store stays up even if one node dies). Each stored version of a value carries a vector clock. When a node receives a write, it compares the incoming version's clock to the version it already has:
- If the new clock is greater than (happened-after) the old one → the new write is a straightforward, causal update. Overwrite the old version. No conflict.
- If the new clock is less than the old one → the new write is stale (an old update arriving late). Keep the version you have.
- If the two clocks are concurrent → two clients updated the same key independently, on replicas that hadn't synced. Neither is "newer". The database keeps both versions. These coexisting concurrent versions are called siblings.
The database cannot decide which sibling is "right" — that's a business question (which shopping cart? which user profile?). So it hands all siblings back to the application at the next read and asks it to merge them. This is called semantic reconciliation: the app applies domain rules (for example, "union the two shopping carts so no item is lost") and writes back one reconciled value whose new vector clock descends from both siblings, collapsing the branch.
4.7 Real-world: Amazon Dynamo and the shopping-cart example
This design is exactly what Amazon described in their landmark 2007 Dynamo paper ("Dynamo: Amazon's Highly Available Key-value Store"). Dynamo chose availability over strong consistency — the shopping cart must always accept an "add to cart", even during network trouble. That means two replicas can take writes independently and disagree, so Dynamo uses vector clocks (it calls them version vectors, stored as a list of (node, counter) pairs) to tell apart genuine causal updates from real conflicts.
Here is the actual example from the Dynamo paper, with server nodes Sx, Sy, Sz handling writes to one shopping-cart object:
WRITE handled by Sx D1 [(Sx,1)]
│ │
WRITE handled by Sx D2 [(Sx,2)] (D2 descends from D1)
│
┌────────────┴────────────┐
│ (network split: │
│ two replicas write │
│ without seeing │
│ each other) │
▼ ▼
WRITE on Sy WRITE on Sz
D3 [(Sx,2),(Sy,1)] D4 [(Sx,2),(Sz,1)]
│ │
└────────────┬─────────────┘
▼
client READ sees BOTH D3 and D4
D3 vs D4: D3 has (Sy,1) but no Sz
D4 has (Sz,1) but no Sy
⇒ NEITHER ≤ the other ⇒ CONCURRENT
⇒ they are SIBLINGS, returned together
│
▼
client reconciles (merges carts) and writes back via Sx
D5 [(Sx,3),(Sy,1),(Sz,1)]
D5 descends from BOTH D3 and D4 ⇒ conflict resolved
Walk through the clocks: D3 = [(Sx,2),(Sy,1)] and D4 = [(Sx,2),(Sz,1)]. Compare slot by slot. For the Sy slot, D3 has 1 and D4 has 0 (absent counts as 0) — D3 wins. For the Sz slot, D4 has 1 and D3 has 0 — D4 wins. Each beats the other somewhere, so neither happened-before the other: they are concurrent, i.e. a real conflict. Dynamo returns both as siblings. The client merges and writes D5 = [(Sx,3),(Sy,1),(Sz,1)], which is ≥ both D3 and D4 in every slot — so D5 unambiguously supersedes both, and the branch is healed.
Riak, the open-source database directly inspired by Dynamo, uses the same idea and exposes siblings to your application when an allow_mult setting is on. Riak later improved on plain vector clocks with dotted version vectors (DVV) — a refinement that avoids a subtle problem where naive client-id vectors grow without bound and can produce "false siblings". With DVV the clock size stabilises around the replication factor (typically 3) instead of growing with the number of clients.
4.8 Causality cousins: Git's DAG and CRDTs
You already use a causality-tracking system every day: Git. Git doesn't use vector clocks, but it captures the same happened-before idea with a DAG (Directed Acyclic Graph — a graph of nodes with one-way arrows and no loops). Each commit points back to its parent commit(s). "Commit X is an ancestor of commit Y" is Git's version of "X happened-before Y". When two people branch from the same commit and both commit independently, neither is an ancestor of the other — those commits are concurrent, which is exactly why Git asks you to merge (and sometimes flags a conflict). A merge commit with two parents is Git's "reconciliation": a new node that descends from both branches, just like Dynamo's D5.
A related modern idea is the CRDT (Conflict-free Replicated Data Type) — a special data structure designed so that concurrent updates always merge automatically, with no human reconciliation needed. The intuition: instead of just detecting a conflict (what vector clocks do) and dumping it on the app, a CRDT defines a merge rule baked into the data type itself that is guaranteed to give the same answer no matter what order updates arrive in. For example, a "grow-only set" merges by union — order doesn't matter, so concurrent adds never conflict. CRDTs often still use vector-clock-style version information internally to know what each replica has seen; they just add a mathematically safe automatic merge on top.
4.9 Trade-offs: the clock grows with the cluster
Vector clocks are not free. The vector has one slot per node that has ever written the object. In a small fixed cluster that's fine. But:
- In a large system with many nodes, the vector gets long, and it travels with every message and every stored version — more bytes on the wire and on disk.
- Worse, if you give each client its own slot (rather than each server), the vector can grow without limit as new clients appear. The Dynamo paper specifically warns that during failures or partitions, writes handled by nodes outside the normal set make the vector clock grow.
The fix is pruning: when the vector exceeds a threshold of entries, throw away the oldest (node, counter) pairs. Dynamo attaches a timestamp to each pair recording when that node last touched the object, and drops the least-recently-updated entries once the count crosses a limit (Dynamo used a threshold of 10; Riak's tunable limit sits in roughly the 20–50 range). Pruning trades a tiny, rare risk of an inaccurate causality judgement (you might fail to detect that two versions are actually related, producing an unnecessary sibling) for a bounded clock size. In practice this is a sound trade because the discarded entries are old and almost never relevant.
Common mistakes & misconceptions
C(a) < C(b) from a Lamport clock proves a caused b. It does not — the implication only runs the other way. Only a vector clock gives the "if and only if".Best practices
Section summary
- Lamport clocks order events but cannot detect concurrency:
C(a) < C(b)does not provea → b. - A vector clock is a list of counters, one slot per node; your slot counts your events, other slots are your knowledge of theirs.
- Three update rules: local event → increment own slot; send → increment own, attach vector; receive → increment own, then element-wise max with the message's vector.
- Compare slot-by-slot: equal, happened-before (≤ in all slots and < in at least one), or concurrent (each leads somewhere).
- Vector clocks satisfy the strong clock condition —
a → bif and only ifVC(a) < VC(b)— invented by Fidge and Mattern in 1988. - Concurrent versions are "siblings"; databases keep both and ask the application to reconcile them (semantic reconciliation).
- Amazon Dynamo and Riak use vector/version vectors to detect conflicts; the classic D1→D5 shopping-cart example shows two concurrent writes merging into one descendant.
- Git's commit DAG is a causality cousin: ancestor = happened-before, fork = concurrent, merge commit = reconciliation; CRDTs add automatic, conflict-free merging.
- Trade-off: vector size grows with the number of writing nodes; bound it with pruning (drop oldest entries) or dotted version vectors.
Sources: Vector clock — Wikipedia, Dynamo: Amazon’s Highly Available Key-value Store (Riak docs mirror), Dotted Version Vectors (GSD, U. Minho), Vector Clocks Revisited Part 2: Dotted Version Vectors — Riak, Vector Clocks — Kevin Sookocheff, Logical Clocks — pk.org