The CAP Theorem (and PACELC)
Once you split data across more than one machine, you run into a hard, unavoidable rule about what those machines can promise you. That rule is the CAP theorem. It is the single most quoted idea in distributed systems, and also the most misquoted. This section teaches it from the ground up, corrects the popular myths, and then shows you the more useful modern version called PACELC.
Let us first say plainly what problem we are dealing with. A distributed data store is a database whose data lives on several computers (called nodes or replicas) instead of one. We do this for two reasons: so the system survives one machine dying, and so it can serve more traffic. But the moment data is copied to several machines, those copies can disagree, and the network connecting them can fail. CAP is about what happens then.
5.1 A tiny bit of history (so the names make sense)
The idea was first stated by Eric Brewer, a Berkeley professor and later a Google VP. He floated it informally in the autumn of 1998 and presented it as a conjecture (an unproven guess) in his keynote at the year-2000 PODC conference (the Symposium on Principles of Distributed Computing). In 2002, two MIT researchers, Seth Gilbert and Nancy Lynch, wrote a formal mathematical proof. Their proof turned Brewer's guess into a real theorem (a statement that has been proven true). Then in 2012, Brewer wrote a famous follow-up article, "CAP Twelve Years Later: How the 'Rules' Have Changed", walking back the simplistic way people had started teaching it. Most of the corrections in this section come straight from that 2012 article.
5.2 The three letters, defined in plain words
CAP stands for three properties a distributed data store might try to guarantee. Each has a precise technical meaning that is narrower than the everyday word suggests, so read these carefully.
C — Consistency
What it is: every read sees the most recent write. If I save balance = 100 and then you read the balance one nanosecond later from any replica, you must see 100, never the old value.
The exact technical name for this is linearizability (also called atomic or single-copy consistency). Linearizability means the whole cluster behaves as if there were only one copy of the data and every operation happened instantly at one single moment in time, in a clear order. Brewer's 2012 phrasing: consistency here is "equivalent to having a single up-to-date copy of the data."
A — Availability
What it is: every request sent to a working (non-failed) node gets a non-error response — it does not hang forever and it does not reply "sorry, I can't serve you right now." Crucially, an available system is allowed to return data that is slightly stale (out of date). Availability says "you get an answer," not "you get the freshest answer."
P — Partition tolerance
What it is: the system keeps working even when the network between nodes drops or delays messages. A network partition is when the connection between groups of nodes breaks, so node A can no longer talk to node B, even though both machines are alive and healthy. Think of a cut undersea cable, a misconfigured firewall, or a switch that starts silently dropping packets. The nodes are fine; the wire between them is the problem.
5.3 The actual claim: during a partition, choose C or A
Here is the theorem in one sentence: when a network partition happens, a distributed system must choose between Consistency and Availability — it cannot have both.
Why is that forced? Walk through it. The network has split your two replicas into two groups that cannot talk. Now a write request arrives at one side. That side faces a fork in the road with only two doors:
- Door 1 — stay Consistent (sacrifice Availability): Refuse the request, because this side cannot safely confirm the change with the other side. It returns an error or just waits. Data stays correct everywhere, but the user did not get served — availability is lost.
- Door 2 — stay Available (sacrifice Consistency): Accept the write on this side anyway and reply "OK." The user is served, but now the two sides hold different values for the same data. They have diverged — consistency is lost.
There is no third door. You cannot both accept the write and keep both replicas identical when the replicas physically cannot communicate. That impossibility is the whole theorem.
NORMAL: nodes can talk, both C and A hold
┌─────────┐ network OK ┌─────────┐
│ Replica │ <═════════════> │ Replica │
│ A │ replicate │ B │
└─────────┘ └─────────┘
PARTITION: the wire is cut ✂
┌─────────┐ ┌─────────┐
│ Replica │ ✂ X X X ✂ │ Replica │
│ A │ no messages │ B │
└────┬────┘ get through └────┬────┘
│ │
write "x=2" arrives read "x" arrives
│ │
┌─────────┴─────────┐ ┌─────────┴─────────┐
│ Choose CONSISTENCY │ │ Choose AVAILABILITY│
│ → refuse / error │ │ → answer with OLD │
│ → user not served │ │ value x=1 (stale)│
│ (A sacrificed) │ │ (C sacrificed) │
└────────────────────┘ └────────────────────┘
5.4 Why "P" is not really optional
Beginners read "pick two of three" and think: "Fine, I'll pick C and A and drop P — I'll build a CA system." In the real world, this is not a real option. Here is why.
Partitions are caused by the network, and you do not control the network. Cables get cut, routers reboot, packets get dropped, a cloud availability zone loses connectivity. These things will happen to any system whose nodes are on separate machines. You cannot choose to make partitions not occur, the same way you cannot choose for it to never rain.
So partition tolerance is not a feature you opt into — it is a fact of life you must survive. That means P is effectively mandatory, and the real, practical choice is only ever C versus A, and only during a partition. A truly "CA" system would be one that simply gives up — stops working entirely — the instant the network hiccups. That is not a useful distributed system; it is a single-machine database (where there is no network between replicas, so no partition can happen).
5.5 CP systems vs AP systems, with real examples
We label systems by what they choose when a partition strikes.
| Type | During a partition it… | Good for | Real systems |
|---|---|---|---|
| CP (Consistency + Partition tolerance) | Keeps data correct; refuses or blocks requests on the side that can't confirm. Sacrifices availability. | Coordination, locks, config, balances — where wrong data is worse than no data. | ZooKeeper, etcd, HBase, Google Spanner, traditional RDBMS in strict mode |
| AP (Availability + Partition tolerance) | Keeps answering on every reachable node; allows replicas to diverge, reconciles later. Sacrifices consistency. | Shopping carts, social feeds, sensor data, sessions — where an answer now beats a perfect answer. | Cassandra, Amazon DynamoDB (in eventually-consistent mode), Riak, CouchDB |
Why ZooKeeper / etcd are CP
etcd (the key-value store that holds Kubernetes' cluster state) and ZooKeeper (a coordination service) exist to be the single source of truth that everyone agrees on. They use a consensus algorithm (Raft for etcd, Zab for ZooKeeper) that requires a majority (quorum) of nodes to agree before any write is accepted. If a partition leaves a group of nodes without a majority, that minority side stops accepting writes — it would rather be unavailable than risk telling two clients different things. Imagine if etcd let two halves of a split cluster each elect a different leader: Kubernetes would run two conflicting versions of your cluster. Unthinkable — so it picks C.
Why Cassandra / DynamoDB are AP
Cassandra and DynamoDB grew out of Amazon's famous "Dynamo" design, built for an online store that must never reject a customer. Their guiding rule: an "add to cart" click must always succeed. So during a partition, every reachable node keeps accepting reads and writes. The copies drift apart, and the system reconciles them afterward using techniques like last-write-wins or vector clocks (a way to track which version came after which). They offer "tunable consistency" — you can ask for stronger guarantees per-request — but their default temperament is to stay available.
5.6 The big misconceptions (Brewer's own corrections)
Brewer's 2012 article exists mainly to fix how badly "two of three" had been taught. Here are the corrections.
- It only applies during a partition. When the network is healthy (which is almost all the time), a well-built system can deliver both consistency and availability at once. You are not perpetually paying a tax. As Brewer put it, "CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare."
- "2 of 3" is an oversimplification. You don't permanently pick two letters and throw one away. P is forced; the C-vs-A choice only fires during partitions; and you can even make the choice differently for different operations in the same system (e.g., serve product browsing as AP but process payments as CP).
- The properties are not binary. Brewer stresses they are "more continuous than binary." Consistency has a whole spectrum (linearizable → causal → eventual). Availability is a percentage, not a yes/no. You tune where you sit, you don't flip a switch.
- Partitions can be managed, not just suffered. Brewer suggests a three-step recipe: (1) detect the partition began, (2) enter partition mode and limit certain risky operations, (3) when the network heals, recover — merge the diverged data and compensate for any mistakes made during the split.
5.7 PACELC — the part CAP forgot
CAP has a blind spot: it only describes what happens during a partition. But partitions are rare. What governs your system the other 99.9% of the time, when the network is perfectly healthy? CAP is silent. In 2010, Yale professor Daniel Abadi proposed an extension to fill that gap (formalized in his 2012 paper, "Consistency Tradeoffs in Modern Distributed Database System Design"). He called it PACELC.
Read the name as a sentence: Partition? then choose A vs C; Else (no partition) choose L vs C.
┌──────────────────────────┐
│ Is the network │
│ PARTITIONED right now? │
└───────────┬──────────────┘
YES (P) │ NO (E = "Else")
┌───────────────┴───────────────┐
▼ ▼
┌──────────────┐ ┌──────────────┐
│ choose A or C │ │ choose L or C │
│ (CAP's case) │ │ Latency vs │
│ Availability │ │ Consistency │
│ vs │ │ (the part CAP │
│ Consistency │ │ ignored) │
└──────────────┘ └──────────────┘
The new insight is the right-hand branch. Even with a perfect network, keeping replicas in sync costs time. To guarantee a read sees the latest write, the system must contact other replicas (or wait for a majority to acknowledge) before answering — that round-trip adds latency (delay). If instead you let a nearby replica answer immediately from possibly-stale data, you get speed but give up strict consistency. So even in calm weather there is a trade: Latency vs Consistency. This is the trade-off CAP completely ignores, and it's the one you actually pay every single day.
Every system therefore carries a two-part label, like PC/EL: the first part is its partition-time choice, the second its normal-time choice.
| System | PACELC | Reads as… |
|---|---|---|
| Cassandra | PA/EL | On partition, favor Availability; else, favor Latency. (Speed-first, eventually consistent.) |
| Riak | PA/EL | Same temperament as Cassandra. |
| DynamoDB | PA/EC | On partition stay Available; else favor Consistency over latency. |
| MongoDB | PA/EC | Stays available on partition; in normal times leans consistent. |
| HBase, BigTable | PC/EC | Consistency always — both during partitions and normal times. Never sacrifices correctness. |
| VoltDB / H-Store | PC/EC | Strongly consistent in all conditions. |
| Google Spanner | PC/EC | Consistency-first everywhere; accepts extra latency (its TrueTime clocks) to stay linearizable. |
Common mistakes & misconceptions
Best practices
Section summary
- CAP: Brewer's conjecture (1998/2000), proven by Gilbert & Lynch (2002), clarified by Brewer (2012).
- C = every read sees the latest write (linearizability); A = every request to a live node gets a non-error answer; P = the system keeps working despite dropped/delayed network messages.
- During a network partition you must choose Consistency or Availability — there is no third option.
- Partition tolerance isn't optional on real networks, so the practical choice is only C-vs-A, and only during a partition.
- CP systems (ZooKeeper, etcd, HBase, Spanner) refuse/block to stay correct; AP systems (Cassandra, DynamoDB, Riak) keep answering and reconcile later.
- CAP's C is linearizability, not ACID's C; the two are unrelated despite the shared letter.
- "2 of 3" is misleading: CAP only bites during rare partitions, the properties are continuous not binary, and the choice can be made per-operation.
- PACELC extends CAP: if Partitioned, trade A vs C; Else, trade Latency vs Consistency — the cost CAP ignores.
- PACELC labels: Cassandra/Riak = PA/EL, DynamoDB/MongoDB = PA/EC, HBase/BigTable/Spanner/VoltDB = PC/EC.
- The everyday Latency-vs-Consistency trade (PACELC's "Else") usually affects users more than the rare partition does.