The CAP Theorem (and PACELC)

By Pritesh Yadav 16 min read

Once you split data across more than one machine, you run into a hard, unavoidable rule about what those machines can promise you. That rule is the CAP theorem. It is the single most quoted idea in distributed systems, and also the most misquoted. This section teaches it from the ground up, corrects the popular myths, and then shows you the more useful modern version called PACELC.

Let us first say plainly what problem we are dealing with. A distributed data store is a database whose data lives on several computers (called nodes or replicas) instead of one. We do this for two reasons: so the system survives one machine dying, and so it can serve more traffic. But the moment data is copied to several machines, those copies can disagree, and the network connecting them can fail. CAP is about what happens then.

5.1 A tiny bit of history (so the names make sense)

The idea was first stated by Eric Brewer, a Berkeley professor and later a Google VP. He floated it informally in the autumn of 1998 and presented it as a conjecture (an unproven guess) in his keynote at the year-2000 PODC conference (the Symposium on Principles of Distributed Computing). In 2002, two MIT researchers, Seth Gilbert and Nancy Lynch, wrote a formal mathematical proof. Their proof turned Brewer's guess into a real theorem (a statement that has been proven true). Then in 2012, Brewer wrote a famous follow-up article, "CAP Twelve Years Later: How the 'Rules' Have Changed", walking back the simplistic way people had started teaching it. Most of the corrections in this section come straight from that 2012 article.

Key takeaway: CAP = Brewer's conjecture (1998/2000), proven by Gilbert & Lynch (2002), clarified by Brewer himself (2012). When someone recites "two out of three," they are quoting the 1998 slogan — not the careful 2012 version.

5.2 The three letters, defined in plain words

CAP stands for three properties a distributed data store might try to guarantee. Each has a precise technical meaning that is narrower than the everyday word suggests, so read these carefully.

C — Consistency

What it is: every read sees the most recent write. If I save balance = 100 and then you read the balance one nanosecond later from any replica, you must see 100, never the old value.

The exact technical name for this is linearizability (also called atomic or single-copy consistency). Linearizability means the whole cluster behaves as if there were only one copy of the data and every operation happened instantly at one single moment in time, in a clear order. Brewer's 2012 phrasing: consistency here is "equivalent to having a single up-to-date copy of the data."

Common mistake: The "C" in CAP is not the "C" in ACID. The ACID C means "a transaction leaves the database obeying its rules/constraints" (a business-rule guarantee inside one database). The CAP C means linearizability — "all replicas agree on the latest value." Two completely different ideas that happen to share a letter. Brewer notes the CAP C is "single-copy consistency, a strict subset of ACID consistency."

A — Availability

What it is: every request sent to a working (non-failed) node gets a non-error response — it does not hang forever and it does not reply "sorry, I can't serve you right now." Crucially, an available system is allowed to return data that is slightly stale (out of date). Availability says "you get an answer," not "you get the freshest answer."

P — Partition tolerance

What it is: the system keeps working even when the network between nodes drops or delays messages. A network partition is when the connection between groups of nodes breaks, so node A can no longer talk to node B, even though both machines are alive and healthy. Think of a cut undersea cable, a misconfigured firewall, or a switch that starts silently dropping packets. The nodes are fine; the wire between them is the problem.

Analogy: Picture two bank branches of the same bank, in two cities, that must always show the same account balance. Consistency = both branches always show the identical, latest balance. Availability = you can always walk into either branch and get served. Partition = the phone line between the two branches goes dead, so they can no longer tell each other about new deposits.

5.3 The actual claim: during a partition, choose C or A

Here is the theorem in one sentence: when a network partition happens, a distributed system must choose between Consistency and Availability — it cannot have both.

Why is that forced? Walk through it. The network has split your two replicas into two groups that cannot talk. Now a write request arrives at one side. That side faces a fork in the road with only two doors:

  1. Door 1 — stay Consistent (sacrifice Availability): Refuse the request, because this side cannot safely confirm the change with the other side. It returns an error or just waits. Data stays correct everywhere, but the user did not get served — availability is lost.
  2. Door 2 — stay Available (sacrifice Consistency): Accept the write on this side anyway and reply "OK." The user is served, but now the two sides hold different values for the same data. They have diverged — consistency is lost.

There is no third door. You cannot both accept the write and keep both replicas identical when the replicas physically cannot communicate. That impossibility is the whole theorem.

        NORMAL: nodes can talk, both C and A hold
        ┌─────────┐    network OK    ┌─────────┐
        │ Replica │ <═════════════>  │ Replica │
        │    A    │   replicate      │    B    │
        └─────────┘                  └─────────┘

        PARTITION: the wire is cut  ✂
        ┌─────────┐                  ┌─────────┐
        │ Replica │  ✂  X  X  X  ✂   │ Replica │
        │    A    │   no messages    │    B    │
        └────┬────┘   get through    └────┬────┘
             │                            │
   write "x=2" arrives                read "x" arrives
             │                            │
   ┌─────────┴─────────┐        ┌─────────┴─────────┐
   │ Choose CONSISTENCY │       │ Choose AVAILABILITY│
   │  → refuse / error  │       │  → answer with OLD │
   │  → user not served │       │    value x=1 (stale)│
   │  (A sacrificed)    │       │  (C sacrificed)    │
   └────────────────────┘       └────────────────────┘
Example: A shopping site stores your cart on two replicas. The cable between them is cut. You add "blue shoes." Replica A takes the write. A CP system would refuse the write (or block) so the cart never disagrees — you might see "try again later." An AP system accepts "blue shoes" on A, lets B stay unaware for now, replies "Added!", and reconciles the two carts after the network heals. Same partition, two opposite choices.

5.4 Why "P" is not really optional

Beginners read "pick two of three" and think: "Fine, I'll pick C and A and drop P — I'll build a CA system." In the real world, this is not a real option. Here is why.

Partitions are caused by the network, and you do not control the network. Cables get cut, routers reboot, packets get dropped, a cloud availability zone loses connectivity. These things will happen to any system whose nodes are on separate machines. You cannot choose to make partitions not occur, the same way you cannot choose for it to never rain.

So partition tolerance is not a feature you opt into — it is a fact of life you must survive. That means P is effectively mandatory, and the real, practical choice is only ever C versus A, and only during a partition. A truly "CA" system would be one that simply gives up — stops working entirely — the instant the network hiccups. That is not a useful distributed system; it is a single-machine database (where there is no network between replicas, so no partition can happen).

Key takeaway: In any real multi-node system, P is forced on you. So CAP boils down to one decision: during a partition, do you favor Consistency (CP) or Availability (AP)? Outside of partitions, this trade-off does not even apply.

5.5 CP systems vs AP systems, with real examples

We label systems by what they choose when a partition strikes.

TypeDuring a partition it…Good forReal systems
CP (Consistency + Partition tolerance)Keeps data correct; refuses or blocks requests on the side that can't confirm. Sacrifices availability.Coordination, locks, config, balances — where wrong data is worse than no data.ZooKeeper, etcd, HBase, Google Spanner, traditional RDBMS in strict mode
AP (Availability + Partition tolerance)Keeps answering on every reachable node; allows replicas to diverge, reconciles later. Sacrifices consistency.Shopping carts, social feeds, sensor data, sessions — where an answer now beats a perfect answer.Cassandra, Amazon DynamoDB (in eventually-consistent mode), Riak, CouchDB

Why ZooKeeper / etcd are CP

etcd (the key-value store that holds Kubernetes' cluster state) and ZooKeeper (a coordination service) exist to be the single source of truth that everyone agrees on. They use a consensus algorithm (Raft for etcd, Zab for ZooKeeper) that requires a majority (quorum) of nodes to agree before any write is accepted. If a partition leaves a group of nodes without a majority, that minority side stops accepting writes — it would rather be unavailable than risk telling two clients different things. Imagine if etcd let two halves of a split cluster each elect a different leader: Kubernetes would run two conflicting versions of your cluster. Unthinkable — so it picks C.

Why Cassandra / DynamoDB are AP

Cassandra and DynamoDB grew out of Amazon's famous "Dynamo" design, built for an online store that must never reject a customer. Their guiding rule: an "add to cart" click must always succeed. So during a partition, every reachable node keeps accepting reads and writes. The copies drift apart, and the system reconciles them afterward using techniques like last-write-wins or vector clocks (a way to track which version came after which). They offer "tunable consistency" — you can ask for stronger guarantees per-request — but their default temperament is to stay available.

Analogy: A CP system is a careful pharmacist: if she can't verify your prescription against the central record, she refuses to hand over the pills — better to send you away than risk a dangerous mistake. An AP system is a busy coffee shop during a card-network outage: it keeps serving customers and writes the orders on paper, sorting out the payments once the system comes back. One refuses to be wrong; the other refuses to stop serving.

5.6 The big misconceptions (Brewer's own corrections)

Brewer's 2012 article exists mainly to fix how badly "two of three" had been taught. Here are the corrections.

  • It only applies during a partition. When the network is healthy (which is almost all the time), a well-built system can deliver both consistency and availability at once. You are not perpetually paying a tax. As Brewer put it, "CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare."
  • "2 of 3" is an oversimplification. You don't permanently pick two letters and throw one away. P is forced; the C-vs-A choice only fires during partitions; and you can even make the choice differently for different operations in the same system (e.g., serve product browsing as AP but process payments as CP).
  • The properties are not binary. Brewer stresses they are "more continuous than binary." Consistency has a whole spectrum (linearizable → causal → eventual). Availability is a percentage, not a yes/no. You tune where you sit, you don't flip a switch.
  • Partitions can be managed, not just suffered. Brewer suggests a three-step recipe: (1) detect the partition began, (2) enter partition mode and limit certain risky operations, (3) when the network heals, recover — merge the diverged data and compensate for any mistakes made during the split.
Common mistake: Treating CAP as a permanent, system-wide label printed on the box. The real choice is per-operation and only matters during the rare seconds of a partition. Designing your whole architecture as if you're always sacrificing something is a misreading of the theorem.

5.7 PACELC — the part CAP forgot

CAP has a blind spot: it only describes what happens during a partition. But partitions are rare. What governs your system the other 99.9% of the time, when the network is perfectly healthy? CAP is silent. In 2010, Yale professor Daniel Abadi proposed an extension to fill that gap (formalized in his 2012 paper, "Consistency Tradeoffs in Modern Distributed Database System Design"). He called it PACELC.

Read the name as a sentence: Partition? then choose A vs C; Else (no partition) choose L vs C.

              ┌──────────────────────────┐
              │  Is the network          │
              │  PARTITIONED right now?   │
              └───────────┬──────────────┘
                  YES (P) │ NO  (E = "Else")
          ┌───────────────┴───────────────┐
          ▼                                ▼
   ┌──────────────┐                ┌──────────────┐
   │ choose A or C │               │ choose L or C │
   │  (CAP's case) │               │ Latency  vs   │
   │ Availability  │               │ Consistency   │
   │  vs           │               │ (the part CAP │
   │ Consistency   │               │  ignored)     │
   └──────────────┘                └──────────────┘

The new insight is the right-hand branch. Even with a perfect network, keeping replicas in sync costs time. To guarantee a read sees the latest write, the system must contact other replicas (or wait for a majority to acknowledge) before answering — that round-trip adds latency (delay). If instead you let a nearby replica answer immediately from possibly-stale data, you get speed but give up strict consistency. So even in calm weather there is a trade: Latency vs Consistency. This is the trade-off CAP completely ignores, and it's the one you actually pay every single day.

Every system therefore carries a two-part label, like PC/EL: the first part is its partition-time choice, the second its normal-time choice.

SystemPACELCReads as…
CassandraPA/ELOn partition, favor Availability; else, favor Latency. (Speed-first, eventually consistent.)
RiakPA/ELSame temperament as Cassandra.
DynamoDBPA/ECOn partition stay Available; else favor Consistency over latency.
MongoDBPA/ECStays available on partition; in normal times leans consistent.
HBase, BigTablePC/ECConsistency always — both during partitions and normal times. Never sacrifices correctness.
VoltDB / H-StorePC/ECStrongly consistent in all conditions.
Google SpannerPC/ECConsistency-first everywhere; accepts extra latency (its TrueTime clocks) to stay linearizable.
Example — Google Spanner: Spanner is globally distributed yet linearizable, which sounds like it "beats" CAP. It doesn't. Spanner is a PC/EC system: when partitioned it chooses Consistency over Availability (CP), and in normal operation it chooses Consistency over Latency (EC) — it deliberately waits out a small clock-uncertainty window (its "TrueTime" mechanism) on commits to keep ordering correct. Engineers call it "effectively CA" only because Google's private network makes partitions so rare and so short that availability looks near-perfect in practice. It still obeys the theorem; it just hides the cost behind excellent infrastructure.
Analogy: You text a group of friends to confirm dinner plans. Consistency = you wait until everyone has replied "yes" before you book — slow, but no one shows up to the wrong place. Low latency = you book the instant the first person replies — fast, but someone might have wanted a different restaurant. There's no partition here (everyone's phone works fine); the trade-off between waiting-for-agreement and answering-fast exists anyway. That everyday trade is exactly PACELC's "Else" branch.
Key takeaway: CAP describes the rare emergency (a partition). PACELC adds the everyday reality: even with a healthy network, syncing replicas forces a Latency-vs-Consistency trade. PACELC is the more complete and more practically useful model.

Common mistakes & misconceptions

Common mistake: Thinking you "pick 2 of 3" permanently. P is forced by reality, and the C-vs-A choice only matters during a partition. The rest of the time you can have both C and A.
Common mistake: Confusing CAP's Consistency with ACID's Consistency. CAP-C means linearizability (all replicas agree on the latest value). ACID-C means a transaction preserves database rules/constraints. Same letter, unrelated guarantees.
Common mistake: Building a "CA" system. On real multi-machine networks, partitions are inevitable, so a system that can't tolerate them simply breaks when the network blinks. Genuine "CA" only exists on a single node, where there's no network to partition.
Common mistake: Believing Google Spanner "broke CAP." It didn't. It's a CP / PC-EC system whose rare, short partitions and superb network make it look always-available. It pays the consistency cost in latency, not in violating the theorem.
Common mistake: Treating "AP = no consistency ever." AP systems are usually eventually consistent: replicas reconcile after the partition heals. They drop strong (immediate) consistency, not all consistency.
Common mistake: Ignoring the everyday cost. Teams obsess over the partition case and forget PACELC's "Else": the Latency-vs-Consistency trade you pay on every normal request often matters far more for user experience than the rare partition does.

Best practices

Best practice: Decide C-vs-A per data type, not per company. Money, inventory counts, locks, and config want CP. Carts, feeds, view counts, and sessions are happy with AP. One application can (and should) use both.
Best practice: Use PACELC, not just CAP, when choosing a database. Ask both questions: "What does it do on a partition?" and "What latency does it cost me for consistency on a normal day?" The second question hits you constantly.
Best practice: Plan the partition-recovery path up front (Brewer's detect → limit → recover). Know how diverged replicas will be merged (last-write-wins, vector clocks, CRDTs) before the partition, not during the 3 a.m. incident.
Best practice: For coordination/source-of-truth needs (service discovery, leader election, config), reach for proven CP systems like etcd or ZooKeeper rather than rolling your own — consensus is famously easy to get subtly wrong.
Best practice: Treat consistency as a dial, not a switch. Many AP databases offer tunable, per-request consistency (e.g., Cassandra's quorum levels). Turn it up for the few operations that need it and leave it low for the rest.

Section summary

  • CAP: Brewer's conjecture (1998/2000), proven by Gilbert & Lynch (2002), clarified by Brewer (2012).
  • C = every read sees the latest write (linearizability); A = every request to a live node gets a non-error answer; P = the system keeps working despite dropped/delayed network messages.
  • During a network partition you must choose Consistency or Availability — there is no third option.
  • Partition tolerance isn't optional on real networks, so the practical choice is only C-vs-A, and only during a partition.
  • CP systems (ZooKeeper, etcd, HBase, Spanner) refuse/block to stay correct; AP systems (Cassandra, DynamoDB, Riak) keep answering and reconcile later.
  • CAP's C is linearizability, not ACID's C; the two are unrelated despite the shared letter.
  • "2 of 3" is misleading: CAP only bites during rare partitions, the properties are continuous not binary, and the choice can be made per-operation.
  • PACELC extends CAP: if Partitioned, trade A vs C; Else, trade Latency vs Consistency — the cost CAP ignores.
  • PACELC labels: Cassandra/Riak = PA/EL, DynamoDB/MongoDB = PA/EC, HBase/BigTable/Spanner/VoltDB = PC/EC.
  • The everyday Latency-vs-Consistency trade (PACELC's "Else") usually affects users more than the rare partition does.

Continue reading