Glossary of Terms

By Pritesh Yadav 7 min read
Asynchronous network
A network where messages can take any amount of time to arrive, with no guaranteed upper limit. The internet behaves this way. It is the root cause of most distributed-systems difficulty, because you can never tell a slow message apart from a lost one.
Availability
The promise that the system answers every request it receives (even if the answer might be slightly out of date). A system is "highly available" when it almost never refuses to respond.
BASE
An informal counterpart to ACID, standing for "Basically Available, Soft state, Eventually consistent." Describes systems that favour staying up and converging over time rather than being perfectly consistent at every instant.
CAP theorem
The rule that when the network between your machines breaks (a partition), a distributed system must choose between staying Consistent (everyone sees the same data) and staying Available (everyone still gets an answer). You cannot have both during the break.
Causality
The relationship where one event could have influenced another — event A "happened before" event B in a way that B might depend on A. The opposite is two events being concurrent (neither could have affected the other).
Clock skew
The difference between two machines' clocks at a single moment. Even well-synchronised servers are off by a few milliseconds, which is enough to order events wrongly.
Clock drift
The gradual speeding-up or slowing-down of a single machine's clock over time, so it slowly wanders away from real time until something corrects it.
Concurrent events
Two events where neither happened before the other — they ran independently and neither could have caused the other. Vector clocks can detect this; Lamport clocks cannot.
Consensus
The problem of getting several machines to agree on a single value (for example, "who is the leader?" or "what is the next entry in the log?") even when some machines or messages fail.
Consistency (in CAP)
The promise that every read sees the most recent write — all machines show the same data at the same time. (Note: this is stricter than the "C" in ACID, which is a different idea.)
Consistency model
The specific promise a system makes about how fresh and ordered your data looks. Ranges from linearizability (always perfectly current) down to eventual consistency (correct eventually).
Eventual consistency
The weakest common promise: if writes stop, all copies of the data will eventually become identical. It does not say when, and reads in the meantime may be stale.
Fallacies of distributed computing
A famous list of eight false assumptions beginners make (e.g. "the network is reliable", "latency is zero"). Believing them quietly is how systems break.
Fault
Something going wrong in one component — a disk dies, a message is dropped, a machine pauses. A fault is local and expected; good systems tolerate faults.
Failure
When the system as a whole stops doing its job. A failure is a fault that was not contained. The goal of fault tolerance is to stop faults from becoming failures.
Happens-before
A relationship (written A → B) meaning A could have caused or influenced B: either they happened on the same machine in order, or A was the sending of a message that B received. The foundation of logical ordering.
Idempotency
A property where doing the same operation twice has the same effect as doing it once (like pressing a floor button in a lift). Crucial because retried messages are common and you must not double-charge or double-apply them.
Latency
The time it takes for one message to travel from sender to receiver. Always greater than zero, and unpredictable across a real network.
Linearizability
The strongest single-object consistency model: the system behaves as if there were one copy of the data and every operation happened instantly at one point in time. The closest a distributed system gets to "feels like one machine."
Logical clock
A counter used to order events without relying on wall-clock time. It tracks order, not real seconds. Lamport clocks and vector clocks are the two main kinds.
Lamport clock
A single counter per machine that increments on each event and travels with messages. It guarantees that if A happened before B, A's number is smaller — but a smaller number does not prove causality.
Linearizable vs serializable
Linearizable is about a single object being up to date in real time; serializable is about multi-step transactions appearing to run one at a time. They are different guarantees often confused.
Monotonic read
A guarantee that once you have seen a newer value, you will never later see an older one — time only moves forward from your point of view.
Node
A single machine (server, process, or instance) participating in the system. The basic building block; "distributed" means more than one node.
NTP (Network Time Protocol)
The standard service that nudges machine clocks toward real time over the network. Helpful, but it only narrows clock skew — it never eliminates it.
PACELC
An extension of CAP: if there is a Partition, trade Availability vs Consistency; Else (normal operation) trade Latency vs Consistency. It reminds you the consistency cost exists even when nothing is broken.
Partition (network partition)
A break in the network that splits the nodes into groups that cannot talk to each other, even though each group is still running. The scenario CAP is about.
Partition tolerance
The ability of the system to keep operating despite a network partition. On real networks this is non-negotiable, which is why CAP is really a choice between C and A.
Quorum
A minimum number of nodes that must agree before an action counts (often "more than half"). Quorums let a system make safe decisions without hearing from every node.
Read-your-writes
A guarantee that after you make a change, you will always see your own change on subsequent reads — even if other users do not yet.
Replica
A copy of the data kept on another node. Replicas give speed and fault tolerance, but keeping them in agreement is the source of consistency problems.
Replication
The act of copying data across multiple nodes so it survives failures and can be read from nearby. The more copies, the harder consistency becomes.
Stale read
Reading a value that has since been updated elsewhere — you got an old answer because the newer write has not reached your replica yet.
Strong consistency
An umbrella term for models (like linearizability) where reads always reflect the latest write, so the system feels like a single up-to-date copy.
Throughput
How much work the system handles per unit of time (e.g. requests per second). Distinct from latency: a system can be high-throughput yet still slow per request.
Timestamp
A recorded time (or counter value) attached to an event. Wall-clock timestamps are unreliable for ordering across machines because of skew; logical timestamps are designed for it.
Total order
An arrangement where every pair of events has a definite "this one is first" answer. Lamport clocks can produce one (with tie-breaking); real causality only gives a partial order.
Vector clock
A list of counters — one per node — carried with each event. By comparing two vectors you can tell whether one event happened before the other, or whether they were concurrent. More information than a Lamport clock.
Wall-clock time
The ordinary time-of-day a machine reports (e.g. 10:42:07). Convenient for humans but untrustworthy for ordering events across machines because of skew and drift.
Write conflict
When two nodes change the same piece of data independently and the system must later decide which change wins (or how to merge them). Common under eventual consistency.

Continue reading