Glossary of Terms
By Pritesh Yadav 7 min read —
- Asynchronous network
- A network where messages can take any amount of time to arrive, with no guaranteed upper limit. The internet behaves this way. It is the root cause of most distributed-systems difficulty, because you can never tell a slow message apart from a lost one.
- Availability
- The promise that the system answers every request it receives (even if the answer might be slightly out of date). A system is "highly available" when it almost never refuses to respond.
- BASE
- An informal counterpart to ACID, standing for "Basically Available, Soft state, Eventually consistent." Describes systems that favour staying up and converging over time rather than being perfectly consistent at every instant.
- CAP theorem
- The rule that when the network between your machines breaks (a partition), a distributed system must choose between staying Consistent (everyone sees the same data) and staying Available (everyone still gets an answer). You cannot have both during the break.
- Causality
- The relationship where one event could have influenced another — event A "happened before" event B in a way that B might depend on A. The opposite is two events being concurrent (neither could have affected the other).
- Clock skew
- The difference between two machines' clocks at a single moment. Even well-synchronised servers are off by a few milliseconds, which is enough to order events wrongly.
- Clock drift
- The gradual speeding-up or slowing-down of a single machine's clock over time, so it slowly wanders away from real time until something corrects it.
- Concurrent events
- Two events where neither happened before the other — they ran independently and neither could have caused the other. Vector clocks can detect this; Lamport clocks cannot.
- Consensus
- The problem of getting several machines to agree on a single value (for example, "who is the leader?" or "what is the next entry in the log?") even when some machines or messages fail.
- Consistency (in CAP)
- The promise that every read sees the most recent write — all machines show the same data at the same time. (Note: this is stricter than the "C" in ACID, which is a different idea.)
- Consistency model
- The specific promise a system makes about how fresh and ordered your data looks. Ranges from linearizability (always perfectly current) down to eventual consistency (correct eventually).
- Eventual consistency
- The weakest common promise: if writes stop, all copies of the data will eventually become identical. It does not say when, and reads in the meantime may be stale.
- Fallacies of distributed computing
- A famous list of eight false assumptions beginners make (e.g. "the network is reliable", "latency is zero"). Believing them quietly is how systems break.
- Fault
- Something going wrong in one component — a disk dies, a message is dropped, a machine pauses. A fault is local and expected; good systems tolerate faults.
- Failure
- When the system as a whole stops doing its job. A failure is a fault that was not contained. The goal of fault tolerance is to stop faults from becoming failures.
- Happens-before
- A relationship (written A → B) meaning A could have caused or influenced B: either they happened on the same machine in order, or A was the sending of a message that B received. The foundation of logical ordering.
- Idempotency
- A property where doing the same operation twice has the same effect as doing it once (like pressing a floor button in a lift). Crucial because retried messages are common and you must not double-charge or double-apply them.
- Latency
- The time it takes for one message to travel from sender to receiver. Always greater than zero, and unpredictable across a real network.
- Linearizability
- The strongest single-object consistency model: the system behaves as if there were one copy of the data and every operation happened instantly at one point in time. The closest a distributed system gets to "feels like one machine."
- Logical clock
- A counter used to order events without relying on wall-clock time. It tracks order, not real seconds. Lamport clocks and vector clocks are the two main kinds.
- Lamport clock
- A single counter per machine that increments on each event and travels with messages. It guarantees that if A happened before B, A's number is smaller — but a smaller number does not prove causality.
- Linearizable vs serializable
- Linearizable is about a single object being up to date in real time; serializable is about multi-step transactions appearing to run one at a time. They are different guarantees often confused.
- Monotonic read
- A guarantee that once you have seen a newer value, you will never later see an older one — time only moves forward from your point of view.
- Node
- A single machine (server, process, or instance) participating in the system. The basic building block; "distributed" means more than one node.
- NTP (Network Time Protocol)
- The standard service that nudges machine clocks toward real time over the network. Helpful, but it only narrows clock skew — it never eliminates it.
- PACELC
- An extension of CAP: if there is a Partition, trade Availability vs Consistency; Else (normal operation) trade Latency vs Consistency. It reminds you the consistency cost exists even when nothing is broken.
- Partition (network partition)
- A break in the network that splits the nodes into groups that cannot talk to each other, even though each group is still running. The scenario CAP is about.
- Partition tolerance
- The ability of the system to keep operating despite a network partition. On real networks this is non-negotiable, which is why CAP is really a choice between C and A.
- Quorum
- A minimum number of nodes that must agree before an action counts (often "more than half"). Quorums let a system make safe decisions without hearing from every node.
- Read-your-writes
- A guarantee that after you make a change, you will always see your own change on subsequent reads — even if other users do not yet.
- Replica
- A copy of the data kept on another node. Replicas give speed and fault tolerance, but keeping them in agreement is the source of consistency problems.
- Replication
- The act of copying data across multiple nodes so it survives failures and can be read from nearby. The more copies, the harder consistency becomes.
- Stale read
- Reading a value that has since been updated elsewhere — you got an old answer because the newer write has not reached your replica yet.
- Strong consistency
- An umbrella term for models (like linearizability) where reads always reflect the latest write, so the system feels like a single up-to-date copy.
- Throughput
- How much work the system handles per unit of time (e.g. requests per second). Distinct from latency: a system can be high-throughput yet still slow per request.
- Timestamp
- A recorded time (or counter value) attached to an event. Wall-clock timestamps are unreliable for ordering across machines because of skew; logical timestamps are designed for it.
- Total order
- An arrangement where every pair of events has a definite "this one is first" answer. Lamport clocks can produce one (with tie-breaking); real causality only gives a partial order.
- Vector clock
- A list of counters — one per node — carried with each event. By comparing two vectors you can tell whether one event happened before the other, or whether they were concurrent. More information than a Lamport clock.
- Wall-clock time
- The ordinary time-of-day a machine reports (e.g. 10:42:07). Convenient for humans but untrustworthy for ordering events across machines because of skew and drift.
- Write conflict
- When two nodes change the same piece of data independently and the system must later decide which change wins (or how to merge them). Common under eventual consistency.