Time, Clocks & the Ordering of Events

By Pritesh Yadav 19 min read

This section is about one deceptively simple question: in what order did things happen? On a single computer this is trivial — you look at the clock, or you read the lines of a log file from top to bottom. But the moment your system spreads across many machines, "what happened first?" becomes one of the hardest and most important questions in all of distributed computing. This whole section is built on the single most-cited paper in the field: Leslie Lamport's "Time, Clocks, and the Ordering of Events in a Distributed System," published in Communications of the ACM, Volume 21, Issue 7, in July 1978. It later won the ACM's PODC Influential-Paper Award (2000) and is a large part of why Lamport received the Turing Award in 2013.

We will define every term as we go. By the end you will understand why wall-clock time is a trap, what the happens-before relation is, how Lamport logical clocks work step by step, and how all of this powers real systems like databases, version-control tools, and message queues.

3.1 First, some vocabulary

Let's nail down the basic words before anything else.

  • Process — a single running program on one machine. In this section, think of it as "one participant" in the system. We'll use P1, P2, P3 for three of them. (Lamport's paper assumes each process runs on its own machine and the processes talk only by sending messages — no shared memory.)
  • Event — a single thing that happens inside a process at one instant. Three kinds matter: an internal event (the process does some local work), a send event (the process sends a message), and a receive event (the process receives a message).
  • Message — data sent from one process to another. It takes some unknown, variable amount of time to arrive. It cannot arrive before it was sent (that's the one rock-solid rule of reality we get to keep).
  • Physical clock / wall-clock time — the actual time-of-day clock on a machine (e.g. 2026-06-21 14:03:07.182). Each machine has its own.
  • Logical clocknot a real clock. It is just a counter — a number that a process bumps up to track ordering, with no connection to real seconds. This is the star of the section.
Analogy: Imagine three people writing letters to each other from three different cities, with no phone and no shared calendar. Each person can only know: (a) the order in which they did their own things, and (b) that a letter they received must have been written before it arrived. They can never directly observe a global "now." A distributed system is exactly this — machines are the people, messages are the letters.

3.2 Why we can't just use the wall clock

The obvious idea is: put a timestamp from the system clock on every event, then sort all events by timestamp. On one machine this works. Across many machines it fails, for three reasons.

Reason 1: Clocks drift

Clock drift means a clock runs slightly fast or slightly slow, so it slowly wanders away from true time. Computer clocks are usually driven by a cheap quartz crystal that oscillates at a not-quite-perfect rate, and that rate changes with temperature and age. A typical quartz computer clock drifts by roughly one second every 11–12 days if left alone. Two machines that were perfectly aligned at boot can be seconds — or, over long uptime, minutes — apart.

Reason 2: Clocks are skewed

Clock skew is the difference between two clocks at one instant. Because every machine drifts at its own rate, at any given moment machine A might read 14:03:07.100 while machine B reads 14:03:07.230. That 130-millisecond gap is skew.

Reason 3: Even NTP doesn't fully fix it

NTP (Network Time Protocol) is the standard internet service machines use to ask time servers "what time is it?" and nudge their clocks toward agreement. NTP helps a lot, but it is not perfect: it can only estimate network delay, not eliminate it. Under good conditions NTP holds skew to about 10 milliseconds; in real-world data centers it is common to see 100–250 milliseconds of skew, and clocks can even jump backward when NTP makes a correction.

Common mistake: "If two events have wall-clock timestamps 5 ms apart, the earlier timestamp happened first." False. With 100+ ms of possible skew, event B can carry an earlier timestamp than event A even though A caused B. A timestamp comparison can flatly contradict causality.

Here is the killer scenario. Suppose P1 sends a message that causes something on P2. If P1's clock is running fast and P2's is running slow, the receive on P2 can be stamped with a time earlier than the send on P1 — which says the effect happened before the cause. That is nonsense, and it's exactly the kind of bug that corrupts logs, traces, and databases.

THE WALL-CLOCK TRAP (cause appears to happen after effect)

P1 (clock runs FAST)        P2 (clock runs SLOW)
   |                            |
14:03:07.300  send msg ------\  |
   |                          \ |
   |                           \-->  receive msg  14:03:07.250
   |                            |
                                 ^
  Sort by timestamp => 07.250 (receive) is BEFORE 07.300 (send)
  => the system "believes" the message was received before it was sent. WRONG.
Key takeaway: Physical clocks across machines disagree (skew) and wander (drift). You cannot reliably order events on different machines by comparing wall-clock timestamps. Lamport's insight: for ordering, we usually don't need real time at all — we need causality.

3.3 The "happens-before" relation (→)

Lamport's first big move is to throw away physical time and define ordering purely in terms of causality — what could possibly have influenced what. He calls this the happens-before relation and writes it with an arrow: a → b reads as "a happens before b."

The relation is defined by exactly three rules:

  1. Same-process rule: If a and b are events in the same process, and a comes before b in that process's local order, then a → b. (A process runs its own steps in a definite sequence; that order is known for free.)
  2. Send-before-receive rule: If a is the sending of a message and b is the receiving of that same message (in another process), then a → b. (A message can't be received before it was sent — reality guarantees this.)
  3. Transitivity rule: If a → b and b → c, then a → c. (Causality chains together.)

That's the whole definition. Notice it never mentions seconds, clocks, or time-of-day. It only uses local order and message send/receive — two things every process can actually observe.

When a → b, we say a causally precedes b: a could have influenced b. (It might not have actually influenced it, but it had the opportunity, because information could have flowed from a to b.)

Concurrent events

Now the crucial twist. What if neither a → b nor b → a holds? Then the two events are called concurrent, written a ∥ b. Concurrent does not mean "at the same wall-clock instant." It means causally unrelated — there is no chain of local steps and messages connecting them, so no information could have flowed either way. Neither could have affected the other.

This is why happens-before is only a partial order: a relation where some pairs are ordered (a → b) but other pairs (the concurrent ones) are simply not ordered relative to each other at all. Contrast that with a total order, where every pair is comparable. Real time is a total order; causality is only partial — and that's an honest reflection of reality, because two unrelated things on two machines genuinely have no "true" order.

SPACE-TIME DIAGRAM (time flows DOWNWARD; arrows = messages)

   P1            P2            P3
   |             |             |
   a ----------> |             |        a -> b  (message P1->P2)
   |             b             |
   |             | ----------> |        b -> c  (message P2->P3)
   d             |             c
   |             |             |
   |             e <---------- |        c -> e  (message P3->P2)
   |             |             |

Causal chains:  a -> b -> c -> e   (so a -> e by transitivity)
Concurrent:     d ∥ b , d ∥ c , d ∥ e  (no path connects d to them)
                 (P1's event 'd' never sent/received anything reaching them)
Analogy: Think of a family group chat. If Mom posts a message and Dad replies to it, Mom's post happened-before Dad's reply — his reply was caused by reading hers. But if Mom and your sister both post at "the same time" without seeing each other's message, those two posts are concurrent: neither caused the other, and arguing about which one was "really first" is meaningless. Happens-before captures only the orderings that are caused by actual information flow.
Common mistake: Reading a → b as "a definitely caused b." It means "a could have causally influenced b" — there exists a path of local steps and messages from a to b. It's about potential causality, not proven cause.

3.4 Lamport logical clocks: counters that respect causality

Happens-before is a beautiful idea, but to use it, each process needs a cheap, local way to stamp events so that the stamps respect the arrows. That mechanism is the Lamport logical clock.

The setup is simple. Every process keeps one integer counter, call it C (start it at 0). For an event a, the stamp it gets is written C(a). There are just two rules for updating the counter. In Lamport's paper these are the Implementation Rules IR1 and IR2:

  • IR1 — before each event, tick. A process increments its own counter by 1 before (or as it executes) each event it performs. So every internal event and every send event bumps the counter. C := C + 1.
  • IR2 — piggyback the timestamp on messages. When a process sends a message, it includes its current counter value t inside the message. When another process receives that message, it sets its own counter to C := max(C, t) + 1 — that is, take the larger of "my clock" and "the timestamp in the message," then add one.

The max in IR2 is the whole trick. It forces the receiver's clock to jump ahead of the sender's send-time, so the receive event always gets a strictly larger number than the send event. Causality is preserved by construction.

Step-by-step walkthrough

LAMPORT CLOCK VALUES  (number after each event = its timestamp C)
time flows DOWNWARD

   P1            P2            P3
  C=0           C=0           C=0
   |             |             |
 e1 [1]         |            f1 [1]      IR1: local tick
   |            g1 [1]         |
 e2 [2] -----.   |             |         send from P1 carries t=2
   |          \  |             |
   |           `> g2 [3]       |         receive: max(1,2)+1 = 3
   |             |             |
   |           g3 [4] ------.  |         send from P2 carries t=4
   |             |           \ |
   |             |            `> f2 [5]  receive: max(1,4)+1 = 5
   |             |             |
 e3 [3]          |             |         P1 keeps ticking on its own

Trace it slowly:

  1. P1 does e1: tick 0→1, so C(e1)=1.
  2. P1 does e2 (a send): tick 1→2, so C(e2)=2. The message carries t=2.
  3. P2 receives it as g2. Its clock was 1 (from g1). Apply IR2: max(1, 2) + 1 = 3. So C(g2)=3 — strictly bigger than the send's 2. Good: e2 → g2 and 2 < 3.
  4. P2 sends at g3: tick 3→4, message carries t=4.
  5. P3 receives at f2. Its clock was 1. IR2: max(1, 4) + 1 = 5. So C(f2)=5 — bigger than 4. Good.
Example: Notice P3's clock leapt from 1 straight to 5. It "skipped" 2, 3, and 4. That's fine and expected — logical clocks are not measuring time, they're only guaranteeing the order of causally-related events comes out right. Gaps and uneven jumps don't matter at all.

3.5 The Clock Condition — and the one-way street

The property all of this is engineered to guarantee is the Clock Condition:

Key takeaway — the Clock Condition: For any events a and b,   if a → b then C(a) < C(b).
In words: if a happens-before b, then a's logical timestamp is strictly smaller than b's. Causality is never violated by the numbers.

This follows directly from the two rules: IR1 makes timestamps increase within a process, and IR2's max(...) + 1 makes a receive strictly exceed its matching send. Chain those together (transitivity) and any causal path strictly increases the number every step.

But here is the single most important caveat in this entire topic — the part beginners get wrong:

Common mistake: Believing the reverse — that C(a) < C(b) implies a → b. It does not. The implication only goes one way. A smaller timestamp does not prove "happened before." Two concurrent events can have any timestamps at all — equal, or one smaller than the other — purely by accident of how their local counters happened to advance.

Look back at the diagram: e1 on P1 has timestamp 1, and f1 on P3 also has timestamp 1 — yet they are concurrent (no message connects them). And g1[1] on P2 has a smaller number than e2[2] on P1, but they're also concurrent — the smaller number tells you nothing about causality. So:

  • a → b  ⟹  C(a) < C(b)  ✓ (guaranteed)
  • C(a) < C(b)  ⟹  a → b  ✗ (NOT guaranteed)

Plainly: Lamport timestamps let you order causal events correctly, but they cannot detect concurrency — from two timestamps alone you can't tell whether one truly happened-before the other or whether they're unrelated. (This is precisely the gap that vector clocks, covered in a later section, were invented to close.)

3.6 Total ordering: breaking ties with process IDs

Happens-before is a partial order, but many real algorithms need a total order — a single global queue where every pair of events has a definite, agreed position, including concurrent ones. We don't need this order to be "true"; we just need every process to agree on the same order. Lamport gives a tiny trick to manufacture one.

Sort all events by their Lamport timestamp C. When two events have the same timestamp (which only happens for concurrent events — causal ones always differ), break the tie using an arbitrary but fixed total ordering of the processes — for instance, by process ID, lowest ID wins. Write this combined order with :

TIE-BREAKING:  order key = (Lamport timestamp, process id)

  event   C   pid          sorted "happened-before-ish" total order:
  -----  ---  ----         a ⇒ b  iff  C(a) < C(b)
   x      3    P1                         OR ( C(a) = C(b) AND pid(a) < pid(b) )
   y      3    P3
   z      4    P2          => final order:  x (3,P1) ⇒ y (3,P3) ⇒ z (4,P2)

The result is a total order (every pair is now comparable) that is consistent with happens-before: it never contradicts a real arrow, it just makes an arbitrary-but-deterministic choice for the concurrent pairs. Every process, fed the same events, computes the identical order. That shared agreement is exactly what we need to build coordination algorithms.

Analogy: Think of timestamping race finishers with a stopwatch that only shows whole seconds. Many runners may tie at "12 s." To still produce a single ranked finishing list, you break ties by bib number. The bib number is arbitrary and has nothing to do with who was truly faster — but it gives one consistent, reproducible ranking that everyone can agree on. Process ID is the bib number.

3.7 Why this matters: real uses

Consistent logs and event ordering

Any system that merges events from many machines into one timeline — distributed tracing, audit logs, event-sourced databases — needs a way to order them that doesn't lie about cause and effect. Lamport timestamps give a causally-honest ordering that wall-clock timestamps can't.

Distributed mutual exclusion (the paper's headline example)

Mutual exclusion means: only one process at a time may hold a shared resource (the "critical section") — like only one machine being allowed to write to a shared printer or a shared record. Lamport's paper uses his total order to build a fully distributed solution with no central coordinator:

  1. To request the resource, a process timestamps a REQUEST message and broadcasts it to everyone, and adds it to its own ordered request queue.
  2. Every receiver puts the request in its queue (ordered by the total order ) and replies with a timestamped acknowledgement.
  3. A process may enter the critical section only when (a) its own request sits at the front of its queue by the total order, and (b) it has heard a later-timestamped message from every other process (so it knows no earlier request can still be in flight).
  4. To release, it broadcasts a timestamped RELEASE that removes its request from everyone's queue.

Because every process orders the queue by the same total order, they all agree on whose turn is next — without any central lock manager. This is the original proof that logical time alone is enough to coordinate a distributed system.

Real-world intuition: databases and version control

  • Lamport timestamps in databases. Distributed databases such as Apache Cassandra attach logical/timestamp metadata to writes so that, when replicas later compare two versions of the same row, they can resolve "which write wins" deterministically (last-write-wins conflict resolution). The descendants of Lamport's idea — vector clocks — power Amazon DynamoDB and its ancestor Riak to detect when two updates were concurrent and must be reconciled rather than silently overwritten.
  • Version control = happens-before, made visible. In Git, every commit records the ID(s) of its parent commit(s). "Parent → child" is a happens-before relation. A linear history is a causal chain; two branches edited independently are concurrent events — and git merge is literally the act of reconciling two concurrent histories, exactly the situation Lamport clocks identify (and a merge commit with two parents is the join point). Git's commit graph is a real-world picture of a partial order.
  • Coordination services. Systems like etcd and ZooKeeper expose a monotonically increasing version/revision number on every change — a logical clock in spirit — so clients can reason about ordering without trusting wall clocks.
Key takeaway: Lamport's 1978 paper gave distributed systems their notion of "order" without a shared clock. Happens-before defines what's causally ordered; logical clocks turn that into comparable numbers obeying the Clock Condition; process-ID tie-breaking turns the partial order into a usable total order. These three ideas underpin replication, conflict resolution, distributed locking, and version control to this day.

Common mistakes & misconceptions

Common mistake: Trusting wall-clock timestamps to order events on different machines. Clock skew (often 100–250 ms in practice, even with NTP) and drift mean a later-timestamped event can actually have happened first — and an effect can be stamped earlier than its cause.
Common mistake: Thinking C(a) < C(b) proves a → b. The Clock Condition runs one way only. A smaller Lamport timestamp does not imply happens-before; the two events may be concurrent. Lamport clocks order causality but cannot detect concurrency (that's what vector clocks add).
Common mistake: Reading "concurrent" as "at the same instant." Concurrent means causally unrelated — no chain of messages/local steps connects them. Two events seconds apart in real time can still be concurrent.
Common mistake: Treating logical-clock numbers as elapsed time. They're abstract counters. They jump unevenly (a receive can leap from 1 to 5) and skip values; only their relative order for causal events carries meaning.
Common mistake: Forgetting to advance the clock on internal and send events (only updating on receive). IR1 says every event ticks the counter; skipping internal/send ticks breaks the strict-increase guarantee.
Common mistake: Believing the manufactured total order is the "real" order of concurrent events. It isn't — process-ID tie-breaking is arbitrary. Its only virtue is that every process computes the same one, which is all coordination needs.

Best practices

Best practice: Use logical clocks (or their descendants) — not wall-clock time — whenever correctness depends on ordering of events across machines, such as replication, conflict resolution, or distributed locking.
Best practice: Always piggyback the sender's timestamp inside the message and apply max(local, received) + 1 on receive. The max is what preserves causality; never just increment the receiver's own clock and ignore the message's stamp.
Best practice: If you must detect concurrency (e.g. to know when two writes conflict and need merging), reach for vector clocks, not plain Lamport clocks — Lamport timestamps cannot tell concurrent apart from causally-ordered.
Best practice: When you need a single global order over all events, break timestamp ties with a fixed rule (e.g. process ID) so every node deterministically computes the identical total order.
Best practice: Keep NTP running for human-readable timestamps and rough debugging, but never let business logic depend on cross-machine wall-clock comparisons being accurate to the millisecond.

Section summary

  • Leslie Lamport's 1978 paper (CACM 21:7) showed how to order events in a distributed system without a reliable shared clock.
  • Physical clocks drift (≈1 s every 11–12 days) and stay skewed (≈10 ms ideal, 100–250 ms typical even with NTP), so cross-machine wall-clock timestamps can violate cause-and-effect.
  • Happens-before () is defined by three rules: local order within a process, send-before-receive across processes, and transitivity — no clocks involved.
  • Events where neither a → b nor b → a are concurrent (causally unrelated); this makes happens-before a partial order.
  • Lamport logical clocks are per-process counters updated by IR1 (tick before every event) and IR2 (max(local, message) + 1 on receive).
  • They guarantee the Clock Condition: a → b ⟹ C(a) < C(b) — but not the reverse, so they can't detect concurrency.
  • Breaking timestamp ties with a fixed process ordering yields a deterministic total order that all nodes agree on and that never contradicts causality.
  • These ideas drive distributed mutual exclusion (the paper's example), consistent logs/tracing, conflict resolution in databases (Cassandra, DynamoDB/Riak via vector-clock descendants), version numbers in etcd/ZooKeeper, and the parent→child causal graph in Git.

Continue reading