We’ve just kicked off an internal distributed-systems seminar. Our inaugral paper was Lamport’s classic “Time, Clocks and the Ordering of Events in a Distributed System”. I remembered the paper fondly, but hadn’t looked back it it for more than a decade.
Going back and reading it carefully was a fun experience, and I realized that I hadn’t really understood the paper properly at the time. Maybe that’s not surprising – Lamport himself likes to complain that no one really understands the main contribution of the paper, which from his perspective is the introduction of state machine replication.
(As a side note, If you come to think that everyone has missed what you believe to be the main point of your paper, you should consider the question of whether the paper should have been written differently.)
But the bit that surprised me is something else, which is the somewhat circular way that Lamport motivates the key results in the paper.
A little background. In this paper, Lamport suggests that one should care about
the causal-partial-ordering of events in a distributed system, where roughly
speaking, e > e'
if there is a patch of messages leading from e'
to e
Lamport then describes a simple algorithm for building causally consistent
logical clocks. The key guarantee is that it produces a timestamp C(e)
for
every event e
that has the property that if e > e'
, then C(e) > C(e')
.
To motivate the utility of this clock, he then shows an algorithm for implementing a distributed mutual exclusion algorithm that uses it. On the face of it, this seems like a good argument, but if you look closely, there’s something circular here. In particular, consider Lamport’s mutual exclusion spec.
I. A process which has been granted the resources must release it before it can be granted to another process.
II. Different requests for the resource must be granted in the order in which they are made.
III. If every process which is granted the resource eventually releases it, then every request is eventually granted.
The fairness condition (II) is the interesting one. If you think about it, you’ll realize it’s not quite clear what he means by “the order in which they are made”. What ordering is he talking about? He can’t be talking about the real-time order, since if he was, the property would be impossible to achieve: imagine two hosts requesting access a nanosecond apart. It’s clear that without synchronized clocks or some other time-like information, there’s no way for the system to know who went first, and so the outcome of the mutual exclusion algorithm can’t depend on that.
What Lamport means, it turns out, is that if request r
is causally after
request r'
, then r'
’s request will be granted first.
And that’s the circularity. It’s not terribly surprising that if causality is baked into your specification, then you’re going to have to use causality as part of your implementation. But that’s not really convincing of much on its own, and Lamport doesn’t attempt to motivate why causal fairness is important for a mutual exclusion algorithm.
None of this is really to take away from the paper. This paper, and all of Lamport’s work really, is carefully thought through, well argued, and written in a wonderfully precise manner. And the paper does indeed introduce foundational ideas (state machine replication being the most important) and a conceptual vocabulary that was very influential.
But it raises the point that when you go and read a paper, even a well-regarded classic, you should approach it with a thoughtful and skeptical eye.