One of the key decisions to make when designing a code review system is choosing the basic unit of code review. One approach common in many open-source projects is to be patch-centric, i.e. to make the reviewable unit a single patch. In a patch-centric world, code review is complete when every patch that went into the system has been read.
We instead decided to build tools that were diff-centric. In a diff-centric world, reviews aren’t done on single patches, but instead between a pairs of points in history. Code review is complete when a path of diffs has been completed, starting from a fully read revision, and ending in the revision to be approved. Both patch-centric and diff-centric approaches are reasonable, but we settled on diff-centrism for a couple of reasons:
- Merges complicate patch-based review. If you’re going to do code-review at a patch level, one question you need to answer is: how do you review merges? In Mercurial, for instance, merges are represented by changesets that have two parents, and it’s not quite clear how to read the resulting diffs – you could read the diffs from both parents, but this is largely a recapitulation of the changesets being merged. One might be tempted to simply not review merge nodes, since they typically don’t contain material changes. But merges aren’t always trivial, so you ignore them at your peril.
- Reading many small patches reduces the signal-to-noise ratio. People make coding mistakes. If you read every little patch that someone has committed, you will end up reading the outcome of decisions that were later reversed. Diff-centric code review allows you to skip over some of that noise. If a patch is committed that must later be backed-out, then someone who is reading patches will end up reading both. In a diff-centric world, a reviewer who reads a diff that crosses over both the original patch and the backout won’t have to review either.