Proofs (and Refutations) using Z3

People often think of formal methods and theorem provers as forbidding tools, cool in theory but with a steep learning curve that makes them hard to use in real life. In this post, we’re going to describe a case we ran into recently where we were able to leverage theorem proving technology, Z3 in particular, to validate some real world engineering we were doing on the OCaml compiler. This post is aimed at readers interested in compilers, but assumes no familiarity with actual compiler development.

We’ll start by discussing the kind of compiler optimizations we’re interested in and why they might be difficult to get right, in particular when applied to floating-point computations. We’ll then show how we used Z3 to review some optimizations we were considering adding to the OCaml compiler, and finding a subtle bug in one of them.

Constant Folding

Compilers usually perform a lot of optimizations to ensure that the programs written by developers run as fast as possible. Among these optimizations, a common one is constant folding, which allows the compiler to rewrite an expression such as 2 * 3 to 6, performing the computation at compile time rather than at run time. Of course, developers don’t usually write expression like 2 * 3, but such examples show up anyway, because other compiler optimizations can create them. e.g., by replacing a symbolic constant with its actual value. Similarly, constant folding can in turn trigger other optimizations, like determining that a code fragment is unreachable, which leads to it being deleted.

A natural question is, of course, the applicability of such optimizations. In the previous example, a multiplication can be computed at compile time not only if we know the values of both operands, but also by applying basic mathematical knowledge such as:

the presence of a neutral element: 1 * expr can be rewritten to expr;
the presence of an absorbing element: 0 * expr can be rewritten to 0;
the equivalence with another operation: -1 * expr can be rewritten to -expr (the latter being both shorter and faster on most CPUs).

A common pitfall for compiler developers is to apply mathematical knowledge to objects having slightly different properties. Indeed, while the rewrites above are trivially correct for mathematical integers, one has to keep in mind that what we call integers on computers is typically quite different from what we call integers in mathematics. For instance, in most programming languages, integer values are represented using a fixed number of bits which leads to a bounded range of possible integers.

As an example, if a signed integer is represented using 8 bits, then the range of possible values is usually -128..127. In this setting, is it correct to rewrite -1 * expr to -expr? It turns out that it is, but only because the usual way overflow is handled (at least when integer values are encoded using the two’s complement representation) is such that -(-128) = -128. The mathematical intuition was correct, but for not-so-obvious reasons. Floating-point numbers are unfortunately even trickier to manipulate.

Floating-point Computations

Floating-point numbers are an approximate representation of real numbers. Floating-point computations are notoriously error-prone, mostly because some basic mathematical properties over real numbers do not hold for floating-point values. For instance, floating-point addition is not associative i.e. given three floating-point values x, y and z, the following computations do not always yield the same result:

x + (y + z);
(x + y) + z.

To complicate things even further, some floating-point subtleties have no direct equivalent in mathematics, such as the fact that there are two different values for zero: a positive one and a negative one. The values will be considered equal when compared but may yield different results when used in computations. For instance, 1.0 / +0.0 is equal to positive infinity while 1.0 / -0.0 is equal to negative infinity. (Which, incidentally, means it is not safe to replace a floating-point value with another one that is equal according to floating-point comparison.)

Compiler optimizations are expected to transform programs while retaining the same behavior. As a consequence, a number of compilers either do not perform any optimization on floating-point computations, or provide means to specifically disable such optimizations. The solution we discuss in this post is the restriction to optimizations that always rewrite to computations producing the same result.

While the previous sentence seems to be perfectly reasonable, it is missing practical details: what do we mean by “always” and “same”? The latter is easy to define: we regard two floating-point values as identical if and only if they have the very same bit pattern according to the IEEE 754 specification (this ensures that they could not be discriminated by any operation). The former is considerably trickier to enforce because all possible combinations of values have to be considered, and ideally all possible rounding modes. A rounding mode determines which value should be chosen when the exact result of an operation cannot be represented (because floating-point values are only approximations).

Automated Theorem Proving to the Rescue

The traditional way to demonstrate that a property is always true is to write a mathematical proof. However, writing a formal proof is hard work, so we are going to take an alternate approach, which is to use a tool in order to automate the process of doing the proof.

A SAT solver is a tool able to solve a boolean SATisfiability problem, i.e. to determine whether, given a logical formula such as a ∧ (b ∨ c) ∧ ¬ c ∧ ¬ (b ∧ d), there exists a mapping from the formula variables to truth values that evaluates the formula to true.

SMT stands for Satisfiability Modulo Theories, where the theories act as a built-in knowledge about various kinds of structures such as integers or arrays. Thus, accepting formulas such as (a > 10) ∧ (b ∨ (c = 2)) ∧ ¬ (b ∧ (d[c] < 0)). The built-in knowledge of SMT solvers makes them much more efficient than SAT solvers because in a number of situations they can leverage properties from a given theory in order to reduce the search space.

The basic use of an SMT solver consists of declaring various entities and expressing constraints over those entities before asking the solver whether the constraints can be satisfied (i.e. can hold simultaneously). If the constraints can be satisfied, the solver can usually produce a model, that is a mapping from entities to values that satisfy the constraints.

Z3 is an SMT solver that comes with support for both integer values (through bit vectors of custom sizes) and floating-point values (using the IEEE 754 representation). Assuming we want to check whether the positive zero is a right neutral element for addition, we would submit the following query to Z3:

(set-logic QF_FP)

(declare-const x (_ FloatingPoint 11 53))
(define-fun z () (_ FloatingPoint 11 53) (_ +zero 11 53))
(declare-const r RoundingMode)

(assert (not (= (fp.add r x z) x)))

(check-sat)
(get-model)

Besides the first line that sets the theory to be used, the file is relatively straightforward:

the declare-const/define-fun lines declare the various entities to be used later on: an unknown floating-point value, an alias for the positive zero, and an unknown rounding mode (the values 11 and 53 are simply the numbers of bits used for respectively the exponent and the mantissa of a floating-point value);
the assert line encodes the formula we want to check i.e. x + +0 = x;
the last two lines ask Z3 to check the satisfiability of the formula and to output the model if any.

It is noteworthy that we actually encode the negation of the property we want to check. As a consequence, satisfiability means that the property does not hold, and the model is actually a counterexample. If we get a counterexample, then we can double-check that Z3 is right. Otherwise, we basically have to trust both Z3 itself and our use of the tool.

Z3 answers the previous query with the following:

sat
(model
  (define-fun x () (_ FloatingPoint 11 53)
    (_ -zero 11 53))
  (define-fun r () RoundingMode
   roundNearestTiesToEven)
)

On the first line, sat means that the assertion is satisfied. The other lines simply present a model (i.e. a counterexample for us) where x is bound to negative zero. The suspicious Z3 user can then feed the values to an OCaml toplevel:

let x = -0.
and z = +0.
and bits = Int64.bits_of_float in
bits (x +. z), bits x;;

whose output indeed shows two different values (0 and -9223372036854775808). We should hence not rewrite x + +0 to x.

Should Z3 have answered that there is no model for a given request, we would have been able to ask it to produce a proof. However, the proof establishing that there is no model for (assert (not (= (fp.mul r x one) x))) (i.e. that the x * 1 ⇝ x rewrite is correct) has more than 100,000 lines. This means that, in practice, such a proof would only be passed to another tool and thus only move the trust issue to another piece of code…

Upcoming flambda 2.0

The previous example has not been chosen at random. The x + +0 ⇝ x rewrite was for a (very) short time part of the development version of the compiler, and then rapidly removed once Z3 proved its invalidity. Possibly saving us a painful debugging session later on.

Using Z3, we have checked that all the simplifications over integer and floating-point values (involving arithmetic, bitwise and shifting operators) in the upcoming flambda 2.0 variant of the OCaml compiler are correct. More, these simplifications are correct for all integer sizes (OCaml defining 4 kinds of integers) and all rounding modes (OCaml offers no means to tweak the rounding mode, but it could be modified through an external function call).

Eventually, we plan to use a ppx extension combined to a simple DSL in order to be able to express the properties underlying the various rewrites as mere attributes of the OCaml code actually performing the rewrites (hence ensuring that the checks and the code are kept synchronized). The simplifications for floating-point multiplication when only one operand is known would look like:

...
| Mul ->
  if F.equal this_side F.one then
    The_other_side
    [@z3 check_float_binary_neutral `Mul (+1.0) `Right]
    [@z3 check_float_binary_neutral `Mul (+1.0) `Left]
  else if F.equal this_side F.minus_one then
    Float_negation_of_the_other_side
    [@z3 check_float_binary_opposite `Mul (-1.0) `Left]
    [@z3 check_float_binary_opposite `Mul (-1.0) `Right]
  else
    Cannot_simplify
...

Relying on a tool such as Z3 to check whether candidate rewrites are correct is invaluable. It does not only ensure that we do not introduce (possibly subtle and hard-to-reproduce) bugs, but also paves the way to more complex rewrites. Indeed, the safety net acts as an encouragement to experiment with ideas that could be deemed too risky otherwise.