I have a love-hate relationship with OCaml’s polymorphic comparison functions,
which I think I share with a lot of people who use the language. For those who
don’t know what polymorphic compare, a quick explanation. OCaml has a collection
of functions for comparing values which, magically enough, can be used on
virtually any data type. For example, there is a comparison function in
Pervasives
with the following signature:
val (>=) : 'a -> 'a -> bool
If you stop to think about it, this is a pretty surprising little function. How does one write a single function to compare ints or floats or lists or any random type cobbled together out of records, tuples, and variants? It turns out that you can’t write such a function on your own, at least not in OCaml. Special compiler magic is required. There is a function included with the runtime called
compare_val
which compares values by descending their structure recursively
following fairly simple rules. For example, records are compared field by field,
moving on to the next field only to break a tie with the previous field.
Variants are compared first by their tags, and then, if the tags are equal,
descending recursively to the content. compare_val
operates directly on the
low-level representation of an OCaml value, completely ignoring the type system.
compare_val
is undeniably convenient. One lovely thing this allows is the
creation of polymorphic set and map types (a trick which is strangely not used
in the standard library). And full comparisons aside, a simple built-in
structural equality test is useful in a wide variety of contexts.
But compare_val
has its complications as well. The fact that it ignores the
type system means it crosses all abstraction boundaries. A classic example is
that of a set type. The physical structure of the binary tree used to represent
a set is generally not uniquely determined by the contents of the set in
question. That means that structural equality on sets is (surprisingly) not
equivalent to ordinary set equality. Thus, you get the following surprising
situation.
open Core.Std
let s1 = Set.of_list [1;2;3]
let s2 = Set.of_list [2;1;3]
let () =
(* both assertions pass *)
assert (Set.equal s1 s2);
assert (s1 <> s2)
Sadly, there’s no way of convincing compare_val
to use Set.equal
to compare
the two sets, rather than descending into the structure of the values.
Another problem with compare_val
is that it sometimes throws exceptions. If,
in its traversal of a data structure, it encounters a function or a wrapped-up C
object that doesn’t support comparison, or an object, then it will just throw an
exception. This leads to a source of runtime errors that can be hard to stamp
out.
compare_val
is also fairly slow. The OCaml compiler will optimize away calls
to compare_val
where it can, but it can’t do it in all cases. It’s a big
enough issue that when I think of things to do to optimize an OCaml program, one
of the first things on the list is to stamp out unnecessary invocations of
compare_val
.
Given the tradeoffs involved with polymorphic compare, what should one do about it? More on that, in a later post.