The perils of polymorphic compare

I have a love-hate relationship with OCaml’s polymorphic comparison functions, which I think I share with a lot of people who use the language. For those who don’t know what polymorphic compare, a quick explanation. OCaml has a collection of functions for comparing values which, magically enough, can be used on virtually any data type. For example, there is a comparison function in Pervasives with the following signature:

val (>=) : 'a -> 'a -> bool

If you stop to think about it, this is a pretty surprising little function. How does one write a single function to compare ints or floats or lists or any random type cobbled together out of records, tuples, and variants? It turns out that you can’t write such a function on your own, at least not in OCaml. Special compiler magic is required. There is a function included with the runtime called

compare_val which compares values by descending their structure recursively following fairly simple rules. For example, records are compared field by field, moving on to the next field only to break a tie with the previous field. Variants are compared first by their tags, and then, if the tags are equal, descending recursively to the content. compare_val operates directly on the low-level representation of an OCaml value, completely ignoring the type system.

compare_val is undeniably convenient. One lovely thing this allows is the creation of polymorphic set and map types (a trick which is strangely not used in the standard library). And full comparisons aside, a simple built-in structural equality test is useful in a wide variety of contexts.

But compare_val has its complications as well. The fact that it ignores the type system means it crosses all abstraction boundaries. A classic example is that of a set type. The physical structure of the binary tree used to represent a set is generally not uniquely determined by the contents of the set in question. That means that structural equality on sets is (surprisingly) not equivalent to ordinary set equality. Thus, you get the following surprising situation.

open Core.Std

let s1 = Set.of_list [1;2;3]
let s2 = Set.of_list [2;1;3]
let () =
  (* both assertions pass *)
  assert (Set.equal s1 s2);
  assert (s1 <> s2)

Sadly, there’s no way of convincing compare_val to use Set.equal to compare the two sets, rather than descending into the structure of the values.

Another problem with compare_val is that it sometimes throws exceptions. If, in its traversal of a data structure, it encounters a function or a wrapped-up C object that doesn’t support comparison, or an object, then it will just throw an exception. This leads to a source of runtime errors that can be hard to stamp out.

compare_val is also fairly slow. The OCaml compiler will optimize away calls to compare_val where it can, but it can’t do it in all cases. It’s a big enough issue that when I think of things to do to optimize an OCaml program, one of the first things on the list is to stamp out unnecessary invocations of compare_val.

Given the tradeoffs involved with polymorphic compare, what should one do about it? More on that, in a later post.

The perils of polymorphic compare

Signals & Threads Podcast

Featured Tech Talk

Featured Reads

Tags

RSS

Jane Street Open Source

Join Our Team

The perils of polymorphic compare

Subscribe to Email Updates

Signals & Threads Podcast

Featured Tech Talk

Featured Reads

Tags

RSS

Jane Street Open Source

Join Our Team