People (myself included) like to say that OCaml isn’t really an optimizing compiler, that it has a pretty straight-ahead compilation strategy, and for the most part, you get what it looks like you get when you write the code.
But it turns out, OCaml does a little more magic than I’d counted on. Consider the following code:
let f x y =
  match x,y with
  | (0,0) -> true
  | _, _ -> false
I had thought that this actually allocated a tuple, and I was getting ready to push to try to get this fixed in the compiler. Before making a fool of myself, I thought I’d go and look at the generated assembly first, and lo and behold, I was wrong! The compiler does what one would hope and avoids the needless allocation. To see what the code looked like if I forced the allocation of a tuple, I changed the code to pass the tuple to a tuple-taking function.
let sum (x,y) = x + y

let f x y =
  match x,y with
  | (0,0) as pair -> ignore (sum pair); true
  | _, _ -> false
I then generated the assembly and looked again, only to discover that the function had been inlined, which removed the need for the allocation. So I tried again, this time adding a string constant to the body of sum, which prevents inlining (a deficiency that OCamlPro is working on).
let sum (x,y) =
  ignore "z";
  x + y

let f x y =
  match x,y with
  | (0,0) as pair -> ignore (sum pair); true
  | _, _ -> false
I’d prevented the inlining, but there was still no allocation! Why? Well, it turns out that OCaml can optimize a tuple-taking function to get the elements of the tuple passed in via registers, which is exactly what happened. And again, the compiler realized that no allocation was required.
Finally, I was able to trigger an allocation by changing sum to refer to the tuplified form of its arguments explicitly:
let sum ((x,y) as _p) =
  ignore "z";
  x + y
And this finally triggers the allocation.
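Reading the assembly is the most direct check, but a rough runtime cross-check is also possible by comparing Gc.minor_words before and after a loop of calls. The sketch below is only an illustration: words_per_call and f_alloc are names I made up for it, and the numbers it prints will depend on the compiler version, on whether you build with ocamlopt or ocamlc, and on whether flambda is enabled, so treat them as a hint rather than a proof.

```ocaml
(* First version from the post: the tuple in the match need not be built. *)
let f x y =
  match x, y with
  | (0, 0) -> true
  | _, _ -> false

(* Last version from the post: [sum] names the whole tuple via [_p]. *)
let sum ((x, y) as _p) =
  ignore "z";
  x + y

(* Hypothetical name for the variant of [f] that calls the [sum] above. *)
let f_alloc x y =
  match x, y with
  | (0, 0) as pair -> ignore (sum pair); true
  | _, _ -> false

(* Hypothetical helper: minor-heap words allocated per call of [g 0 0],
   averaged over [n] calls so that measurement overhead is amortized. *)
let words_per_call n g =
  let before = Gc.minor_words () in
  for _i = 1 to n do
    (* [Sys.opaque_identity] keeps the compiler from discarding the call. *)
    ignore (Sys.opaque_identity (g 0 0))
  done;
  let after = Gc.minor_words () in
  (after -. before) /. float_of_int n

let () =
  Printf.printf "f:       %.2f words/call\n" (words_per_call 1_000_000 f);
  Printf.printf "f_alloc: %.2f words/call\n" (words_per_call 1_000_000 f_alloc)
```

If the compiler behaves as described above, the second figure should come out noticeably larger than the first, but it is worth confirming against the assembly before drawing conclusions.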
Anyway, none of this is that surprising – indeed, other people at Jane Street knew perfectly well that OCaml did these optimizations. But it was a pleasant surprise for me nonetheless.