In my previous post I wrote about Flambda, which is the single biggest feature coming to OCaml in this release. In this post, I’ll review the other features of 4.03 that caught my eye.

Inline records

Variants are my favorite thing about OCaml, and in this release, they’re getting better. You’ve always been able to define variants with multiple arguments, e.g.:

type shape =
  | Circle of float * float * float
  | Rect of float * float * float * float

But, as with this example, it can sometimes be a little hard to figure out what the individual fields mean, since they don’t have labels. We can make this better by replacing our multi-argument variants with single-argument variants containing appropriately named records, as follows.

type circle = { center_x: float; 
                center_y: float; 
                radius: float;
              }

type rect = { x_lo: float; 
              y_lo: float; 
              x_hi: float; 
              y_hi: float;
            }

type shape =
  | Circle of circle
  | Rect of rect

This works, but separating the record types from the variant definition is a little awkward. Beyond that, this approach imposes extra runtime costs: a multi-argument variant takes up just a single heap-allocated block, while a single-argument variant containing a record takes two blocks.

With 4.03, we can have the best of both worlds by defining variants containing inline records. Here’s what they look like.

type shape =
  | Circle of { center_x: float; 
                center_y: float; 
                radius: float;
              }
  | Rect of { x_lo: float; 
              y_lo: float; 
              x_hi: float; 
              y_hi: float;
            }

And we can write code that uses these types as follows:

let area = function
  | Circle c -> 3.14159 *. c.radius *. c.radius
  | Rect r -> (r.x_hi -. r.x_lo) *. (r.y_hi -. r.y_lo)

Note, however, that the values r and c aren’t quite first-class. In particular, we can’t use them outside the match expression in which they’re bound. So this function:

let maybe_rect = function Circle _ -> None | Rect r -> Some r

fails with the error "This form is not allowed as the type of the inlined record could escape."
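If you do need the data outside the match, one workaround is to copy out the fields you care about, or to return the enclosing constructor value itself, bound with as. Here is a minimal sketch, repeating the shape type from above so it stands alone; rect_bounds is a name made up for illustration.

```ocaml
type shape =
  | Circle of { center_x : float; center_y : float; radius : float }
  | Rect of { x_lo : float; y_lo : float; x_hi : float; y_hi : float }

(* The inline record [r] can't escape the match, but plain values read
   from its fields can. *)
let rect_bounds = function
  | Circle _ -> None
  | Rect r -> Some (r.x_lo, r.y_lo, r.x_hi, r.y_hi)

(* Returning the whole constructor value (bound with [as]) is also fine,
   since its type is just [shape]. *)
let maybe_rect = function
  | Circle _ -> None
  | Rect _ as s -> Some s
```

The second version returns a shape option rather than the record, so callers have to match on Rect again to get at the fields.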

Even with this restriction, inline records are really convenient.

Another advantage of inline records is that they allow us to express variants with mutable fields. This is useful in a variety of circumstances. In Core, we fake this using Obj.magic for our mutable AVL tree implementation, and now we can remove these hacks. Similar uses of Obj.magic were removed from OCaml’s own imperative queue module in this release as well.
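To make the idea concrete, here is a sketch of a mutable FIFO queue built on a variant whose inline record has a mutable field. The names (cell, queue, push, pop) are made up for illustration; this is not the standard library’s code. Each Cons value is a single block holding the constructor tag and the record fields, and its next field can be updated in place.

```ocaml
(* A linked-list cell: Cons carries an inline record whose [next] field
   is mutable, so the tail can be extended in place without Obj.magic. *)
type 'a cell =
  | Nil
  | Cons of { content : 'a; mutable next : 'a cell }

type 'a queue = {
  mutable length : int;
  mutable first : 'a cell;
  mutable last : 'a cell;
}

let create () = { length = 0; first = Nil; last = Nil }

(* Append [x] at the back of [q]. *)
let push x q =
  let cell = Cons { content = x; next = Nil } in
  (match q.last with
   | Nil -> q.first <- cell            (* empty queue: new cell is also the front *)
   | Cons last -> last.next <- cell);  (* mutate the old tail's inline record *)
  q.last <- cell;
  q.length <- q.length + 1

(* Remove and return the front element, or None if [q] is empty. *)
let pop q =
  match q.first with
  | Nil -> None
  | Cons { content; next } ->
    q.first <- next;
    (match next with Nil -> q.last <- Nil | Cons _ -> ());
    q.length <- q.length - 1;
    Some content
```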

Uchar and result

A couple of useful types have been added to the standard library: Uchar.t and result. Uchar.t represents a Unicode character, and is effectively just an int under the covers.
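For instance, converting to and from the underlying int looks like the sketch below (lambda, code and surrogate_is_valid are names chosen for illustration). Uchar.of_int only accepts valid Unicode scalar values, so a surrogate code point such as 0xD800 is rejected.

```ocaml
(* U+03BB GREEK SMALL LETTER LAMDA, wrapped as a Uchar.t *)
let lambda = Uchar.of_int 0x3BB

(* Unwrap back to the underlying int code point *)
let code = Uchar.to_int lambda

(* Surrogates are not Unicode scalar values, so is_valid rejects them;
   Uchar.of_int would raise Invalid_argument if given one. *)
let surrogate_is_valid = Uchar.is_valid 0xD800
```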

The result type is used for signaling success or failure, and is defined as follows.

type ('a,'b) result = Ok of 'a | Error of 'b

Both of these are in some sense trivial, but they’re there as a coordination point between different libraries. Lots of OCaml libraries have some analogue of result, and each OCaml Unicode library has its own character type. Including these in the standard library provides an easy point of agreement between external libraries, while adding almost no complexity to the core distribution itself.
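As an illustration, here is a sketch of a library-style function that reports failure through result instead of an exception, plus a caller that combines two such results. parse_int and add_strings are hypothetical helpers, not part of any library.

```ocaml
(* Parse a decimal integer, returning failure as a value rather than
   letting int_of_string's Failure exception escape. *)
let parse_int (s : string) : (int, string) result =
  match int_of_string s with
  | n -> Ok n
  | exception Failure _ -> Error (Printf.sprintf "not an integer: %S" s)

(* Any code that agrees on [result] can combine these values directly;
   the first error encountered is passed through unchanged. *)
let add_strings a b =
  match parse_int a, parse_int b with
  | Ok x, Ok y -> Ok (x + y)
  | (Error _ as e), _ | _, (Error _ as e) -> e
```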

Better unboxing for externals

A small but valuable change: it’s now possible to write C bindings that pass unboxed values of a number of different types, including float, int64, int32 and nativeint values. Previously, this was possible only for calls that took only floats and returned a float. This makes it easier to write efficient, zero-allocation C bindings in a wider variety of cases.
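As a sketch of the declaration side, assuming a native-code build, the externals below bind primitives that already exist in the OCaml runtime, so no C stubs of our own are needed. The first string names the boxed primitive used by bytecode; the second names the C function that native code calls directly, passing a C double and getting back a double or an int64_t. The [@@unboxed] attribute applies to every argument and the result, while [@unboxed] on an individual type marks just that position.

```ocaml
(* All floats unboxed: native code calls C's sqrt(double) directly. *)
external unboxed_sqrt : float -> float
  = "caml_sqrt_float" "sqrt" [@@unboxed] [@@noalloc]

(* Mixed types: a float argument and an int64 result, both unboxed,
   so the native call does not allocate an Int64 box for the result. *)
external float_bits : (float [@unboxed]) -> (int64 [@unboxed])
  = "caml_int64_bits_of_float" "caml_int64_bits_of_float_unboxed"
  [@@noalloc]
```

For your own bindings, the native-code stub would then be an ordinary C function along the lines of int64_t my_stub(double x), alongside a boxed version for bytecode.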

GC Latency improvements

I talked about most of this in an earlier post, so I won’t go into great detail now. But the news is that the changes Damien Doligez made during his stay at Jane Street have finally made it upstream.

Ephemerons

To a first approximation, you can think of a GC as a machine that determines what memory can be reclaimed by figuring out what data is reachable, and then reclaiming everything else. The simplest notion of reachability counts everything reachable by following pointers, starting at the GC roots, which are mostly just the values on the call stack.

This notion of reachability is a bit too restrictive, however. In particular, in some circumstances you want to keep pointers to objects without preventing those objects from being reclaimed. This is useful for some kinds of caching, where you want to cache previously computed values for as long as they’re referenced, but no longer.

OCaml has long had an answer to this problem, which is the notion of a weak pointer. Weak pointers are references that aren’t counted when computing reachability. When an object that is pointed to by a weak reference is collected, the weak reference itself is nulled out.
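In the standard library, weak pointers are exposed through the Weak module as weak arrays. Here is a minimal sketch using a one-slot array; cache, remember and recall are illustrative names.

```ocaml
(* A weak array with a single slot; it does not keep its contents alive. *)
let cache : string Weak.t = Weak.create 1

(* Store [s] in the slot without making it reachable. *)
let remember (s : string) = Weak.set cache 0 (Some s)

(* [Some s] while [s] is still reachable from elsewhere; once [s] has
   been collected, the slot is emptied and this returns [None]. *)
let recall () = Weak.get cache 0
```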

Weak references are good enough for some purposes (e.g. hash-consing), but they’re an awkward fit for others. One basic example is memoizing a function, where one wants to keep entries in the table as long as the input to the function in question is still reachable.

You could imagine just keeping the key of the hash-table in a weak pointer, and then using a finalizer to remove the entry in the table once the key gets collected. But this fails if there is a reference from the output of the function back to the key, which is common enough.

Ephemerons were proposed back in 1997 by Barry Hayes to solve just this problem. The idea is pretty simple: an ephemeron has one or more keys and a single data entry. When determining whether the keys are alive, one doesn’t count references from values that are reachable only via the ephemeron, so references from the data back to the key don’t keep the key alive. Also, once any of the ephemeron’s keys is collected, the keys and the data are removed immediately. Weak pointers are now just a special case of ephemerons: a weak pointer is effectively just an ephemeron with no data.

Ephemerons don’t come up that much, but if you need to build a memoization table, ephemerons make the task simpler and the result less leak-prone.
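Here is a minimal sketch of such a memoization table using the Ephemeron.K1.Make functor, which builds a hash table whose entries are held by single-key ephemerons. It assumes string keys; Memo_table and memoize are illustrative names. An entry is dropped once its key is no longer reachable from outside the table.

```ocaml
(* Hash table keyed by strings, where each binding is an ephemeron:
   the cached value stays alive only as long as its key does. *)
module Memo_table = Ephemeron.K1.Make (struct
  type t = string
  let equal (a : string) (b : string) = a = b
  let hash = Hashtbl.hash
end)

(* Wrap [f] so that results are cached per key; the cache does not by
   itself keep keys (or their results) alive. *)
let memoize (f : string -> 'a) : string -> 'a =
  let table = Memo_table.create 16 in
  fun key ->
    match Memo_table.find table key with
    | cached -> cached
    | exception Not_found ->
      let result = f key in
      Memo_table.add table key result;
      result
```

Keys should be heap-allocated values: an immediate value such as an int is never collected, so its entry would never be dropped.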

Licenses, GitHub and ocamlbuild

A few organizational changes are landing in 4.03 that I think are worth noting. First, OCaml development is officially moving from Subversion to Git, with GitHub as the primary coordination point for OCaml development.

Second, OCaml’s license is changing from the somewhat awkward and rarely used QPL to LGPLv2 with a linking exception. The latter had already been used for various libraries distributed with the compiler. But since more and more of the guts of the compiler are being used as libraries for a variety of reasons, it was recently decided to move everything to LGPLv2.

And finally, ocamlbuild is being moved out of the core OCaml distribution. This is one in a series of decisions to break out software that was previously bundled with OCaml. This allows the core team to focus more on the compiler itself, and frees the split-out projects to make changes on whatever schedule they see fit.

I think keeping the OCaml distribution lean is an excellent strategic decision, letting tooling and libraries evolve independently of the compiler.

Individually, these are all small changes. But together they’re part of a trend towards making OCaml development simpler and more agile.

What’s missing

There are a number of much anticipated features that haven’t made it into this release. In particular, the multicore GC, which at one point had been expected to land in 4.03, has been pushed back, likely to 4.04. Algebraic effects, which are in part motivated by the multicore GC, also didn’t make it.

And finally, modular implicits, which are intended to provide typeclass-like functionality for OCaml, are not in this release either. That said, they’re being actively worked on, and I’m expecting there will be more news about those in the next six months.

All in, it’s been a pleasure watching this release take shape. And just as important as what got in is what didn’t. Despite a greatly increased rate of change, the process is quite conservative – nonsense features just don’t make it in. My gratitude goes out to the core team for their safekeeping of the language.