Now that OCaml 4.08 has been released, let’s have a look at what was accomplished, with a particular focus on how our plans for 4.08 fared. I’ll mostly focus on work that we in the Jane Street Tools & Compilers team were involved with, but we are just some of the contributors to the OCaml compiler, and I’ll have a quick look at the end of the post at some of the other work that went into 4.08.
In the end, 4.08 had a very late freeze, but here at Jane Street we mostly moved on to other things once the originally planned freeze date in October had passed. That meant planned features that weren’t ready by then were mostly put to one side and left for a later version.
Even for the things that were part of “our” plan, the work to make that plan happen came from both inside and outside of Jane Street. And not all of that work is writing code! In particular, we owe thanks to all the people in the larger OCaml development team who gave us feedback on our proposals, reviewed our pull requests, and helped to get things merged upstream.
Planned work
Simple support for GADTs in or-patterns
This was merged upstream in PR#2110 following a series of prerequisite pull requests that changed technical details of how patterns are type-checked (PR#1745, PR#1748, PR#1909, PR#2317). In 4.08 it is now possible to write:
type 'a ty = Int : int ty | Bool : bool ty | String : string ty
let is_string : type a. a ty -> bool = function
| Int | Bool -> false
| String -> true
It is still not possible to write or-patterns whose cases rely on equations from the GADT. For example:
type 'a ty = Int1 : int ty | Int2 : int ty | String : string ty
let five : type a. a ty -> a = function
| Int1 | Int2 -> 5
| String -> "five"
is still rejected in 4.08. Special thanks to Jacques Garrigue for reviewing this work.
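One way to work around this restriction, assuming that duplicating the right-hand side is acceptable, is to split the or-pattern into separate cases. A minimal sketch:

```ocaml
type 'a ty = Int1 : int ty | Int2 : int ty | String : string ty

(* Each case is checked on its own, so the equation [a = int] is
   available in the first two branches and [a = string] in the third. *)
let five : type a. a ty -> a = function
  | Int1 -> 5
  | Int2 -> 5
  | String -> "five"
```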
Shadowing of items from include
This was merged upstream as PR#1892. In 4.08 it is now possible to write:
include Foo
module Bar = struct ... end
when Foo contains a Bar module, which is much easier than the old way:
include (Foo : module type of struct include Foo end with module Bar := Foo.Bar)
module Bar = struct ... end
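As a self-contained illustration of the new behaviour, here is a small sketch. The contents of Foo, and the wrapper module M, are made up for the example:

```ocaml
(* [Foo] and its contents are hypothetical, for illustration only. *)
module Foo = struct
  let x = 1
  module Bar = struct
    let name = "Foo.Bar"
  end
end

module M = struct
  include Foo
  (* Shadows the [Bar] brought in by [include Foo]; accepted from 4.08. *)
  module Bar = struct
    let name = "M.Bar"
  end
end
```

After this, M.Bar refers to the new module, while M.x still comes from Foo.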
Private structure and signature items
In PR#2016 we proposed supporting items in signatures and structures that are not included in the resulting module or module type, using the syntax:
private type t = int list
private module L = List
An existing pull request, PR#1506 by Runhang Li and Jeremy Yallop at OCaml Labs, had implemented an alternative feature that can be used to handle similar cases. It added support for using arbitrary module expressions in open statements. For instance, the above example could be expressed as:
open struct
type t = int list
module L = List
end
There was some debate about whether both these features should be supported, or if only one should be included in the language. In the end it was decided that the open feature made more sense in structures, but the private feature made more sense in signatures. It was also decided to give a different syntax to the private feature.
Based on these discussions we wrote PR#2122, which re-implemented the private feature to work only in signatures and use this syntax:
type t := int list
module L := List
reminiscent of destructive substitution. Then we wrote PR#2147, which re-implemented the open feature to restrict its use in signatures and to share some implementation with PR#2122. These were both merged and appear in 4.08.
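To see how the two features fit together, here is a minimal sketch with one signature and one structure. The names S, M, empty and total are invented for the example:

```ocaml
module type S = sig
  type t := int list   (* local to the signature, not part of S *)
  module L := List     (* likewise *)
  val empty : t
  val total : int L.t -> int
end
(* After substitution, S only mentions [int list] and [int List.t]. *)

module M : S = struct
  open struct
    (* Visible in the rest of the structure, but not exported. *)
    type t = int list
    module L = List
  end
  let empty : t = []
  let total (l : t) = L.fold_left ( + ) 0 l
end
```

Because t is only an abbreviation for int list, the values defined with it can still be given a type once t is hidden.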
Improve type propagation in lets
We had hoped to change the order of type-checking the components of let expressions, so that:
let p : t = e
would be checked as t then e then p, rather than the current order of t then p then e. This matters for order-dependent features such as constructor disambiguation.
Unfortunately, this turned out to be more work than we expected. The existing code for typing let only has to deal with passing so-called principal type information (in this case information coming from type annotations rather than from inferring the types of expressions) to its patterns. The code for typing match knows how to do this, but it’s not easy to extract the relevant parts so that they can be shared.
We decided to leave this change for now until we have time to make the more invasive changes required.
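For readers unfamiliar with constructor disambiguation, the sketch below shows the kind of order-dependent resolution involved, as it already works today. The types t and u are hypothetical:

```ocaml
type t = A | B
type u = A | B   (* shadows the constructors of [t] *)

(* The annotation is known before the expression is checked, so [A] is
   resolved to [t] rather than to the most recent definition, [u]. *)
let p : t = A

(* [match] checks the scrutinee before its patterns, so the patterns
   are resolved using the type of [v]. *)
let is_a (v : t) = match v with A -> true | B -> false
```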
Strengthening the module system
We had hoped to add a notion of transparent ascription to the module language, which is an operation:
module M = (N <: S)
that restricts M to the elements of the module type S, but it is still known to be equal to N. This would allow us to keep more type and module equations in the module types produced in various parts of the type system.
Unfortunately, none of the user-visible parts of this work made it into 4.08. We did make progress on some large internal changes that support this work:
- PR#1610, which removes positions from paths, was merged.
- PR#2127, which refactors how the type environment handles looking up values, still needs some work before it can be merged.
The main piece of work that underpins all the proposed changes is the addition of transparent ascriptions. This has proved fairly awkward to implement in a satisfying way, which has prompted both the above pull requests that try to tidy up some parts of the type system that were making the implementation awkward. The work is probably 80% done now, so it should be relatively easy to get it into a later version of OCaml once we start working on it again.
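Since transparent ascription is not available in 4.08, the proposed syntax cannot be tried out. A rough approximation in today's OCaml is to restate by hand, using with type constraints, the equations that should be kept. The sketch below uses hypothetical modules N and S:

```ocaml
module type S = sig
  type t
  val zero : t
end

module N = struct
  type t = int
  let zero = 0
  let succ x = x + 1
end

(* Restrict [N] to [S], keeping the equation [M.t = N.t] by hand. *)
module M = (N : S with type t = N.t)

(* Type-checks only because [M.t] is known to be equal to [N.t]. *)
let one = N.succ M.zero
```

This only preserves the equations that are spelled out explicitly, and it says nothing about M being the same module as N.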
Flambda
Work towards making flambda classic mode the default compilation mode
We tested and benchmarked -Oclassic mode on 4.07. We were happy with the compile-time performance and correctness, but there were some situations where classic mode produced worse code than the non-flambda compiler. In particular, the handling of recursive functions was not as good. We decided that it would be better to wait for the work on improving the inlining heuristics for recursive functions, as that would allow us to fix this issue properly.
Since work has also been progressing quickly on “Flambda 2.0”, which essentially has a completely new version of classic mode, we’re no longer planning to push for having the existing classic mode as the default.
Improved inlining heuristics for recursion
Luke Maurer’s internship work on the inlining heuristics for recursion is still waiting on some improvements that are needed before it can be upstreamed. We’re now hoping to get that into OCaml 4.10.
Pierre Oechsel’s internship work on improving the stability of the inlining heuristics, and improving the support for displaying the results of these heuristics to users, is based on top of Luke Maurer’s branch, so it is also delayed until at least OCaml 4.10.
Improved DWARF and GDB support
Some parts of the DWARF support were merged into 4.08, but then we decided to rewrite some of it to produce better behaviour in gdb. The rewritten patches have been submitted as pull requests (PR#2280, PR#2281, PR#2286, PR#2290, PR#2291, PR#2292, PR#2294, PR#2300, PR#2303, PR#2305, PR#2308, PR#2316, PR#8614), some of which have been merged. However, there has been some disagreement with upstream about the scale of some of these changes vs. their benefit. Addressing these concerns will probably require significant work, so we are going to put the gdb work to one side for now until we have enough spare cycles to get things into a state that is acceptable upstream.
Move the parser to Menhir
PR#292 by Gabriel Scherer at INRIA, Nicolás Ojeda Bär at LexiFi and Frédéric Bour at Facebook, which we helped to test and review, was merged into 4.08. It replaces the parser with one based on the Menhir parser generator, but it does not yet take advantage of Menhir’s advanced error handling, so syntax errors remain as uninformative as before.
Add unsigned integer operations
We helped a little with getting Nicolás Ojeda Bär’s PR#1458 merged. It added unsigned integer operations to the standard library’s Int32, Int64 and Nativeint modules. The initial version implements these operations in plain OCaml. We also wrote code generation for implementing the operations in native code, but this has not yet been upstreamed.
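A few of the new functions in action, as a small sketch. The binding names are made up; the functions themselves are part of the 4.08 standard library:

```ocaml
(* -1l has all 32 bits set, i.e. 2^32 - 1 when read as unsigned. *)
let half_max = Int32.unsigned_div (-1l) 2l

(* -1L is 2^64 - 1 = 18446744073709551615 when read as unsigned. *)
let last_digit = Int64.unsigned_rem (-1L) 10L

(* Unsigned comparison puts -1l above 1l; signed comparison does not. *)
let unsigned_bigger = Int32.unsigned_compare (-1l) 1l > 0
let signed_smaller = Int32.compare (-1l) 1l < 0
```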
Unplanned work
In addition to all the work we did trying to implement our planned features, we also did a lot of work that was, for one reason or another, not on our original plan.
Monadic let operators
PR#1947 adds support for “monadic” let operators to the language. These essentially bring let%bind and let%map from ppx_let to the language itself. This was implemented somewhat on a whim, and quickly devolved into a very long discussion about syntax. Despite this it did make it into OCaml 4.08, so it is now possible to write things like:
let ( let* ) o f =
  match o with
  | None -> None
  | Some x -> f x
let return x = Some x
let find_and_sum tbl k1 k2 =
  let* x1 = Hashtbl.find_opt tbl k1 in
  let* x2 = Hashtbl.find_opt tbl k2 in
  return (x1 + x2)
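The same mechanism covers the let%map style through "and" operators. The sketch below defines let+ and and+ for options. These operator names are a common convention rather than anything built in, and find_and_add is a hypothetical helper:

```ocaml
(* Map over an option. *)
let ( let+ ) o f = Option.map f o

(* Combine two options into an option of a pair. *)
let ( and+ ) a b =
  match a, b with
  | Some x, Some y -> Some (x, y)
  | _ -> None

(* Both lookups are independent, so they can be bound with [and+]. *)
let find_and_add tbl k1 k2 =
  let+ x1 = Hashtbl.find_opt tbl k1
  and+ x2 = Hashtbl.find_opt tbl k2 in
  x1 + x2
```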
Fixing “levels” and “scopes”
In OCaml 4.07.0 we refactored some important parts of how GADTs are implemented. This allowed us to implement disambiguation for GADT constructors. However, one of our changes introduced a bug visible in the reported issues PR#7822, PR#7833 and PR#7835 as well as an unreported soundness issue that affected enough of our code that we rolled back to OCaml 4.06.1 internally.
To get things back into a safe state we reverted a small part of the change from 4.07.0 in PR#1997 along with a couple of other small fixes. This was released in OCaml 4.07.1.
The underlying cause for this bug was an awkward invariant around the representation of bound identifiers in the type-checker. This invariant was needed to correctly implement the “The type constructor foo would escape its scope” error. The need for this invariant came from using a single number (the “stamp”) to serve two different roles. In PR#1980 we split this number into two numbers (a “stamp” and a “scope”), eliminating this invariant and hopefully preventing similar bugs from appearing in the future.
Refactor lookup functions
When implementing transparent ascription, we needed to make some changes to the functions that look up identifiers in the type environment. These functions were a bit convoluted, so in PR#2127 we rewrote them to make things clearer. This patch has not yet been merged upstream.
Make Dynlink sound
OCaml’s Dynlink module, which provides support for dynamic linking, has never done enough checking to ensure that loaded modules can safely be linked into the program. This has produced many bug reports: PR#4208, PR#4229, PR#4839, PR#6462, PR#6957, PR#6950, etc.
We finally fixed these issues in PR#106 which rewrote all of the module tracking done by Dynlink to ensure that modules are only loaded if it is safe to do so. Unfortunately, the checks we implemented were a little overzealous and broke Coq’s plugin mechanism (PR#7876). PR#2176 fixed that by making the checks a little more accurate. These pull requests were merged into 4.08.
Change representation of class signatures
As part of adding support for disambiguating GADTs in OCaml 4.07 we had to make a small change to how classes and object literals were type-checked. Unfortunately this introduced some bugs including PR#7894. These bugs are very similar to previous bugs in the same part of the type-checker such as PR#5498. We decided that changing the representation of class signatures within the type checker would resolve these kinds of bugs once and for all.
Whilst making this change we discovered a number of other issues in this part of the compiler, and the resulting pull request PR#8516 became quite involved. This patch has not yet been merged upstream.
Refactor the construction of the initial environment
In OCaml 4.07 the standard library was put into a single Stdlib module. This Stdlib module is opened by default so that, for example, Stdlib.List.map is still available as List.map. However, that meant that modules from Stdlib would always shadow other modules with the same name (PR#7841). To fix this, PR#2041 changed how external modules are represented in the type environment and how the initial type environment is constructed. This fix was included in 4.08.
Exceptions under or-patterns
Back in 2015 we implemented support for having exception patterns within or-patterns (PR#305). This allows you to write things like:
let get (t : int option String.Table.t) (key : string) =
  match Hashtbl.find_exn t key with
  | Some x -> x
  | None | exception Not_found -> 0
This was merged upstream in time for OCaml 4.03, but there were problems with its implementation (PR#7083) and it was reverted. Last year we finally found time to rewrite the implementation (PR#1568) and the feature was merged for OCaml 4.08.
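For readers without Core, the same pattern can be sketched with only the standard library, where Hashtbl.find raises Not_found on a missing key. The table layout is an assumption made for the example:

```ocaml
(* A missing key and a key bound to [None] both fall back to 0. *)
let get (t : (string, int option) Hashtbl.t) (key : string) =
  match Hashtbl.find t key with
  | Some x -> x
  | None | exception Not_found -> 0
```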
Reproducible builds
As part of work on improving our build times via various forms of caching we’ve suddenly become very interested in getting OCaml builds to be fully reproducible. With that aim in mind we made a number of patches to the compiler: PR#1845, PR#1856, PR#1869, PR#1930. These were included in OCaml 4.08.
Work done outside of Jane Street
The Jane Street Tools & Compilers team are just some of the contributors to the OCaml compiler. A lot of work on the OCaml compiler is done outside of Jane Street. The Changes file includes a full list of everything that’s gone into OCaml 4.08. Highlights include…
New notion of “alerts” that generalizes deprecation warnings
In PR#1804 Alain Frisch, from LexiFi, added support for a new alert attribute:
val foo: int -> int
[@@alert unsafe "Please use bar instead!"]
where unsafe is just an arbitrary lowercase identifier. Any uses of foo would then produce a warning:
Alert unsafe: foo
Please use bar instead!
These alerts can also be turned on and off using attributes:
let y = foo 5 [@alert "-unsafe"]
or on the command line:
ocamlopt ... -alert -unsafe ...
The existing attribute [@deprecated "msg"] is now just sugar for [@alert deprecated "msg"].
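Putting the pieces together, a complete example might look like the sketch below. The module Api and its implementations of foo and bar are hypothetical:

```ocaml
module Api : sig
  val foo : int -> int
  [@@alert unsafe "Please use bar instead!"]

  val bar : int -> int
end = struct
  let foo x = x + 1
  let bar x = x + 1
end

(* The attribute disables the "unsafe" alert for this expression only. *)
let y = (Api.foo 5) [@alert "-unsafe"]

(* [bar] carries no alert, so it needs no attribute. *)
let z = Api.bar 5
```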
New modules for the standard library
The changes to how the standard library is packaged in OCaml 4.07 made it much easier to add new modules to the standard library without breaking things. This prompted the addition of a number of new modules to the standard library by Daniel Bünzli:
- Fun (PR#2129) containing functions for working with functions
- Bool (PR#2010) containing functions for working with bools
- Int (PR#2011) containing functions for working with ints
- Option (PR#1940) containing functions for working with options
- Result (PR#1956) containing functions for working with results
Here at Jane Street everyone uses Core and Base, which have always had such luxuries, but for those of us who occasionally have to make do with the standard library – for example when working on the compiler itself – these are exciting additions.
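A quick taste of each new module, as a small sketch. The binding names are made up; the functions are from the 4.08 standard library:

```ocaml
(* Fun: combinators on functions. *)
let not_empty = Fun.negate (fun s -> s = "")
let flipped = Fun.flip ( - ) 1 10   (* computes 10 - 1 *)

(* Bool and Int: conversions and arithmetic helpers. *)
let shown = Bool.to_string (not_empty "x")
let abs_str = Int.to_string (Int.abs (-3))

(* Option: defaults and mapping. *)
let with_default = Option.value None ~default:0
let doubled = Option.map (fun x -> x * 2) (Some 21)

(* Result: mapping and conversion to option. *)
let bumped : (int, string) result = Result.map (fun x -> x + 1) (Ok 41)
let as_option = Result.to_option (Error "boom" : (int, string) result)
```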
Improved error messages
There were many pull requests, mostly from Florian Angeletti or Armaël Guéneau, to improve the quality of error messages from the type-checker: PR#1720, PR#1733, PR#1993, PR#1998, PR#2058, PR#2094, PR#2140, PR#6416, PR#1120.
Switching the configure system to autoconf
PR#2044, PR#2059, PR#2113, PR#2115 and PR#2139 by Sébastien Hinderer from INRIA replaced the old hand-rolled configure system with one based on autoconf. This should make it easier to maintain, and hopefully pave the way for much better cross-compilation support.