Plans for OCaml 4.08

With the external release of OCaml 4.07.0 imminent, we in Jane Street’s Tools & Compilers group have been planning what we want to work on for inclusion in OCaml 4.08. These days OCaml uses (or at least attempts) a time-based release process with releases scheduled every 6 months. We’re trying to avoid rushing in changes at the last minute – as we’ve been prone to do in the past – so this list is restricted to things we could conceivably finish in the next 4-5 months.

This blog post is part of a drive towards making OCaml compiler development – both inside and outside of Jane Street – more transparent to the wider OCaml community. By its nature this post is technical and very OCaml-specific. It has no narrative or moral, no beginning, middle and end, it is just a list of language features. Hopefully however, for those of you interested in OCaml’s development, it is at least an interesting list.

Support for GADTs in or-patterns

Currently, you cannot use GADT constructors with or-patterns. For example:

# type 'a ty = Int : int ty | Bool : bool ty | String : string ty;;

type 'a ty = Int : int ty | Bool : bool ty | String : string ty

# let is_string : type a. a ty -> bool = function
    | Int | Bool -> false
    | String -> true;;

    Characters 54-57:
      | Int | Bool -> false
        ^^^
Error: This pattern matches values of type int ty
       but a pattern was expected which matches values of type a ty
       Type int is not compatible with type a

which forces you to duplicate the match case for the different constructors:

# let is_string : type a. a ty -> bool = function
    | Int -> false
    | Bool -> false
    | String -> true;;

val is_string : 'a ty -> bool = <fun>

Thomas Refis has a patch to allow GADTs to be matched inside of or-patterns, allowing the above code to type-check, and we would like to get that merged for 4.08.

Note that this patch does not allow type equations introduced by the constructors to be used in the match case. For example, one could imagine allowing the following code:

type ('a, 'b) ty_pair =
  | Int_int : (int, int) ty_pair
  | Int_bool : (int, bool) ty_pair
  | Bool_int : (bool, int) ty_pair
  | Bool_bool : (bool, bool) ty_pair

let fst_default : type a b. (a, b) ty_pair -> a = function
  | Int_int | Int_bool -> 0
  | Bool_int | Bool_bool -> false

since both Int_int and Int_bool add the same equation on a (a = int) which is sufficient to type-check the match case. However supporting this in the general case is very tricky. We have some ideas about how to do it but they won’t be ready for 4.08.

Improve type propagation in lets

Currently code like this fails to compile:

type t1 = T
type t2 = T
let foo () = (T : t1)
let T = foo ()

because the pattern T is type-checked before the expression foo ().

I think this was originally done this way so that code like:

type t1 = T
type t2 = T
let x : t1 = T

would work because the : t1 annotation was considered part of the pattern.

Recently the handling of : t1 changed so that it was copied onto both the pattern and the expression during. This means we don’t need to type-check the pattern first anymore, but also creates its own difficulties, especially for ppx writers. There are also some odd corner cases because polymorphic annotations on lets are treated differently.

We plan to change the AST to have an explicit representation for:

let pat : typ = exp

and change the order of type checking so that it goes: typ => exp => pat, which seems to be the most natural order.

Shadowing of items from “include”

Currently, to extend a module you must do something like:

include (Foo : module type of struct include Foo end with module Bar := Foo.Bar)
module Bar = struct
  include Foo.Bar
  let baz = 42
end

which is a bit of a mouthful. This is because you cannot define two modules with the same name within a single module. Whilst there are various reasons not to allow module shadowing, most of them don’t apply to modules that come from an include statement. So we would like to add the ability to shadow modules, types, etc. that have come from an include statement. This will allow the above to be written as:

include Foo
module Bar = struct
  include Foo.Bar
  let baz = 42
end

Private structure items

It would be convenient to be able to define type aliases or modules for use within a module, without adding them to the module itself (and thus requiring a signature to remove them). We would like to add support for declarations such as:

private type tmp = t list
private module M = F(X)

that will not appear in the surrounding module.

Note that this isn’t intended as any kind of replacement for mli files – mli files should always be used. It is really intended for use in sub-modules where the benefit of a full signature might be minimal.

It will also be supported in signatures, allowing things like:

module type S = sig
  type t
  module Sub : sig
    private type outer = t
    type t
    val to_outer : t -> outer
  end
end

which avoids forcing modules of type S to define an outer type.

One could also imagine using this feature to fix “ocamlc -i” which, currently, can sometimes print incorrect interfaces. For example, running “ocamlc -i” on the following module:

type t = T
module Sub = struct
  type t = S
  let to_outer S = T
end

produces:

type t = T
module Sub : sig
  type t = S
  val to_outer : t -> t
end

which would be incorrect if put in an .mli.

Strengthening the module system

The most subtle parts of OCaml’s module system all revolve around the equalities between different items in modules. For example, a common beginner mistake is to write something like:

module Foo : S = struct
  type t = foo
  ...
end

when they really wanted:

module Foo : S with type t = foo = struct
  type t = foo
  ...
end

The most confusing type errors from the module system come from when a module has unexpectedly lost an equality. In the example above it is probably the programmer’s fault, but in many corner cases OCaml will unhelpfully remove an equality when it really shouldn’t.

We’re planning to include a number of features that involve OCaml keeping more equalities on module types. A module type with more equalities is sometimes called a “stronger” module types – hence the above title.

Unfortunately, strengthening module types is not a backwards compatible change. For example, changing the type of a functor from:

module F (X : sig type t end) : ...

module F (X : sig type t = int end) : ...

would clearly break some uses of F. We feel that the short-term pain of any breakages caused by these changes is worth it for the long-term gain of removing a number of confusing corner cases from the language.

Transparent ascription

Last year, as part of his internship at Jane Street, Maciej Debski implemented transparent ascription. Transparent ascription is an operation:

module M = (N <: S)

that restricts M to the elements of the module type S, but it is still known to be equal to N. For example, M.t would be known to be equal to N.t and Map.Make(M).t would be known to be equal to Map.Make(N).t. This is in contrast to ordinary ascription:

module M = (N : S)

where M.t is not equal to N.t.

This feature is pretty useful on its own, but its mainly needed as a prerequisite for the other features in this list.

Aliases to functor parameters

Currently you cannot create a module alias for a functor parameter. For example:

module F (X : S) = struct module M = X end

currently has a type like:

module F (X : S) : sig module M : sig type t = X.t ... end end

The addition of transparent module ascription will make it possible to have module aliases for functor parameters. So the above example would have type:

module F (X : S) : sig module M = (X <: S) end

Fix “with module”

Currently, the semantics of with module N = M are not to add an alias N = M, but to give N the strengthened module type of M. For example:

module M = struct
  type t
  module Inner : sig
    type t
  end
end

module type S = sig
  module N : sig
    module Inner : sig
      type t
    end
  end
end

module type T = S with module N = M

will give

module type T = sig
  module N : sig
    type t = M.t
    module Inner : sig
      type t = M.Inner.t
    end
  end
end

We would like to strengthen this behaviour to give the alias instead:

module type T = sig module N = M end

In addition we would support the syntax

module type T = S with module N : R

to extend the signature of sub-module N, which will allow the old behavior of with module to be obtained using:

module type T = S with module N : module type of M

We would also allow the syntax:

module type T = S with module N = (M <: R)

to extend the signature with a transparent ascription. Note that this last syntax, in combination with the support for aliasing functor parameters should finally allow with module to correctly specify the result of a functor application.

Fix “module type of”

Currently module type of gives the unstrengthened module type. For example,

module M = struct type t end

module type S = module type of M

produces

module type S = sig type t end

This causes all kinds of problems. We will change it to give the strengthened module type instead, and support a [@weak] attribute to go back to the existing semantics. So the above example would give:

module type S = sig type t = M.t end

This change somewhat depends on the other strengthening changes to be tractable because it makes the requirements for a signature like:

module N : module type of M

stricter, and without the other changes many more [@weak] attributes are needed to get things compiling again.

Flambda

Flambda is the name of a series of optimisation passes provided by the native code compilers, which are not enabled by default.

Much of our recent efforts around flambda have been focused around a major rewrite of some of its core components, imaginatively dubbed flambda 2.0. However, we do have some changes planned for the existing version – mostly because they’re orthogonal to the flambda 2.0 changes and should be easy to port between the two versions.

Work towards making flambda classic mode the default compilation mode

There were a number of improvements made to the compile-time performance of flambda’s -Oclassic mode in OCaml 4.07.0. We’re going to be benchmarking the performance, both run-time and compile-time, of classic mode over the next few months. If the comparison with the current default (non-flambda) compilation mode is good then we would like to make classic mode the default upstream. If it still needs more work then we’ll be trying to get that done for 4.08 so that the default might be changed in 4.09.

Improved inlining heuristics for recursion

As part of his internship at Jane Street, Luke Maurer did some work to give a proper semantics to how flambda handles recursion. In particular, it will give command-line arguments like -inline-max-depth, and attributes like [@unroll 7] a well defined meaning.

Improved DWARF and GDB support

Mark Shinwell’s patches to dramatically improve DWARF output and GDB support have been waiting on review for a long time. We aim to get this review done and the patches merged upstream for this release.

Move the parser to Menhir

Menhir is a parser generator for OCaml that is vastly superior to ocamlyacc. In particular, it has much better support for producing useful error messages.

A pull-request (#292) was created years ago by Gabriel Scherer and Frédéric Bour, to switch OCaml’s parser to Menhir. This work got resurrected recently and we are trying to lend a hand with finishing and testing it.

The days of Error: Syntax error. may be numbered.

Add unsigned integer operations

There are open pull requests for both adding unsigned integer types and adding unsigned operations to the existing integer types. There is no consensus at this point around adding actual unsigned integer types, but there seems to be general agreement for at least providing the unsigned operations. We’ve had some demand for this internally, so we intend to help push this work along so it can be merged for 4.08.