It’s no secret that Jane Street is an active participant in the programming language community, and we’re excited to be attending ICFP 2024, the International Conference on Functional Programming, in Milan next week! Most members of our OCaml Language team will be there, and as usual, we look forward to sharing our work with the wider community. Please see below for a full list of papers and talks that Jane Street folk are involved in. Note that a lot of these are collaborations of one kind or another with researchers outside of Jane Street.

We’re working in many different areas of OCaml: type-system features, improvements to code generation and register allocation, better inlining, and more.

But a big focus of our work in the last couple of years has been extensions to the type system that give users more control over performance-relevant aspects of their programs. This includes Rust-like control over memory management, both for avoiding heap allocation and garbage collection and for enabling data-race-free parallel programming. To that end, we’ve invested in extending OCaml’s type system with modal types, supporting modes like local and unique, as well as a kind system that lets us specify unboxed memory layouts.

Upstreaming our changes

We’re doing all of this on our own branch of the OCaml compiler. But we really don’t want these features to remain just with us forever. We’re modeling this on the work that was done by OCaml Labs on Multicore OCaml. That project lived on its own fork for a period of years, but eventually was upstreamed, after a lot of consideration and review.

In the long run, we hope to do the same with our extensions. That’s going to take some time, of course, as well as some convincing of the broader OCaml world that our changes are worth upstreaming. Some of that work will happen by publishing papers, like the ones mentioned above.

But we don’t think papers are enough here: it’s also important to give people a chance to kick the tires on these new language features. That’s harder than it sounds, despite the fact that our compiler is already open source. That’s because a lot of these features only really take flight when you have an ecosystem of libraries that support them. We have those libraries internally, but for compatibility reasons we end up erasing their use of our language extensions when we release them publicly.

To fix that, we’ve put together a “bleeding-edge” opam repo that uses both our compiler and the un-bowdlerized version of our libraries, so you can experience these type-system features the same way we do.

We’ll also have laptops set up to use our branch at the conference, and would love to show you these features in person and let you try them out.

Our extensions at work

Below, I’ll highlight a few examples of how we’re using these new features in real code.

A New Bonsai API

Bonsai is the frontend web framework we use to build the vast majority of web apps at Jane Street. It features a functional user-facing API, similar in spirit to the Elm architecture, but with a different approach to managing state, and with more powerful tools for optimizing the incremental performance of the view calculation.

Fundamental to this model is Bonsai’s “two-phase” approach:

  1. A graph-building phase, where a DAG of computations is constructed.
  2. A runtime phase, where data flows through the computation graph, driving the dynamic behavior of the application.

The static nature of the graph is critical for providing a sane model of per-component state, and also makes it possible to share the computation of subgraphs across multiple components. The phase distinction also enables certain kinds of optimizations to be performed before runtime.

However, enforcing that applications have the correct structure isn’t trivial, and the “old” Bonsai API often proved a hurdle for newcomers. Though it was type-safe, its model of Computation.ts composed of other Computation.ts and Value.ts was complex, and it proved all too easy to accidentally construct a _ Computation.t Computation.t or a _ Computation.t Value.t. Both were always bugs, and the resulting errors often surfaced far away from where the mistake was made.

The solution? During the graph-building phase, Bonsai provides users with a witness value at mode local, which they are then required to provide to various functions. Here’s a simplified example of the underlying pattern.

type phase1_value
type phase1_witness
type phase2_value

val only_callable_in_phase1 : phase1_witness @ local -> phase1_value

val run_phase1 : (phase1_witness @ local -> 'a) -> 'a
val run_phase2 : phase1_value -> phase2_value

This leverages the compiler’s escape analysis for local values, with the phase1_witness serving as proof we are in the correct phase.
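
To get a feel for how calling code uses this pattern, here is a minimal sketch in standard OCaml. Upstream OCaml has no local mode, so this version drops the @ local annotations and only shows the shape of the API. The module name Phases, the int-based representations, and the node counter are all assumptions made for illustration, not Bonsai’s actual implementation.

```ocaml
(* Illustrative sketch only: [Phases] and its int-based representations are
   made up for this example and are not Bonsai's real implementation. *)
module Phases : sig
  type phase1_value
  type phase1_witness
  type phase2_value = int (* exposed so the result can be inspected *)

  val only_callable_in_phase1 : phase1_witness -> phase1_value
  val run_phase1 : (phase1_witness -> 'a) -> 'a
  val run_phase2 : phase1_value -> phase2_value
end = struct
  type phase1_value = int
  type phase1_witness = unit
  type phase2_value = int

  (* Pretend each call registers one node in the graph. *)
  let nodes = ref 0
  let only_callable_in_phase1 () = incr nodes; !nodes

  (* The witness is abstract outside this module, so the only way to get one
     is to be called back from here. *)
  let run_phase1 f = f ()
  let run_phase2 v = v * 10
end

(* Phase 1: build while holding the witness. Phase 2: run the result. *)
let built = Phases.run_phase1 (fun w -> Phases.only_callable_in_phase1 w)
let result = Phases.run_phase2 built
```

Without the local mode, nothing in this sketch stops a caller from stashing the witness in a ref and using it after run_phase1 has returned. The @ local annotation in the signature shown earlier is what lets the compiler reject that.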

For a deeper dive into the theory behind Bonsai, consider checking out Leo White’s HOPE talk, “Arrows as applicatives in a monad”.

Unboxed Types and Mixed Blocks

A major push of the OCaml Language team this year has been our work on unboxed types. The basic idea of unboxed types is to allow for new types with a different layout in memory than traditional OCaml data. These layouts are part of a broader kind system we’ve added to the language.

We use the kind “value” to describe the types of ordinary OCaml values. So, we might write an ordinary polymorphic array like this:

module Array : sig
  type ('a : value) t
end = struct
  type ('a : value) t = 'a array
end

But there are a number of other kinds as well, such as immediate, the kind of immediate (non-pointer) values, or word, the kind of unboxed nativeint#s. We can also express unboxed structures, like #(int64# * float#), an unboxed pair of unboxed numbers, which would have kind bits64 & float64.

But this all gets tricky when you get to defining structures, like this one:

type mixed = { symbol : string; price : float#; size : int64# }

Here, the type mixed has a mix of layouts within it, and this clashes with the way OCaml currently represents memory. Effectively, an OCaml object must either consist entirely of fields that match the traditional OCaml memory layout, or be opaque, in which case it can hold arbitrary data but won’t be scanned by the GC and so can’t contain any pointers.
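
This either/or split is visible in stock OCaml today. The sketch below assumes the standard upstream runtime (not the mixed-block design) and uses the unsafe Obj module only to inspect representations. The type names all_floats and boxed_mixed are hypothetical, chosen to mirror the record above with ordinary boxed fields.

```ocaml
(* Uses the unsafe [Obj] module purely to peek at runtime representations;
   this is for illustration and not something to do in real code. *)
type all_floats = { x : float; y : float }
type boxed_mixed = { symbol : string; price : float; size : int64 }

let flat = { x = 1.0; y = 2.0 }
let boxed = { symbol = "ABC"; price = 1.5; size = 100L }

(* An all-float record is a single flat block of raw doubles. *)
let flat_is_raw_doubles = Obj.tag (Obj.repr flat) = Obj.double_array_tag

(* Once a pointer field is present, the block is an ordinary scanned block,
   and the float and int64 fields become pointers to separate boxes. *)
let mixed_is_scanned_block = Obj.tag (Obj.repr boxed) = 0
let price_is_boxed = Obj.tag (Obj.field (Obj.repr boxed) 1) = Obj.double_tag
let size_is_boxed = Obj.tag (Obj.field (Obj.repr boxed) 2) = Obj.custom_tag
```

In other words, without unboxed fields the record pays for two extra allocations and two extra pointer indirections, which is the cost a type like mixed is meant to avoid.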

In order to support types like mixed, we needed to add a mixed block to OCaml’s memory representation. The design choices here are pretty tricky, and you can hear more about the details in Nick Roberts’ talk about Mixed Blocks.

Polymorphism, or Lack Thereof

It should be no surprise that Streeters love OCaml’s rich and lightweight type system. Its pervasive support for polymorphism makes it much easier to write certain constructs than the equivalent in other languages, while still providing compile-time safety. However, as we add kinds and modes, it’s hard to make those available with the right level of polymorphism. This is a thing we’re actively working on, but it’s going to take time to get right, and in the meantime, we have less polymorphism at the kind and mode level than we really want.

Without such polymorphism, developers are often forced to write the same thing twice, or come up with tortured solutions using existing tools. For example, consider the identity function:

let id : 'a. 'a -> 'a = fun x -> x

This, unfortunately, only works for values at mode global. If we wanted to implement it for, say, values and float64s, at either mode global or local, we’d need four distinct functions:

let id_value_global : ('a : value). 'a @ global -> 'a @ global = fun x -> x
let id_value_local : ('a : value). 'a @ local -> 'a @ local = fun x -> x
let id_float64_global : ('a : float64). 'a @ global -> 'a @ global = fun x -> x
let id_float64_local : ('a : float64). 'a @ local -> 'a @ local = fun x -> x

As somewhat of a stopgap, we’ve implemented ppx_template, which essentially adds C++-style “templates” (😱) to the language. Users can write a snippet of code once, specify the kinds and/or modes for which to generate bindings, and the ppx stamps out a version of that code (modulo name mangling) for each.

With the ppx, the above instead becomes:

let id : ('a : k). 'a @ m -> 'a @ m = fun x -> x
[@@kind k = (value, float64)] [@@mode m = (global, local)]

This is significantly more readable, and we expect it to be closer to whatever syntax we ultimately end up choosing for kind and mode polymorphism.

Other Odds and Ends

This might seem like a lot, but it’s only a fraction of the work we’re doing every day! One feature which has seen rapid adoption within Jane Street is Labeled Tuples, which Ryan Tjoa implemented as an intern project.

We are also nearing completion of a zero-overhead option type, called or_null. The basic idea is to use the C-like representation of a nullable value as either a direct pointer to the object in question, or a null pointer. This works fine, except that to do it safely, you have to make sure not to nest one nullable object inside another. In this design, we use the kind system to prevent this kind of nesting.
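
The nesting problem can be seen with ordinary options. The sketch below uses only the standard option type, not or_null itself. The helper flatten is hypothetical and simply models what a representation with a single null value would force.

```ocaml
(* With the boxed [option], nesting is meaningful: these are different values. *)
let outer_none : int option option = None
let inner_none : int option option = Some None
let distinguishable = outer_none <> inner_none

(* A null-pointer encoding has only one "absent" bit pattern, so a nested
   nullable would have to collapse both cases. [flatten] models that loss. *)
let flatten : 'a option option -> 'a option = function
  | None | Some None -> None
  | Some (Some x) -> Some x

let collapsed = flatten outer_none = flatten inner_none
```

Both distinguishable and collapsed evaluate to true: the boxed option keeps the two cases apart, while the flattened form cannot, which is why nesting has to be ruled out statically.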

Jane Street and you

We look forward to seeing everyone at ICFP, to sharing more about what we’re working on, and to hearing about everyone else’s work as well!

If this kind of stuff sounds fun, consider applying! We have lots of fun roles both directly on our OCaml Language team, and on language-oriented projects across the firm. Take a look here if you want to get a sense of the other kinds of PL-flavored projects we work on!