It’s no secret that Jane Street is an active participant in the programming language community, and we’re excited to be attending ICFP 2024, the International Conference on Functional Programming, in Milan next week! Most members of our OCaml Language team will be there, and as usual, we look forward to sharing our work with the wider community. Please see below for a full list of papers and talks that Jane Street folk are involved in. Note that a lot of these are collaborations of one kind or another with researchers outside of Jane Street.
- Oxidizing OCaml with Modal Memory Management
- Arrows as applicatives in a monad
- A Non-allocating Option
- Labeled Tuples
- Mixed Blocks: Storing More Fields Flat
- Designing interrupts for ML and OCaml
- Pattern-matching on mutable values: danger!
- Rethinking the Value Restriction
- Flambda2 Validator
We’re doing work on many different areas in OCaml: type-system features, improvements to code generation and register allocation, better inlining, etc.
But a big focus of our work in the last couple of years has been on extensions to the type system that give users more control over performance-relevant aspects of their programs. This includes Rust-like control over patterns of memory management, both for avoiding heap allocation and garbage collection and for enabling data-race-free parallel programming. This has led us to invest in extending OCaml’s type system to support modal types, with modes like `local` and `unique`, as well as a kind system that lets us specify unboxed memory layouts.
Upstreaming our changes
We’re doing all of this on our own branch of the OCaml compiler. But we really don’t want these features to remain just with us forever. We’re modeling this on the work that was done by OCaml Labs on Multicore OCaml. That project lived on its own fork for a period of years, but eventually was upstreamed, after a lot of consideration and review.
In the long run, we hope to do the same with our extensions. That’s going to take some time, of course, as well as some convincing of the broader OCaml world that our changes are worth upstreaming. Some of that work will happen by publishing papers, like the ones mentioned above.
But we don’t think papers are enough here: it’s also important to give people a chance to kick the tires on these new language features. That’s harder than it sounds, despite the fact that our compiler is already open source. That’s because a lot of these features only really take flight when you have an ecosystem of libraries that support them. We have those libraries internally, but for compatibility reasons we end up erasing their use of our language extensions when we release them publicly.
To fix that, we’ve put together a “bleeding-edge” opam repo that uses both our compiler and the un-bowdlerized version of our libraries, so you can experience these type-system features the same way we do.
We’ll also have laptops set up to use our branch at the conference, and would love to show you these features in person and let you try them out.
Our extensions at work
Below, I’ll highlight a few examples of how we’re using these new features in real code.
A New Bonsai API
Bonsai is the frontend web framework we use to build the vast majority of web apps at Jane Street. It features a functional user-facing API, similar in spirit to the Elm architecture, but with a different approach to managing state, and with more powerful tools for optimizing the incremental performance of the view calculation.
Fundamental to this model is Bonsai’s “two-phase” approach:
- A graph-building phase, where a DAG of computations is constructed.
- A runtime phase, where data flows through the computation graph, driving the dynamic behavior of the application.
The static nature of the graph is critical for providing a sane model of per-component state, and also makes it possible to share the computation of subgraphs across multiple components. The phase distinction also enables certain kinds of optimizations to be performed before runtime.
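The two phases can be sketched in a few lines of stock OCaml. This is a toy model only: `node`, `make`, `run`, and `evaluations` are hypothetical names invented for this illustration and are not part of Bonsai's API. It shows a graph being built first and evaluated afterwards, with a subgraph that is reachable twice being computed once.

```ocaml
(* Hypothetical two-phase sketch; none of these names are Bonsai's API. *)

(* Phase 1: graph building. Each node gets a unique id so that phase 2 can
   recognise a subgraph that is reachable along several paths. *)
type node = { id : int; kind : kind }

and kind =
  | Input of string
  | Double of node
  | Add of node * node

let next_id = ref 0

let make kind =
  incr next_id;
  { id = !next_id; kind }

(* Counts how many nodes phase 2 actually evaluates. *)
let evaluations = ref 0

(* Phase 2: runtime. Data flows through the fixed graph; the per-run cache
   means a shared subgraph is computed only once. *)
let run env root =
  let cache = Hashtbl.create 16 in
  let rec eval node =
    match Hashtbl.find_opt cache node.id with
    | Some v -> v
    | None ->
      incr evaluations;
      let v =
        match node.kind with
        | Input name -> List.assoc name env
        | Double n -> 2 * eval n
        | Add (a, b) -> eval a + eval b
      in
      Hashtbl.add cache node.id v;
      v
  in
  eval root

(* Build the graph once (phase 1), then feed data through it (phase 2). *)
let shared = make (Double (make (Input "x")))
let root = make (Add (shared, shared))
let result = run [ ("x", 3) ] root
```

Because the graph is fixed before any data arrives, the sharing of `shared` is known up front rather than discovered at runtime, which is the property the static graph is meant to provide.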
However, enforcing that applications have the correct structure isn’t trivial, and the “old” Bonsai API often proved a hurdle for newcomers. Though it was type-safe, its model of `Computation.t`s composed of other `Computation.t`s and `Value.t`s was complex, and it proved all too easy to accidentally construct a `_ Computation.t Computation.t` or a `_ Computation.t Value.t`. Either one was always a bug, and the bug often surfaced far away from where the mistake was made.
The solution? During the graph-building phase, Bonsai provides users with a witness value at mode `local`, which they are then required to provide to various functions. Here’s a simplified example of the underlying pattern.
type phase1_value
type phase1_witness
type phase2_value
val only_callable_in_phase1 : phase1_witness @ local -> phase1_value
val run_phase1 : (phase1_witness @ local -> 'a) -> 'a
val run_phase2 : phase1_value -> phase2_value
This leverages the compiler’s escape analysis for `local` values, with the `phase1_witness` serving as proof we are in the correct phase.
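To see what the `local` mode buys here, it can help to look at the same pattern in stock OCaml with the mode annotations erased. The sketch below is an illustration under assumptions: the module name `Phases`, the concrete types behind the signature, and the choice to expose `phase2_value` as `string` are all made up for the example. It type-checks without modes, and that is exactly the problem: the witness can be returned from the callback and used after phase 1 is over.

```ocaml
(* Stock OCaml: the same signature with the [@ local] annotations erased.
   The implementation is a hypothetical stand-in, not Bonsai's. *)
module Phases : sig
  type phase1_value
  type phase1_witness
  type phase2_value = string

  val only_callable_in_phase1 : phase1_witness -> phase1_value
  val run_phase1 : (phase1_witness -> 'a) -> 'a
  val run_phase2 : phase1_value -> phase2_value
end = struct
  type phase1_value = int
  type phase1_witness = Witness
  type phase2_value = string

  let only_callable_in_phase1 Witness = 42
  let run_phase1 f = f Witness
  let run_phase2 v = string_of_int v
end

(* Intended use: the witness stays inside the callback. *)
let value = Phases.run_phase1 (fun w -> Phases.only_callable_in_phase1 w)
let output = Phases.run_phase2 value

(* The hole: without modes, nothing stops the callback from returning the
   witness itself, so a phase-1-only function can be called too late. *)
let leaked = Phases.run_phase1 (fun w -> w)
let too_late = Phases.only_callable_in_phase1 leaked
```

With the witness passed at mode `local`, as in the signature above, the callback that returns `w` should be rejected, since a local value is not allowed to escape the call that provided it. The first use, which only passes the witness along to `only_callable_in_phase1`, would still be accepted.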
For a deeper dive into the theory behind Bonsai, consider checking out Leo White’s Arrows as applicatives in a monad HOPE talk.
Unboxed Types and Mixed Blocks
A major push of the OCaml Language team this year has been our work on unboxed types. The basic idea of unboxed types is to allow for new types with a different layout in memory than traditional OCaml data. These layouts are part of a broader kind system we’ve added to the language.
We use the kind “`value`” to describe the types of ordinary OCaml values. So, we might write an ordinary polymorphic array like this:
module Array : sig
type ('a : value) t
end = struct
type ('a : value) t = 'a array
end
But there are a number of other kinds as well, such as `immediate`, the kind of immediate (non-pointer) `value`s, or `word`, the kind of unboxed `nativeint#`s. We can also express unboxed structures, like `#(int64# * float#)`, an unboxed pair of unboxed numbers, which would have kind `bits64 & float64`.
But this all gets tricky when you get to defining structures, like this one:
type mixed = { symbol : string; price : float#; size : int64# }
Here, the type `mixed` has a mix of layouts within it, and this clashes with the current way OCaml represents memory. Effectively, an OCaml object must either consist entirely of things that match the traditional OCaml memory layout, or be opaque, in which case it can hold arbitrary data but won’t be scanned by the GC and so can’t contain any pointers.
In order to support types like `mixed`, we needed to add a mixed block to OCaml’s memory representation. The design choices here are pretty tricky, and you can hear more about the details in Nick Roberts’ talk about Mixed Blocks.
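The constraint is easy to observe in stock OCaml using the standard library's `Obj` module. The sketch below is only an illustration: the `boxed` record and the binding names are invented for the example, and it uses ordinary boxed `float` and `int64` fields rather than `float#` and `int64#`. It shows that today's version of such a record is a scanned block whose numeric fields are pointers to separate heap blocks, while a string is an opaque block the GC does not scan.

```ocaml
(* Stock OCaml analogue of [mixed], with boxed numeric fields. *)
type boxed = { symbol : string; price : float; size : int64 }

let r = { symbol = "ABC"; price = 101.5; size = 300L }
let block = Obj.repr r

(* The record is an ordinary block that the GC scans field by field. *)
let record_is_scanned = Obj.tag block < Obj.no_scan_tag
let field_count = Obj.size block

(* Each numeric field is a pointer to its own separately allocated block. *)
let price_is_pointer = Obj.is_block (Obj.field block 1)
let price_is_boxed_float = Obj.tag (Obj.field block 1) = Obj.double_tag
let size_is_custom_block = Obj.tag (Obj.field block 2) = Obj.custom_tag

(* A string is an opaque block: arbitrary bytes that the GC never scans. *)
let string_is_opaque = Obj.tag (Obj.repr "ABC") >= Obj.no_scan_tag

(* An int is an immediate: no pointer and no allocation at all. *)
let int_is_immediate = Obj.is_int (Obj.repr 42)
```

Storing `price` and `size` flat next to the `symbol` pointer needs a block that is part scanned and part opaque, which neither of the two existing shapes can express.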
Polymorphism, or Lack Thereof
It should be no surprise that Streeters love OCaml’s rich and lightweight type system. Its pervasive support for polymorphism makes it much easier to write certain constructs than the equivalent in other languages, while still providing compile-time safety. However, as we add kinds and modes, it’s hard to make those available with the right level of polymorphism. This is a thing we’re actively working on, but it’s going to take time to get right, and in the meantime, we have less polymorphism at the kind and mode level than we really want.
Without such polymorphism, developers are often forced to write the same thing twice, or come up with tortured solutions using existing tools. For example, consider the identity function:
let id : 'a. 'a -> 'a = fun x -> x
This, unfortunately, only works for `value`s at mode `global`. If we wanted to implement it for, say, `value`s and `float64`s, at either mode `global` or `local`, we’d need four distinct functions:
let id_value_global : ('a : value). 'a @ global -> 'a @ global = fun x -> x
let id_value_local : ('a : value). 'a @ local -> 'a @ local = fun x -> x
let id_float64_global : ('a : float64). 'a @ global -> 'a @ global = fun x -> x
let id_float64_local : ('a : float64). 'a @ local -> 'a @ local = fun x -> x
As somewhat of a stopgap, we’ve implemented ppx_template, which essentially adds C++-style “templates” (😱) to the language. Users can write some snippet of code once, specify the kinds and/or modes for which to generate bindings, and the ppx stamps out a version of said code (modulo name mangling) for each.
With the ppx, the above instead becomes:
let id : ('a : k). 'a @ m -> 'a @ m = fun x -> x
[@@kind k = (value, float64)] [@@mode m = (global, local)]
Beyond significantly improving readability, we expect that this brings the code closer to whatever syntax we ultimately end up choosing for kind and mode polymorphism.
Other Odds and Ends
This might seem like a lot, but it’s only a fraction of the work we’re doing every day! One feature which has seen rapid adoption within Jane Street is Labeled Tuples, which Ryan Tjoa implemented as an intern project.
We are also nearing completion of a zero-overhead option type, called `or_null`. The basic idea is to use the C-like representation in which a nullable value is either a direct pointer to the object in question, or a null pointer. This works fine, except that to do it safely, you have to make sure not to nest one nullable object inside another. In this design, we use the kind system to prevent this kind of nesting.
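The nesting problem can be illustrated in stock OCaml, without `or_null` itself. In the sketch below, `collapse` is a hypothetical model of what a pointer-or-null representation would do to a nested option; it is not how `or_null` is implemented. The ordinary `option` type can tell `None` apart from `Some None` only because `Some` allocates a box, and a representation with a single null has no way to keep that distinction.

```ocaml
(* Stock OCaml's boxed option distinguishes the two "empty" cases... *)
let nested_none : int option option = None
let nested_some_none : int option option = Some None
let distinguishable = nested_none <> nested_some_none

(* ...because [Some] allocates a block, while [None] is an immediate. *)
let some_is_block = Obj.is_block (Obj.repr (Some 1))
let none_is_immediate = Obj.is_int (Obj.repr None)

(* Hypothetical model of a pointer-or-null representation: there is only one
   null, so both empty cases end up as the same value. *)
let collapse (x : 'a option option) : 'a option =
  match x with
  | None | Some None -> None
  | Some (Some v) -> Some v

let ambiguous = collapse nested_none = collapse nested_some_none
```

Ruling out the nested type at compile time avoids the ambiguity without adding the allocation back.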
Jane Street and you
We look forward to seeing everyone at ICFP, to sharing more about what we’re working on, and to hearing about everyone else’s work as well!
If this kind of stuff sounds fun, consider applying! We have lots of fun roles both directly on our OCaml Language team, and on language-oriented projects across the firm. Take a look here if you want to get a sense of the other kinds of PL-flavored projects we work on!