OCaml with Jane Street extensions is available from our public opam repo. Only a slice of the features described in this series are currently implemented.
In part one, we discussed how OCaml’s locality mode enables safe stack allocation. In part two, we explored how the uniqueness and linearity modes represent ownership. In this (final) post, we leverage modes to define a statically data-race-free API in multicore OCaml.
Data Races
With the recent release of OCaml 5, shared-memory parallelism has become a first-class language feature. OCaml programs now span multiple domains, each of which corresponds to an operating-system thread. Higher-level code can schedule work onto domains using async-await semantics.
Let’s consider extending Async to support parallel tasks. Using a simplified interface, we’ll spawn two invocations of a function before waiting on their results:
val parallel : (unit -> 'a) -> 'a Deferred.t
val await : 'a Deferred.t -> 'a
let count () =
let i = ref 0 in
let incr () = for _ = 1 to 100_000 do i := !i + 1 done in
let p1, p2 = parallel incr, parallel incr in
let (), () = await p1, await p2 in
print_int !i
;;
Naively, combining parallelism with mutability leads to data races. When one domain writes to a memory location while others access it, readers will observe unspecified results. In languages like C++, we’re forced to conclude that data races are undefined behavior. On the other hand, OCaml’s memory model makes races less dangerous—despite producing unpredictable results, they never invalidate type safety, so cannot crash the program.
So, what happens if we run the above code?
> 118533
Data races still cause bugs! Correctly authoring a parallel-safe mutable data structure requires careful usage of atomics, mutexes, or other traditional synchronization primitives. Getting it wrong can cause nondeterministic failures that are difficult to debug.
Rust is comparatively stricter: data races are unrepresentable, as all mutations must go through exclusively owned references. However, Rust’s underlying ownership system results in complicated code, especially in the presence of concurrency.
Thanks to OCaml’s garbage collector, most code does not need to worry about ownership. This allows users to avoid the complexity of tracking ownership in the type system. In this post, we seek to make data races unrepresentable without losing this ability. First, we’ll use modes to prohibit sharing mutable data between domains, only allowing mutation via ownership-based exclusive references. Second, we’ll use capsules to safely encapsulate shared mutability without tracking fine-grained ownership constraints.
The Sync Mode
The sync
mode describes values that can be freely shared between domains.
Values are sharable if they do not allow concurrent writes, so:
- Immutable values are always
sync
, as no writes ever occur. - Exclusively mutable values are also
sync
, since all writes must go through exclusive references. - Other mutable values are
unsync
. We have no control over when they may be written to, so they can’t be safely shared between domains.
type x = { x : int }
type y = { exclusively mutable y : int }
type z = { mutable z : int }
let sync x = { x = 0 }
let sync y = { y = 0 }
let sync z = { z = 0 }
7 | let sync z = { z = 0 } ^^^^^^^^^ Error: found an unsync value where a sync value was expected.
Like uniqueness, a sync
parameter represents a promise by the caller.
-
A
sync
parameter requires the caller to provide sharable values. The callee is allowed to share the value with other domains. -
An
unsync
parameter does not encode a promise from the caller; the callee must not share the value with another domain.
A sync
variable may be used as if it were unsync
, so sync
is the sub-mode.
Hence, unsync
values may reference sync
values, but not vice versa.
For example, a closure capturing an unsync
value must also be unsync
.
Usage
Armed with our new mode axis, we can make parallel
require a sync
closure.
We’ll want to schedule tasks that capture unique variables, so the closure will also be at mode once
.
Lastly, since the function’s result will be passed back to our domain, it too must be sync
.
val parallel : (unit -> 'a @ sync) @ sync once -> 'a Deferred.t
This signature involves several modes, so let’s translate it to
a new version of the “@
” syntax. For better readability, we’ll write one
annotation that may expand function arrows.
val parallel : (unit -> 'a) -> 'a Deferred.t
@ sync once (. -> sync) -> .
Compiling our example program now results in an error:
let count () =
let i = ref 0 in
let incr () = for _ = 1 to 100_000 do i := !i + 1 done in
let () = await (parallel incr) in
print_int !i
;;
6 | let () = await (parallel incr) in ^^^^ Error: found an unsync value where a sync value was expected.
The closure incr
captures a mutable reference, so it can’t be passed to parallel
.
The sync
mode is sufficient to prevent data races.
Like Rust, shared values may only be mutated via exclusive references.
At runtime, we’ll find that our program partitions its mutable values per domain:
In this diagram, red circles are mutable; blue circles are exclusively-mutable or immutable.
Each domain contains a collection of unsync
values, each of which points toward some red circle.
All domains may reference sync
values, which only see blue circles.
Critically, there are no pointers into a domain: we cannot traverse from one domain’s unsync
values to another’s.
Exclusive Mutability
In our new model, we can only share exclusively mutable data structures. Naively, we could convert our example program to use an exclusively mutable counter—but we’ll find that exclusivity requires tracking the same precise ownership constraints we’d find in Rust. For example:
type counter = { exclusively mutable i : int }
let count () =
let c = { i = 0 } in
let incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done in
let () = await (parallel incr) in
print_int c.i
;;
6 | let () = await (parallel incr) in ^^^^ Error: this value escapes its region.
Here, incr
borrows the counter, making it a local closure—and parallel tasks must be global.
We can instead explicitly mark incr
as global, causing the counter to be moved into the closure.
let count () =
let c = { i = 0 } in
let global incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done in
let () = await (parallel incr) in
print_int c.i
;;
5 | print_int c.i ^ Error: c is used uniquely so cannot be used twice.
But now we can’t reference the counter outside incr
—its ownership has been transferred.
To fix this, we can return the counter from the parallel task.
let count () =
let c = { i = 0 } in
let global incr () = for _ = 1 to 100_000 do &c.i <- &c.i + 1 done; c in
let c = await (parallel incr) in
print_int c.i
;;
val count : unit -> unit
> 100000
Since parallel tasks cannot borrow, we’re forced to explicitly move the counter between domains—it’s never actually shared. You might have seen this coming: if multiple domains reference an exclusively-mutable value, it can’t be mutated.
Technically, we got what we asked for: we can write parallel code without fear of data races. However, if we can never share exclusive references, exclusive mutability isn’t useful in practice. Therefore, we will introduce mutexes.
Mutexes
Traditionally, mutable state can be safely shared between domains via careful use of locks. The simplest kind of lock is a mutex, which assures that only one domain can access a value at a time.
In OCaml, a mutex interface might look like the following:
module Mutex : sig
type 'a t
val create : 'a -> 'a t
@ unique sync -> .
val with_lock : 'a t -> ('a -> 'b) -> 'b
@ . -> local (local exclusive -> .) -> .
end
Calling with_lock
locks the mutex, borrows the wrapped value, runs the callback, and finally unlocks the mutex.
If the mutex was already locked, we know some other domain is accessing the value,
so we first wait for it to release the mutex.
Therefore, mutexes can be safely shared across domains.
We can illustrate a mutex as a green circle. Its single outgoing pointer may only be traversed by one domain at a time, so it provides an exclusive reference.
Importantly, mutexes are created from unique sync
values.
Stipulating sync
assures that no unsync
value—which may be visible to another domain—is ever made accessible via the mutex.
Uniqueness additionally guarantees that the mutex holds the only pointer to its contents.
type counter = { exclusively mutable i : int }
let count () =
let mutex = Mutex.create { i = 0 } in
let worker () =
for _ = 1 to 100_000 do
Mutex.with_lock mutex (fun counter -> counter.i <- counter.i + 1)
done
in
let p1, p2 = parallel worker, parallel worker in
let (), () = await p1, await p2 in
let result = Mutex.with_lock mutex (fun counter -> counter.i) in
print_int result
;;
> 200000
The addition of locks gives us capabilities roughly equivalent to safe Rust: we can trade off exclusive borrows
without explicitly moving the underlying value.
However, this API is not very flexible.
First, we must create a mutex for each shared value.
Second, we’re still restricted to sync
data—we can’t just wrap an existing mutable data structure in a mutex.
Shared Mutability
Restricting ourselves to exclusive mutability means we can’t use shared-ownership data structures. To see why, let’s attempt to implement a mutable circularly linked list.
type 'a node = { data : 'a
; exclusively mutable next : 'a node }
let create data =
let rec node = { data; next = node } in
node
;;
val create : 'a -> 'a node
Since create
does not return a unique
list, we’ll never be able to mutate the result—even within a domain, we can never exclusively reference a cyclic value.
This problem is endemic in Rust, where shared mutation requires unsafe code.
OCaml’s garbage collector enables a safe solution—mutable fields—but the resulting data structure isn’t sync
.
If we try to use a mutable value in parallel, we’ll get an error:
type 'a node = { data : 'a
; mutable next : 'a node }
let count () =
let rec head = { data = 0; next = head } in
let append () = head.next <- { data = 1; next = head } in
let () = await (parallel append) in
print_int head.next.data
;;
7 | let () = await (parallel append) in ^^^^^^ Error: found an unsync value where a sync value was expected.
Rust works around this issue by allowing shared-ownership data structures to present an exclusively-mutable API. This abstraction allows the type to be protected by a mutex: its underlying shared mutability is encapsulated in its unsafe implementation.
In OCaml, we need a way to encapsulate unrestricted mutability without using unsafe language features. For this purpose, we introduce capsules.
Capsules
In our initial design, each domain was allowed to reference a collection
of unsync
data.
For example, a mutable circularly linked list:
There are no pointers into our domain, so no other domains can access our list.
We can relax this restriction by creating collections of unsync
data that aren’t associated with a particular domain.
We call these collections capsules.
Capsules allow incoming green pointers, which may only be traversed by one domain at a time.
To enforce exclusivity, we’ll need a way to indicate which domain is currently accessing which capsule. This is the purpose of keys, which have a one-to-one correspondence with capsules.
module Capsule : sig
module Key : sig
type 'k t
type packed = Key : 'k t -> packed
end
val create : unit -> Key.packed
@ . -> unique
(* ... *)
end
When we create a capsule, we get back a Key.packed
, which hides the underlying type 'k
.
We know that this 'k
exists, but that’s all we know—so we call 'k
an existential type.
Concretely, creating a capsule mints a brand-new 'k
that is associated with only the returned key.
This mechanism uniquely identifies keys based on their creation site, allowing us to distinguish between them.
For example:
val require_same_key : 'k Capsule.Key.t -> 'k Capsule.Key.t -> unit
let distinct_keys () =
let Key key0 = Capsule.create () in
let Key key1 = Capsule.create () in
require_same_key key0 key1
;;
6 | require_same_key key0 key1 ^^^^ Error: this expression has type $Key_'k1 Capsule.Key.t but an expression was expected of type $Key_'k0 Capsule.Key.t
We can never pass two different keys to require_same_key
, because distinct keys necessarily have distinct types.
Pointers
When a domain has exclusive access to a key, it may traverse any
green pointer into the associated capsule.
We can model this behavior with the following signature, where a
capsule pointer of type ('a, 'k) t
references a value of type 'a
and
requires the 'k Key.t
to traverse.
module Capsule : sig
(* ... *)
module Ptr : sig
type ('a, 'k) t
val create : ('a -> 'b) -> 'a -> ('b, 'k) t
@ local once sync (sync -> .) -> sync -> .
val map : 'k Key.t -> ('a -> 'b) -> ('a, 'k) t -> ('b, 'k) t
@ local exclusive -> local once sync -> . -> .
val extract : 'k Key.t -> ('a -> 'b) -> ('a, 'k) t -> 'b
@ local exclusive -> local once sync (. -> sync) -> . -> sync
end
val destroy : 'k Key.t -> ('a, 'k) Ptr.t -> 'a
@ unique -> . -> .
end
The first three functions each require a local once sync
callback. The annotation indicates that
the callback will not be stored, will only be called once, and cannot close
over unsync
values.
That’s a lot to take in—let’s unpack each new operation.
-
create
initializes a capsule containing the result of a function. Because we’re making the function’s input available to another capsule, it must besync
. Its output, however, may beunsync
, allowing us to create mutable state.(* Create a mutable reference inside a capsule. *) let pointer = Capsule.Ptr.create (fun i -> ref i) 0
Creating a pointer does not require a key, since it does not provide access to any preexisting data.
-
map
executes a function inside an existing capsule, arbitrarily restructuring the data therein. It requires an exclusive reference to the key, implying that only one domain can runmap
at a time.For example, we can wrap our reference in another reference, creating a mutable chain:
let Key key = Capsule.create () let pointer = Capsule.Ptr.create (fun i -> ref i) 0 (* Wrap the reference in another reference. *) let pointer = Capsule.Ptr.map &key (fun r -> ref r) pointer
Critically,
map
also allows us to create additional pointers into the same capsule. In this example, map returns a pointer to our newly created state, which we may access using the same key. -
extract
also executes a function inside a capsule, but directly returns its result to the caller. The returned value must besync
, as it is now visible to the calling domain.let Key key = Capsule.create () let pointer = Capsule.Ptr.create (fun i -> ref i) 0 (* Return an immutable value read from the reference. *) let result = Capsule.Ptr.extract &key (fun i -> !i) pointer
Conveniently, we can match the type of each function with the mode of its callback. Creation inputs a
sync
value, mapping a value to a pointer; extraction outputs async
value, mapping a pointer to a value. Similarly,map
neither inputs nor outputs async
value, so maps pointers to pointers. -
destroy
merges a pointer into the caller’s capsule. We must provide the associated key uniquely, guaranteeing that no other domain can traverse this pointer again.let Key key = Capsule.create () let pointer = Capsule.Ptr.create (fun i -> ref i) 0 (* Merge the reference into the enclosing capsule. *) let i = Capsule.destroy key pointer
In this diagram,
destroy
recolors a green pointer to black by consuming its associated key.
Capsules provide two important capabilities that mutexes do not.
-
Aggregating ownership: a mutex always protects a unique value, that is, a single pointer. Capsules associate a semantic key with arbitrarily many pointers.
-
Protecting shared mutable state: mutexes only contain
sync
values. Capsules enable us to allocate shared mutable data structures without tracking their ownership at the type level.
Putting it All Together
We’ve built a way to trade off exclusive references (mutexes), and a way to associate a key with a dynamic data structure (capsules). Combining these approaches—storing keys in mutexes—lets us create a statically data-race-free version of any existing data structure!
For example, a hash table:
module Locked_hashtbl = struct
type t = Table :
{ table : ((int, string) Hashtbl.t, 'k) Capsule.Ptr.t
lock : 'k Capsule.Key.t Mutex.t } -> t
let create () =
let Key key = Capsule.create () in
let table = Capsule.Ptr.create Hashtbl.create () in
let lock = Mutex.create key in
Table { table; lock }
;;
let add_exn ( Table { table; lock } ) k v =
Mutex.with_lock lock (fun key ->
Capsule.Ptr.extract key (fun table ->
Hashtbl.add_exn table k v) table)
;;
let length ( Table { table; lock } ) =
Mutex.with_lock lock (fun key ->
Capsule.Ptr.extract key Hashtbl.length table)
;;
end
The implementation is almost as concise as wrapping a table with a mutex in C++ or Java, but far less error-prone. Mixing up locking is a compilation error—once it compiles, we can be sure that our code is free of data races.
let count () =
let table = Locked_hashtbl.create () in
let worker start () =
for i = start to start + 99_999 do
Locked_hashtbl.add_exn table i (Int.to_string i)
done
in
let p1, p2 = parallel (worker 0), parallel (worker 100_000) in
let (), () = await p1, await p2 in
let result = Locked_hashtbl.length table in
print_int result
;;
> 200000
Performance-minded readers might be concerned about the runtime overhead of this approach—but with two clever observations, we may compile it to a simple low-level lock.
-
Keys do not contain information; we either have one or we don’t. That means they take zero bits to store, and don’t have to exist at runtime at all.
-
The mutex is essentially a key-option, hence only takes on two values—locked and unlocked. At runtime, the mutex is a bool; acquire/release are atomic test-and-set instructions.
More practically, we could implement locking using futexes. Fortunately, separating the concepts of keys and locks lets users swap between lock implementations without rewriting their data structures.
Afterword
Given the initial success of OCaml 5, we’re optimistic about the future of writing high-performance parallel programs in OCaml. We believe data-race-free OCaml will enable users to write correct and efficient code, without complicating the single-core experience.
At Jane Street, we’ve been using locality in production for over a year, and
have recently merged compiler support for the unique
and once
modes.
Once they’ve been extensively tested in our internal environment, we hope to contribute these features to upstream OCaml.
Instructions for installing OCaml with Jane Street extensions using opam
are available on GitHub.
This post concludes Oxidizing OCaml, but it’s not the end for modes.
We’re planning to extend the design presented here with sync
’s dual mode axis,
allowing for more fine-grained restrictions on data access.
Similarly, we’re exploring modal representations for algebraic effects,
off-heap data, and compile-time computations—as well as mode-polymorphic functions.