Using OxCaml to implement type-safe reference counting between OCaml and Python

Jane Street is known for being an OCaml shop, but for years now Python has been our second major programming language, acting as the primary tool for data analysis and (especially importantly these days) machine learning. Most of our traders and researchers think and write in Python, even as the majority of our infrastructure is written in OCaml.

So it’s been important to support a bridge between these two languages. For that we developed PyOCaml, which lets authors expose a Python interface to their OCaml library. The trouble is that sometimes things fall off the bridge: in particular, when you represent a Python object as an OCaml value, the interactions between the different languages’ memory management systems can lead to object deallocations getting materially delayed. For simple, small-scale data types this is no big deal, but for programs working with huge data frames or scarce resources like GPU memory, it’s a real problem.

We ended up developing a solution that relied on some nifty features of OxCaml, a set of language extensions for OCaml intended to support high-performance programs and data-race free parallelism. These features have allowed us to encode prompt deallocation in a typesafe way. When PyOCaml library authors use these new features, the compiler can actually statically guarantee that Python programs written against them won’t have those promptness problems. This is a big win: in the old world, it was theoretically possible to write Python that avoided losing track of objects, but it required an impractical level of care and expertise. Now such offending programs are impossible to write by construction.

To understand how that works, it helps first to know how Python objects are allocated and GC’d; how they’re represented in OCaml; and how we can borrow the idea of “borrowing” to implement explicit and typesafe reference counting between the two.

A primer on Python objects and their lifecycle

In Python, every object is allocated in memory as a structure with a type-specific layout. The first fields are common to all of these structures, and include information like the type of the object and a reference count field.

Unlike OCaml, whose garbage collector scans and moves values, Python uses reference counting. A freshly created object has reference count 1; when a new reference to the object is created (e.g., the object gets stored in a list), the count is incremented; when a reference goes away (e.g., a variable holding it goes out of scope), the count is decremented. Once the count reaches 0, the object is deallocated:

my_list = [] # Refcount of my_list is 1
my_dict = {}
my_dict["my_list"] = my_list # Refcount of my_list is 2
del my_list # Refcount of my_list is 1
del my_dict["my_list"] # Refcount of my_list went to 0, deallocated
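
The counts in the comments above can be observed on CPython with sys.getrefcount. The following is a minimal sketch, assuming a standard CPython build: it compares counts against a baseline rather than checking absolute values, because the number sys.getrefcount reports includes interpreter-internal references that vary between versions. The Frame class is a hypothetical stand-in, used because plain lists can’t be weakly referenced.

```python
import sys
import weakref


class Frame:
    """Hypothetical stand-in for a large object (supports weak references)."""


frame = Frame()
probe = weakref.ref(frame)          # weak references do not affect the count
baseline = sys.getrefcount(frame)   # absolute value is version-dependent

holder = {}
holder["frame"] = frame             # a new reference: count goes up by one
after_store = sys.getrefcount(frame) - baseline

del holder["frame"]                 # reference removed: back to the baseline
after_remove = sys.getrefcount(frame) - baseline

del frame                           # last reference gone: CPython frees it at once
deallocated = probe() is None
```

The weak reference is only there to detect deallocation without keeping the object alive.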

Borrowing and stealing

In Python, when you pass an argument into a function, there are two ways to make sure its reference count is managed correctly. The first is called “borrowing”: a function that borrows a reference to an object doesn’t increment the object’s reference count. For example, an argument can be borrowed from the caller for the duration of a function call, as long as the borrowed reference doesn’t outlive the call:

def g():
    obj = object()        # We just made a new object.
                          # Exactly one name (`obj`) points at it. Refcount = 1.

    res = f(obj)          # We call f, passing the object in.

def f(arg):
    # arg is just another name for the same object `obj`.
    # Counter is still 1, NOT 2.
    # That's "borrowing": passing into a function
    # does not increase the count.

    res = [arg]
    # Now we put it inside a list.
    # The list is a new, persistent container that points at the object.
    # That DOES bump the count: 1 -> 2.

    return res

Note that the code above only illustrates the concept and isn’t literally accurate: the actual interpreter does things slightly differently, and the details depend on the exact version. The borrowing described here really applies only to functions that aren’t implemented in pure Python, but are instead exposed to Python by an extension module written in, say, C.

Suppose g continued:

def g():
    obj = object()
    res = f(obj)          # res is the list. The list still points at obj.
                          # Counter on obj is 2: one from `obj`, one from the list.

    del obj               # Drop the name `obj`. Counter: 2 -> 1.
                          # The list still has it.

    del res               # Drop the list. The list is freed,
                          # which releases its reference to obj.
                          # Counter: 1 -> 0. obj is freed.

This is safe: we know inside f that the reference count of arg will not go to 0, because the caller g still holds a reference that’s valid at least until f returns.
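
The parts of this walkthrough that don’t depend on interpreter internals can be checked directly. The sketch below assumes a standard CPython build and measures only the change in count across the call, since what the interpreter does during the call varies by version. Thing is a hypothetical stand-in for object() that supports weak references.

```python
import sys
import weakref


class Thing:
    """Hypothetical stand-in for `object()` that supports weak references."""


def f(arg):
    # Storing arg in a list creates a reference that outlives the call.
    return [arg]


def g():
    obj = Thing()
    probe = weakref.ref(obj)
    before = sys.getrefcount(obj)

    res = f(obj)
    gained = sys.getrefcount(obj) - before   # the list's reference to obj

    del obj                                  # the list still keeps it alive
    alive_after_del_obj = probe() is not None

    del res                                  # list freed, last reference dropped
    freed_after_del_res = probe() is None
    return gained, alive_after_del_obj, freed_after_del_res


gained, alive_after_del_obj, freed_after_del_res = g()
```

The only lasting effect of calling f is the one extra reference held by the returned list.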

Meanwhile, some APIs “steal” a reference from a caller: they take some argument that will outlive the function call, but instead of incrementing the reference count, they “steal” it and the caller no longer owns the reference it had on the object when invoking the function. (The reference count might even have gone to 0 before the callee returns!) Stealing is not something you can see in the Python code, but rather is implemented at the C-extension level.

In borrowing-style code:

PyObject *obj = make_something();   // counter = 1
list_append(my_list, obj);          // list increfs internally: counter = 2
Py_DECREF(obj);                     // I drop my reference: counter = 1

Stealing-style code:

PyObject *obj = make_something();   // counter = 1, I own it
list_append_steal(my_list, obj);    // I do NOT incref.
                                    // The list now considers itself
                                    // the owner of that single reference.
                                    // I'm no longer allowed to use obj.

When writing C code to implement a Python extension, a developer must manage the ownership of object references manually, since the compiler won’t catch bugs:

A missing Py_DECREF could lead to a memory leak
A missing Py_INCREF on a borrowed object could lead to an invalid pointer later on
A missing Py_INCREF could lead to an invalid pointer when an object passed to a stealing function is later reused
A missing Py_INCREF could lead to problems when a borrowed object is passed to a stealing function
Receiving a borrowed reference from a function could lead to an invalid pointer when using free-threaded Python

etc.
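
To make a couple of these failure modes concrete, here is a toy model of manual reference counting written in Python. It is only a sketch: ToyObject, incref, decref, list_append and list_append_steal are illustrative stand-ins that mirror the C snippets above, not CPython’s real API, and “freeing” is just a flag.

```python
class ToyObject:
    """Toy stand-in for a PyObject with a manually managed count."""

    def __init__(self):
        self.refcount = 1        # a fresh object starts with one owner
        self.freed = False


def incref(obj):
    obj.refcount += 1


def decref(obj):
    if obj.freed:
        raise RuntimeError("decref on an already-freed object")
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True         # the real runtime would deallocate here


def list_append(lst, obj):
    """Borrowing style: the list takes its own reference."""
    incref(obj)
    lst.append(obj)


def list_append_steal(lst, obj):
    """Stealing style: the list takes over the caller's reference."""
    lst.append(obj)


# Correct borrowing: the caller drops its own reference afterwards.
ok_list, ok = [], ToyObject()
list_append(ok_list, ok)
decref(ok)
borrow_count = ok.refcount                    # 1, owned by the list

# Bug: a missing decref after a borrowing call leaks a reference.
leak_list, leaked = [], ToyObject()
list_append(leak_list, leaked)
decref(leak_list.pop())                       # the list lets go...
leaked_forever = not leaked.freed             # ...but the count is stuck at 1

# Bug: releasing a reference that was already stolen.
steal_list, stolen = [], ToyObject()
list_append_steal(steal_list, stolen)
decref(stolen)                                # the caller no longer owns this
dangling = steal_list[0].freed                # the list now holds a freed object
```

In real C code neither bug raises an error at the point of the mistake, which is what makes them hard to find.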

Representing Python objects as OCaml values can leave refcounts hanging

In PyOCaml, Python objects are represented as OCaml values of type Py.Object.t. These are “custom block” values that hold an 8-byte pointer regardless of the size of the underlying Python object. Each Py.Object.t is allocated on the global heap and stays live until it is garbage collected. We want the Python object to be deallocated promptly on the Python side, which means ensuring that the OCaml side doesn’t hold any reference longer than necessary.

Trouble is, it’s easy for this to go awry. When we wrap a PyObject reference into an OCaml value, we increment its reference count; but it will only be decremented when the custom block’s finalizer is called. This only happens when the OCaml garbage collector frees up the Py.Object.t. In effect we’ve coupled the freeing of the underlying Python object (which could be a many-GB data frame) to the eventual GC’ing of the OCaml value. This means that when calling a PyOCaml function the reference count of an argument may remain incremented even after the function returns, and it’ll only be decremented when the OCaml garbage collector frees up the Py.Object.t.

Concretely, let’s say you have some OCaml code defining a PyOCaml function:

let pyocaml_function (arg : Py.Object.t) : Py.Object.t =
  let i = Py.Int.to_int arg in
  let res = i + 1 in
  Py.Int.of_int res

The way this is implemented in C glue code (heavily simplified for the purposes of this post) is as follows:

PyObject *call_pyocaml_function(PyObject *arg) {
    /* The C function borrows, but the OCaml value is globally allocated
       and could outlive this function. As an example, `pyocaml_function` could
       store it in some global map. */
    Py_INCREF(arg);
    value ocaml_arg = wrap_pyobject_into_py_object_t(arg);

    value ocaml_result = pyocaml_function(ocaml_arg);
    PyObject *result = unwrap_pyobject_from_py_object_t(ocaml_result);

    /* The returned ocaml_result is a `Py.Object.t` that's globally allocated,
       and could outlive the call to `pyocaml_function` */
    Py_INCREF(result);
    return result;
}

Because the Py_INCREF isn’t paired with a decref until the Py.Object.t gets GC’d by the OCaml runtime, the borrowing semantics don’t work out the way they did in pure Python. In particular:

arg = 1024 # Reference count is 1
res = pyocaml_function(arg)
# Reference count of arg is 2, reference count of res is 2,
# unlike the pure-Python examples above

del arg # Reference count of arg is 1
del res # Reference count of res is 1

# As long as the OCaml GC doesn't run (i.e., at least as long as we don't call any PyOCaml code),
# the reference counts of arg and res remain 1 and neither is deallocated.

It’s only once the OCaml GC observes that the Py.Object.t values wrapping arg and res are no longer referenced, and invokes their custom block finalizers, that the reference counts of arg and res go to 0 and both get deallocated.

This problem is exacerbated by the fact that the OCaml garbage collector has no knowledge of the amount of memory these Py.Object.ts are keeping alive. From OCaml’s point of view it’s just an 8-byte object. So the garbage collector may not work as hard to clean them up.
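
The effect can be modelled in a few lines of Python. This is a toy sketch, not PyOCaml: ToyOCamlHeap is a hypothetical stand-in for the OCaml heap, where each wrapped value holds one strong reference until a collection runs, and DataFrame stands in for a large Python object. It assumes CPython, where an object is freed as soon as its last reference disappears.

```python
import weakref


class DataFrame:
    """Stand-in for a large Python object."""


class ToyOCamlHeap:
    """Hypothetical model: wrappers keep their reference until collect()."""

    def __init__(self):
        self._blocks = []        # each "custom block" owns one reference

    def wrap(self, obj):
        self._blocks.append(obj)

    def collect(self):
        self._blocks.clear()     # finalizers run: references are released


heap = ToyOCamlHeap()
df = DataFrame()
probe = weakref.ref(df)          # lets us see whether df is still allocated

heap.wrap(df)                    # passing df to "OCaml" takes a reference
del df                           # Python is done with the object...

alive_before_collect = probe() is not None   # ...but it is still allocated
heap.collect()                               # only now is it released
alive_after_collect = probe() is not None
```

Nothing on the Python side can make the object go away sooner; its lifetime is decided by when the other collector happens to run.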

Using OxCaml modes to do typesafe borrowing and stealing

We can improve this situation thanks to functionality introduced in OxCaml. Instead of always globally allocating Py.Object.t values with their owned reference and relying on the OCaml garbage collector to clean up these references, we can use two new annotations:

@ local means “this value won’t escape this function call.” If a Python API function takes its argument as @ local, OCaml can prove the caller still holds the object for the whole call, so we can skip the incref.
@ unique means “this is the only handle to the value.” If an OCaml function returns a Python object as @ unique, we know that this value holds a unique reference to the object, so we can likewise hand the handle straight back to Python without the incref.

We can employ these modes in a few common cases to ensure most of our code is deallocating promptly.

Borrowing arguments

Most Python API functions exposed by PyOCaml only borrow their arguments. As a result, we can mark them as @ local. So

val Py.Int.to_int : Py.Object.t -> int

becomes

val Py.Int.to_int : Py.Object.t @ local -> int

Similarly, we change how arguments are passed to PyOCaml function implementations: instead of giving a Py.Object.t, the argument becomes Py.Object.t @ local. Our C glue code can then safely construct a Py.Object.t value (in this case with no finalizer set!) without increasing the reference count: the C function itself borrows the argument object, and just passes it on to the OCaml function, borrowed. When the OCaml function returns, we’re guaranteed the Py.Object.t value we called it with is no longer referenced by any OCaml code or data. For safety’s sake, we set the pointer stored in the value to NULL.
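
The lend-then-invalidate discipline can be sketched in Python, with the caveat that Python can only check it at run time, whereas OxCaml rejects the offending program at compile time. BorrowedHandle and call_with_borrowed are hypothetical names for illustration; the model captures only the lifetime rule, since ordinary Python assignment takes a reference of its own.

```python
class BorrowedHandle:
    """Hypothetical handle that is only valid for the duration of one call."""

    def __init__(self, target):
        self._target = target

    def get(self):
        if self._target is None:
            raise RuntimeError("borrowed handle used after the call returned")
        return self._target

    def invalidate(self):
        self._target = None


def call_with_borrowed(func, obj):
    """Glue code: lend obj to func for the duration of one call."""
    handle = BorrowedHandle(obj)
    try:
        return func(handle)
    finally:
        handle.invalidate()       # mirrors setting the stored pointer to NULL


escaped = []


def well_behaved(handle):
    return handle.get() + 1


def leaky(handle):
    escaped.append(handle)        # @ local would reject this escape statically
    return handle.get() + 1


result = call_with_borrowed(well_behaved, 41)
call_with_borrowed(leaky, 41)

try:
    escaped[0].get()              # the handle outlived the call
    caught = False
except RuntimeError:
    caught = True
```

With @ local the second function would not compile, so the NULL pointer is a backstop and not the primary safety mechanism.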

Stealing results

When invoked, PyOCaml functions construct and return Py.Object.t values. We can change the APIs to construct Python callables from OCaml functions and require the OCaml functions to return Py.Object.t @ unique values. As a result, after our C glue code invokes the OCaml function, we know the result value is no longer referenced in OCaml code or data, and we can steal the PyObject * from that value and return it to the Python runtime as-is, without increasing the reference count.

We do, however, need to set the pointer stored in the Py.Object.t value to NULL, and make sure the finalizer can handle this properly. If not, the finalizer would decrease the reference count of an object to which it no longer owns a reference (the object may even have been deallocated).
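
A toy model shows why the finalizer has to tolerate a cleared pointer. The names here (ToyPyObject, OwnedHandle, steal, finalize) are hypothetical and the reference count is a plain integer; this is a sketch of the bookkeeping, not of the actual glue code.

```python
class ToyPyObject:
    """Toy stand-in for a PyObject with a manually managed count."""

    def __init__(self):
        self.refcount = 1


class OwnedHandle:
    """Hypothetical Py.Object.t-like wrapper that owns exactly one reference."""

    def __init__(self, target):
        self.target = target

    def steal(self):
        """Hand the owned reference to the caller and forget it."""
        target, self.target = self.target, None
        return target

    def finalize(self):
        """What the GC finalizer would do; a stolen handle owns nothing."""
        if self.target is not None:
            self.target.refcount -= 1
            self.target = None


obj = ToyPyObject()
handle = OwnedHandle(obj)               # the handle owns the single reference
returned = handle.steal()               # glue returns it without an incref
handle.finalize()                       # later the finalizer runs: a no-op
count_after_steal = returned.refcount   # still 1, now owned by Python

other = ToyPyObject()
kept = OwnedHandle(other)
kept.finalize()                         # not stolen: the finalizer releases it
count_after_finalize = other.refcount   # 0
```

Without the None check, the first finalize call would decrement a count that the handle no longer has any claim on.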

Retaining uniqueness through borrowing

There’s a problem once you start using @ unique returns. Suppose you build up a fresh Python list (@ unique) and you want to call .append on it before returning it. The code you’d like to write is:

let i = Py.Int.of_int 1 in
let lst = Py.List.create 0 in        (* unique *)
let _ = Py.Object.call_method lst "append" [: i :] in
lst : Py.Object.t @ unique           (* doesn't typecheck! *)

Even though Py.List.create returns a Py.Object.t @ unique, once we pass it to:

val Py.Object.call_method 
   : Py.Object.t @ local 
  -> string @ local 
  -> Py.Object.t iarray @ local read 
  -> Py.Object.t @ unique

the uniqueness of the value is consumed.

Luckily, borrowing can save us here. The borrow_ keyword means “just loan this to the function, don’t actually give it up.” As long as call_method takes its argument locally, we know the argument value can’t outlive the function call, and hence if the argument value was unique before the function call, it’ll still be unique once the function returns.

The OxCaml compiler got basic support for borrowing recently, so we can write the code above as:

let i = Py.Int.of_int 1 in
let lst = Py.List.create 0 in        
- let _ = Py.Object.call_method lst "append" [: i :] in
+ let _ = Py.Object.call_method (borrow_ lst) "append" [: i :] in
lst : Py.Object.t @ unique           (* typechecks! *)

Safe explicit reference counting when you really need it

The mode system covers the common cases, but sometimes you really do want to manage things manually, for instance when you want to store a Python object in a long-lived OCaml global, or eagerly free a giant data frame before doing something risky. In those cases, these modes let you write code that manages reference counts explicitly. We provide the following functions:

val Py.newref : Py.Object.t @ local -> Py.Object.t @ unique
val Py.decref : Py.Object.t @ local unique -> unit

newref constructs a new global reference from a local one. decref, in turn, consumes the uniqueness of the value it’s given, so that value can’t be used afterwards, and releases the reference.

Most code wouldn’t use decref explicitly, since it clutters code and isn’t required (the GC finalizer performs a decref when the Py.Object.t gets garbage-collected). But if profiling shows Python objects are retained longer than desired in some code, an explicit decref could solve this problem. An example would be code which constructs some large dataframe and then runs some other function which might raise, and wants to proactively release the dataframe in the error path.

Explicit decref calls get quite tedious when you have many objects to manage, especially in light of exceptions. We’ve written ppx_release to make all of this more ergonomic. For example, the following code is safe, and doesn’t keep the Python objects constructed by create_dataframe and create_series alive for longer than strictly necessary, even when do_some_calculation raises an exception:

let%release df = create_dataframe ()
and series = create_series () in
do_some_calculation df series
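
For readers who think in Python, the guarantee is similar in spirit to what a with block gives you. The sketch below is an analogy only and says nothing about how ppx_release is implemented: Handle, release and the failing do_some_calculation are hypothetical, and contextlib.ExitStack plays the role of releasing every handle when the scope exits, including when an exception is raised.

```python
from contextlib import ExitStack

released = []                     # records the order in which handles are released


class Handle:
    """Hypothetical owner of one Python reference."""

    def __init__(self, name):
        self.name = name

    def release(self):
        released.append(self.name)


def do_some_calculation(df, series):
    raise ValueError("calculation failed")


def run():
    with ExitStack() as stack:
        df = Handle("df")
        stack.callback(df.release)        # registered as soon as df exists
        series = Handle("series")
        stack.callback(series.release)
        return do_some_calculation(df, series)


try:
    run()
    raised = False
except ValueError:
    raised = True                 # the error still reaches the caller
```

Both handles are released in reverse order of creation even though the calculation raised. The difference in OxCaml is that the compiler also checks that a released value isn’t used afterwards.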

This approach allows for type-safe, compiler-checked, code-directed reference count handling. It’s worth noting that Rust gives you similar benefits, but with a key difference: unlike the Rust equivalent, we don’t put the burden of ownership tracking on all users of the APIs. In many cases, code can be written in a more traditional style, with the garbage collector safely taking care of deallocations when it’s safe to do so.

This connects to OxCaml’s larger design goal of providing control over performance-critical aspects of program behavior, but only where you need it. We think that the resulting improvement in ergonomics is worth quite a lot!