The cover image is based on Jupiter family by NASA/JPL.

OCaml is an amazing programming language to write industrial strength libraries and systems. At Jane Street we use it for literally all of our production systems, including for FPGA design, web development, and even machine learning.

However, for certain tasks we have found a different workflow to be highly effective: using Python with its lightweight syntax and huge ecosystem of libraries (numerical analysis, plotting, machine learning, etc.) inside a Jupyter notebook. This workflow is very convenient for iterating quickly, especially if the code is only meant to be run once. This often happens in quantitative research, where one wants, for example, to quickly load a time series from a CSV file, plot it, and compute some variance or correlation metrics using NumPy.

However, it is crucial for us to be able to reuse our existing OCaml services and systems in this workflow, so we created a way to expose the services of our OCaml systems to Python users. Importantly, we want the OCaml developers of those systems to be able to create the Python bindings without requiring a deep understanding of Python itself. Our solution provides transparent access for Python users to these systems by building on pyml, which provides OCaml bindings to the Python C API.

In this blog post, we discuss how OCaml libraries can be called from Python as well as the other way around. We leverage pyml to write bindings that wrap functions from one language so that they can be used in the other. We introduce a ppx extension and a library to make writing such bindings easier. And finally we show how all this can be used to allow both Python and OCaml code to run in the same notebook and seamlessly exchange values between the two worlds. In the screenshot below an OCaml function is used to evaluate a Reverse Polish Notation expression and this function is called from Python. You can try this in your web browser either using Google Colab, or with binder.

RPN in Jupyter

Calling Python from OCaml

First, let us look at how to call Python code from OCaml. This is not our primary use case, but it nicely demonstrates the pieces involved.

The pyml library provides OCaml bindings to the Python C API. Using these bindings, the OCaml code can start the Python runtime and interact with it by building Python values or modules, calling methods, etc. Below is a simple example using Python to concatenate some strings.

let () =
  (* Initialize the Python runtime. *)
  Py.initialize ();
  (* Create a Python object for the string "-". *)
  let sep = Py.String.of_string "-" in
  let foobar =
    (* Call the .join method on the sep object with a single
       argument that is a list of two strings. *)
    Py.Object.call_method
      sep
      "join"
      [| Py.List.of_list_map Py.String.of_string [ "foo"; "bar" ] |]
    (* Convert the result back to an OCaml string. *)
    |> Py.String.to_string
  in
  Printf.printf "%s\n" foobar

The type for Python values is called pyobject. An OCaml string can be converted to such an object via Py.String.of_string. The resulting pyobject can be converted back to an OCaml string via Py.String.to_string.
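For reference, the call sequence in the OCaml snippet above corresponds to the following plain Python. This is only an illustrative equivalent, with variable names chosen to mirror the OCaml ones.

```python
# Plain-Python equivalent of the OCaml snippet above.
sep = "-"                          # Py.String.of_string "-"
foobar = sep.join(["foo", "bar"])  # Py.Object.call_method sep "join" [| ... |]
print(foobar)                      # foo-bar
```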

If the argument given to this last function happens not to be a Python string, an exception is raised at runtime. E.g. the following code compiles correctly but raises an exception when run:

let () =
  ignore (Py.String.to_string (Py.Int.of_int 42) : string)

The exception is raised in the Python runtime, caught by pyml, and converted to an OCaml exception. The resulting message is useful here, as it provides some details about what went wrong:

Failure "Type mismatch: String or Unicode expected. Got: Long (42)".

It is even possible to run some Python code using Py.Run.eval. This evaluates a string containing a Python expression and returns the result as a pyobject. For example the following bit of OCaml code properly returns the integer 285.

Py.Run.eval "sum([n*n for n in range(10)])" |> Py.Int.to_int
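The string handed to Py.Run.eval is an ordinary Python expression, so it can be checked directly in a Python session first. A minimal sketch using Python's built-in eval as a stand-in for the embedded interpreter:

```python
# The same expression string that the OCaml code passes to Py.Run.eval.
expression = "sum([n*n for n in range(10)])"

# Python's built-in eval stands in for the interpreter embedded by pyml.
result = eval(expression)
print(result)  # 285
```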

Various other examples can be found in the readme of the pyml GitHub repo.

A PPX extension: ppx_python

The next problem is converting values between Python and OCaml. Converting simple values such as ints or strings is easy. Handling more complex types by hand, however, would be very cumbersome, so we wrote a PPX syntax extension to help automate this: ppx_python.

Annotating an OCaml type t with [@@deriving python] results in two functions being automatically generated:

  • python_of_t: t -> pyobject converts an OCaml value of type t into a Python object value.
  • t_of_python: pyobject -> t converts a Python object value into a value of type t.

The conversion is straightforward for basic types such as int, float, bool, or string. unit is converted to None. OCaml tuples are converted into Python tuples. OCaml lists and arrays are converted to Python lists.

For OCaml options, None is used on the Python side to represent the None variant. Otherwise the value is directly available. Note that this makes the two OCaml values [Some None] and [None] indistinguishable on the Python side as both are represented using None.
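The collapse of Some None and None can be illustrated with a toy model written in Python. The tagged-tuple encoding of OCaml options and the helper names below are assumptions made for this sketch only; they are not how ppx_python represents values internally.

```python
# Toy model of OCaml options: ("Some", payload) or ("None",).
# Hypothetical encoding for illustration; not ppx_python's implementation.
SOME, NONE = "Some", "None"

def python_of_option(value, python_of_payload):
    """Map a modelled option to Python: None variant -> None, Some x -> converted x."""
    if value[0] == NONE:
        return None
    return python_of_payload(value[1])

def python_of_int_option(value):
    # Ints are passed through unchanged.
    return python_of_option(value, lambda x: x)

def python_of_int_option_option(value):
    # The payload of the outer option is itself an option.
    return python_of_option(value, python_of_int_option)

outer_none = (NONE,)           # models  None          : int option option
some_none = (SOME, (NONE,))    # models  Some None     : int option option
some_three = (SOME, (SOME, 3)) # models  Some (Some 3) : int option option

print(python_of_int_option_option(outer_none))  # None
print(python_of_int_option_option(some_none))   # None
print(python_of_int_option_option(some_three))  # 3
```

Both of the first two values end up as Python's None, so the nesting cannot be recovered when converting back.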

Records are represented using Python dictionaries whose keys are strings. The [@python.default] attribute can be used on some of the fields to make them optional on the Python side: when not present the default value gets used.

For example, one can write the following OCaml code:

type t =
  { foo : int [@python.default 42]
  ; bar : float
  } [@@deriving python]

The Python dictionary { 'bar': 3.14 } would then be converted to the OCaml record { foo = 42; bar = 3.14 } and vice versa.
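Seen from the Python side, the default-handling rule amounts to the following. This is a rough sketch: the helper t_of_python_sketch, the FIELDS table, and the use of KeyError are hypothetical stand-ins, and a plain dict stands in for the OCaml record.

```python
# Sentinel marking a field without a default (hypothetical helper).
_REQUIRED = object()

# Field name -> default value, mirroring the OCaml record type t above.
FIELDS = {"foo": 42, "bar": _REQUIRED}

def t_of_python_sketch(d):
    """Fill in defaults for missing optional fields; fail on missing required ones."""
    record = {}
    for name, default in FIELDS.items():
        if name in d:
            record[name] = d[name]
        elif default is not _REQUIRED:
            record[name] = default
        else:
            # The real conversion raises an OCaml exception; KeyError is an assumption.
            raise KeyError(f"missing field: {name}")
    return record

print(t_of_python_sketch({"bar": 3.14}))            # {'foo': 42, 'bar': 3.14}
print(t_of_python_sketch({"foo": 1, "bar": 2.71}))  # {'foo': 1, 'bar': 2.71}
```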

Calling OCaml from Python

With these conversion functions in place, we wrote a small library, pythonlib, on top of pyml. The goal is to make writing Python bindings to OCaml services (using OCaml!) as simple as possible.

The library has been heavily inspired by Core’s command-line processing module; see this section of the Real World OCaml book for more details on Command. The parameters are specified using an Applicative, and we can use the let-syntax extension let%map_open from ppx_let to simplify the syntax.

In this example, the OCaml code defines a Python function that takes a single positional argument n of integer type. The OCaml code then performs some computations based on n and returns the resulting float value. We attach the function to a newly defined Python module named ocaml.

open Base

let approx_pi =
  let%map_open.Python_lib n = positional "n" int ~docstring:"the value n"
  in
  let sum =
    List.init n ~f:(fun i -> let i = Float.of_int (1 + i) in 1.0 /. (i *. i))
    |> List.reduce_exn ~f:(+.)
  in
  Float.sqrt (sum *. 6.) |> python_of_float

let () =
  if not (Py.is_initialized ())
  then Py.initialize ();
  let mod_ = Py_module.create "ocaml" in
  Py_module.set mod_ "approx_pi" approx_pi
    ~docstring:"computes a slowly convergent approximation of pi"

This code is compiled to a shared library ocaml.so, together with a small C library defining the PyInit_ocaml function that starts the OCaml runtime and exposes this module.

When using Python, it is then possible to import the ocaml module and use the approx_pi function as long as the ocaml.so file can be found in the Python path.

import ocaml
print(ocaml.approx_pi(1000))
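To sanity-check the value returned by the binding, the same computation can be written in pure Python. The function name approx_pi_py is chosen here for illustration; the sketch relies on the fact that the sum of 1/i² converges to π²/6.

```python
import math

def approx_pi_py(n):
    """Approximate pi from the first n terms of the series sum(1 / i^2) = pi^2 / 6."""
    total = sum(1.0 / (i * i) for i in range(1, n + 1))
    return math.sqrt(total * 6.0)

# The series converges slowly: the error shrinks roughly like 1/n.
for n in (10, 100, 1000):
    print(n, approx_pi_py(n), abs(approx_pi_py(n) - math.pi))
```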

There are several advantages in using pythonlib:

  • The types of the arguments are automatically checked, and the arguments are converted between the appropriate Python and OCaml types.
  • Calling the Python function with an incorrect number of arguments, or with improper argument names for keyword arguments, results in easy-to-understand runtime errors.
  • Documentation gets automatically generated and attached to the Python function, including the name of the parameters, their types, and some user specified contents. This documentation is available when using completion in Jupyter.
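The effect of these three points can be mimicked in a few lines of pure Python. The helpers positional and define below are hypothetical and only loosely modelled on the OCaml API shown earlier; the error messages and docstring layout are assumptions and do not reproduce what pythonlib actually emits.

```python
def positional(name, type_, docstring):
    """Describe one positional parameter (hypothetical helper)."""
    return (name, type_, docstring)

def define(params, body, docstring):
    """Wrap body so that argument count and types are checked before the call."""
    def wrapper(*args):
        if len(args) != len(params):
            raise TypeError(
                f"expected {len(params)} positional arguments, got {len(args)}")
        for (name, type_, _), arg in zip(params, args):
            # bool is a subclass of int in Python, so reject it explicitly.
            wrong_bool = type_ is int and isinstance(arg, bool)
            if wrong_bool or not isinstance(arg, type_):
                raise TypeError(
                    f"argument {name}: expected {type_.__name__}, "
                    f"got {type(arg).__name__}")
        return body(*args)
    # Build the documentation from the parameter descriptions.
    lines = [docstring, ""] + [f"    {n}: {t.__name__}, {d}" for n, t, d in params]
    wrapper.__doc__ = "\n".join(lines)
    return wrapper

double = define([positional("n", int, "the value n")], lambda n: 2 * n, "doubles n")

print(double(21))      # 42
print(double.__doc__)  # includes the parameter name, type, and description
```

Calling double("x") or double() in this sketch raises a TypeError that names the offending argument or the expected count.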

pythonlib handles basic types such as int, float, string, list, etc, and is easy to extend to more complex types, e.g. by using ppx_python. Further examples can be found in the examples directory of the GitHub repo.

Running Python and OCaml in the same notebook

Finally, to take it one step further, it is even possible to mix and match OCaml and Python freely in the same notebook.

Jupyter supports a surprisingly large number of programming languages via different kernels. A couple of years back, a blog post “I Python, You R, We Julia” showed how to allow for cross-language integration in Jupyter rather than just relying on a single language per notebook.

In order to allow for evaluating OCaml expressions in a Python environment we wrote some bindings for the OCaml toploop module which is used by the OCaml Read-Eval-Print loop. We expose two main functions in these bindings:

  • eval takes as input a string, parses it to a list of phrases and evaluates these phrases using Toploop.execute_phrase. OCaml exceptions are caught and converted to a Python SyntaxError.
  • get takes as input one or two strings. When given two strings the second one represents some OCaml code, and the first one represents its expected type. This again parses and evaluates the string containing the OCaml code and converts the generated value using the provided type representation. If only one string is provided, the type representation is inferred by the OCaml compiler.

In the following example code, the call to toploop.eval results in some output being printed. The call to toploop.get returns a Python function that takes as input a list of pairs and returns some formatted version of this list.

from ocaml import toploop

toploop.eval('Printf.printf "hello from ocaml\n%!";;')
ocaml_fn = toploop.get(
    '(int * string) list -> string list',
    'List.map (fun (d, s) -> Printf.sprintf "%d: %s" (d+1) (String.uppercase_ascii s))')

print(ocaml_fn([(3141592, 'first-line'), (2718281, 'second-line')]))
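To make the expected result concrete, here is a pure-Python function that performs the same transformation as the OCaml code passed to toploop.get. The name ocaml_fn_equivalent is made up for this sketch, and the printed list is the output of this Python version rather than a captured result from the OCaml binding.

```python
def ocaml_fn_equivalent(pairs):
    """Format each (int, str) pair as '<int + 1>: <STR IN UPPER CASE>'."""
    return [f"{d + 1}: {s.upper()}" for d, s in pairs]

print(ocaml_fn_equivalent([(3141592, 'first-line'), (2718281, 'second-line')]))
# ['3141593: FIRST-LINE', '2718282: SECOND-LINE']
```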

In order to make calling the OCaml bits easier when using Jupyter, we added some Jupyter custom magics so that one can write OCaml cells, e.g.:

%%ocaml
let rec fact x = if x = 0 then 1 else x * fact (x-1);;

Printf.printf "fact(10) = %d" (fact 10);;

And also easily inline OCaml code in Python cells, e.g.

# Returns a Python function wrapping the recursive factorial implementation defined above.
f = %ocaml fact
f(10) # Apply f to 10 in Python.

The docstring associated with an OCaml function contains the OCaml type of that function, and it appears when using Jupyter’s completion. Closures can also be passed to and returned by functions. For example, the OCaml map function can be made available to Python via the following snippet, where a Python closure or function can be passed as the f argument.

map_fn = %ocaml fun (x, f) -> List.map f x
print(map_fn([1, 2, 3], lambda x: x*x))

We’ve created a small pip package for the ocaml Python module. You can install this using pip install ocaml. Once this is done, the ocaml module can be imported from Python. You can even run this on Google Colab by using this notebook or in binder. Among several examples, the notebook includes a function computing the number of ways to place N queens on a checkerboard. Note that this package is not very polished yet, but it should give some idea of what can be done through this Python-OCaml integration.

N-queens in Jupyter

We also compared the OCaml and Python implementations on this N-queens problem. This is far from a realistic benchmark, but the OCaml version still ends up being a lot faster than the Python one. Note that with the toploop module, the OCaml code is evaluated by compiling to bytecode, which is not optimal; switching to the opttoploop module, which generates native code, should make it even faster.

N-queens benchmark in Jupyter
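For readers who want to reproduce the Python side of such a comparison, here is one straightforward backtracking counter. It is a sketch written for this post's context and is not the implementation used in the notebook; the name count_queens is an assumption.

```python
def count_queens(n):
    """Count the placements of n mutually non-attacking queens on an n x n board."""
    def place(row, cols, diag1, diag2):
        # All rows filled: one complete valid placement found.
        if row == n:
            return 1
        total = 0
        for col in range(n):
            d1, d2 = row + col, row - col
            # Skip squares attacked along a column or either diagonal.
            if col in cols or d1 in diag1 or d2 in diag2:
                continue
            total += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return total

    return place(0, frozenset(), frozenset(), frozenset())

print([count_queens(n) for n in range(1, 9)])
# [1, 0, 0, 2, 10, 4, 40, 92]
```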

Next Steps

We have been successfully using both the Python-OCaml and OCaml-Python integration internally at Jane Street for a couple of months now. Making it easy for Python users to access OCaml services has been a big win for us.

Most of our bindings wrap OCaml functions that rely on Async for various I/O operations. We currently block on such calls. However, we plan on interfacing this with the newly introduced Python async/await syntax so that OCaml and Python tasks can be run concurrently.

A possible future use case would be to provide Python bindings for some of our core OCaml libraries, for example to parse and handle s-expressions with sexplib. Another interesting library could be Core Time_ns which is used to represent timestamps (as the timestamp representation in Python is hard to wrap your head around).

Finally, when mixing Python and OCaml in the same notebook, custom OCaml types are not handled by the %ocaml magic. It is possible to work around this using a toploop.add_named_type function, but this is currently a bit brittle, so we certainly plan on improving it.