OCaml is an amazing programming language for writing industrial-strength libraries and systems. At Jane Street we use it for literally all of our production systems, including for FPGA design, web development, and even machine learning.
However, for certain tasks we have found a different workflow to be highly effective: using Python, with its lightweight syntax and huge ecosystem of libraries (numerical analysis, plotting, machine learning, etc.), inside a Jupyter notebook. This workflow is very convenient for iterating quickly, especially if the code is only meant to be run once. This often happens in quantitative research, where one wants, for example, to quickly load a time series from a CSV file, plot it, and compute some variance or correlation metrics using NumPy.
However, it is crucial for us to be able to reuse our existing OCaml services and systems in this workflow, so we created a way to expose the services of our OCaml systems to Python users. Importantly, we want the OCaml developers of those systems to be able to create the Python bindings without requiring a deep understanding of Python itself. Our solution gives Python users transparent access to these systems by building on pyml, which provides OCaml bindings to the Python C API.
In this blog post, we discuss how OCaml libraries can be called from Python as well as the other way around. We leverage pyml to write bindings that wrap functions from one language so that they can be used in the other. We introduce a ppx extension and a library to make writing such bindings easier. And finally we show how all this can be used to allow both Python and OCaml code to run in the same notebook and seamlessly exchange values between the two worlds. In the screenshot below an OCaml function is used to evaluate a Reverse Polish Notation expression and this function is called from Python. You can try this in your web browser either using Google Colab, or with binder.
Calling Python from OCaml
First, let us look at how to call Python code from OCaml. This is not our primary use case but it nicely demonstrates the pieces involved.
The pyml library provides OCaml bindings to the Python C API. Using these bindings, the OCaml code can start the Python runtime and interact with it by building Python values or modules, calling methods, etc. Below is a simple example using Python to concatenate some strings.
let () =
  (* Initialize the Python runtime. *)
  Py.initialize ();
  (* Create a Python object for the string "-". *)
  let sep = Py.String.of_string "-" in
  let foobar =
    (* Call the .join method on the sep object with a single
       argument that is a list of two strings. *)
    Py.Object.call_method
      sep
      "join"
      [| Py.List.of_list_map Py.String.of_string [ "foo"; "bar" ] |]
    (* Convert the result back to an OCaml string. *)
    |> Py.String.to_string
  in
  Printf.printf "%s\n" foobar
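For comparison, here is roughly what the OCaml program above asks the Python runtime to do, written directly in Python. This is only a sketch for orientation: looking the method up by name with getattr mirrors the shape of Py.Object.call_method, but it is not a description of how pyml is implemented.

```python
# Python-side view of the OCaml example: look up a method by name on an
# object and call it with a single list argument.
sep = "-"
join = getattr(sep, "join")      # corresponds to the "join" method name
foobar = join(["foo", "bar"])    # one argument: a list of two strings
print(foobar)                    # prints foo-bar
```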
The type for Python values is called pyobject. An OCaml string can be converted to such an object via Py.String.of_string. The resulting pyobject can be converted back to an OCaml string via Py.String.to_string.
If the argument given to this last function happens not to be a Python string, an exception is raised at runtime. E.g. the following code compiles correctly but raises an exception when run:
let () =
  ignore (Py.String.to_string (Py.Int.of_int 42) : string)
The exception is raised in the Python runtime, caught by pyml, and converted to an OCaml exception. It is pretty useful in this context as it provides some details about what went wrong.
Failure "Type mismatch: String or Unicode expected. Got: Long (42)".
It is even possible to run some Python code using Py.Run.eval. This evaluates a string containing a Python expression and returns the result as a pyobject.
For example the following bit of OCaml code properly returns the integer 285.
Py.Run.eval "sum([n*n for n in range(10)])" |> Py.Int.to_int
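The expression passed to Py.Run.eval is ordinary Python, so its value can be checked in a plain Python session. The snippet below uses Python's built-in eval as a stand-in for the embedded interpreter; it is a sanity check on the expected value, not a use of pyml.

```python
# Evaluate the same expression string that the OCaml code hands to Python.
expr = "sum([n*n for n in range(10)])"
result = eval(expr)  # sum of the squares 0^2 + 1^2 + ... + 9^2
print(result)        # prints 285
```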
Various other examples can be found in the readme of the pyml GitHub repo.
A PPX extension: ppx_python
The next problem is converting values between Python and OCaml. Converting simple values such as ints or strings is easy. Handling more complex types by hand, however, would be very cumbersome, so we wrote a PPX syntax extension to help automate this: ppx_python.
Annotating an OCaml type t with [@@deriving python] results in two functions being automatically generated:
- python_of_t: t -> pyobject converts an OCaml value of type t into a Python object value.
- t_of_python: pyobject -> t converts a Python object value into a value of type t.
The conversion is straightforward for basic types such as int, float, bool, or string. unit is converted to None.
OCaml tuples are converted into Python tuples. OCaml lists and arrays are converted to Python lists.
For OCaml options, None is used on the Python side to represent the None variant. Otherwise, Some v is represented directly by the Python value for v. Note that this makes the two OCaml values [Some None] and [None] indistinguishable on the Python side, as both are represented using None.
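The loss of information for nested options can be illustrated with a small pure-Python model of the conversion rule. The Some class and the python_of_option helper are hypothetical names invented for this sketch; they stand in for OCaml values and are not part of ppx_python.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Some:
    """Hypothetical stand-in for the OCaml constructor `Some v`."""
    value: Any


def python_of_option(opt):
    """Model of the rule: OCaml None -> Python None, Some v -> converted v.

    OCaml's None is modeled by Python's None. Nested options are converted
    recursively, as they would be for a type such as `int option option`.
    """
    if opt is None:
        return None
    inner = opt.value
    if inner is None or isinstance(inner, Some):
        return python_of_option(inner)
    return inner


print(python_of_option(Some(3)))     # prints 3
print(python_of_option(None))        # prints None
print(python_of_option(Some(None)))  # prints None, same as the line above
```

Because the last two calls produce the same Python value, a conversion back to OCaml has no way to tell which of the two options it started from.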
Records are represented using Python dictionaries whose keys are strings. The [@python.default] attribute can be used on some of the fields to make them optional on the Python side: when not present the default value gets used.
For example, one can write the following OCaml code:
type t =
  { foo : int [@python.default 42]
  ; bar : float
  } [@@deriving python]
The Python dictionary { 'bar': 3.14 } would then be converted to the OCaml record { foo = 42; bar = 3.14 } and vice versa.
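The effect of [@python.default] can be pictured with a pure-Python model of what a conversion for this record has to do with an incoming dictionary. The function t_of_python_model and its defaults table are hypothetical and written only for illustration; the code actually generated by ppx_python is OCaml and may differ in its details, for example in the exact errors it raises.

```python
def t_of_python_model(d):
    """Model the dict -> record conversion for { foo : int; bar : float }."""
    defaults = {"foo": 42}          # fields carrying [@python.default ...]
    required = ("bar",)             # fields without a default
    for name in required:
        if name not in d:
            # Assumption: a missing required field is reported as an error.
            raise KeyError(f"missing field {name!r}")
    record = dict(defaults)         # start from the defaults...
    record.update(d)                # ...and let provided fields override them
    return record


print(t_of_python_model({"bar": 3.14}))            # foo falls back to 42
print(t_of_python_model({"foo": 1, "bar": 2.5}))   # foo is taken from the dict
```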
Calling OCaml from Python
With these conversion functions in place, we wrote a small library, pythonlib, on top of pyml. The goal is to make writing Python bindings to OCaml services (using OCaml!) as simple as possible.
The library has been heavily inspired by Core’s command-line processing module; see this section of the Real World OCaml book for more details on Command. The parameters are specified using an Applicative, and we can use the let-syntax extension let%map_open from ppx_let to simplify the syntax.
In this example, the OCaml code defines a Python function that takes a single positional argument n of integer type. The OCaml code then performs some computations based on n and returns the resulting float value. We attach the function to a newly defined Python module named ocaml.
open Base

let approx_pi =
  let%map_open.Python_lib n = positional "n" int ~docstring:"the value n"
  in
  let sum =
    List.init n ~f:(fun i -> let i = Float.of_int (1 + i) in 1.0 /. (i *. i))
    |> List.reduce_exn ~f:(+.)
  in
  Float.sqrt (sum *. 6.) |> python_of_float

let () =
  if not (Py.is_initialized ())
  then Py.initialize ();
  let mod_ = Py_module.create "ocaml" in
  Py_module.set mod_ "approx_pi" approx_pi
    ~docstring:"computes a slowly convergent approximation of pi"
This code is compiled to a shared library ocaml.so, together with a small C library defining the PyInit_ocaml function that starts the OCaml runtime and exposes this module.
When using Python, it is then possible to import the ocaml module and use the approx_pi function as long as the ocaml.so file can be found in the Python path.
import ocaml
print(ocaml.approx_pi(1000))
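As a cross-check, the same computation can be written in pure Python: the sum of 1/i² for i from 1 to n converges to π²/6 (the Basel series), so the square root of six times the partial sum approaches π from below. The function approx_pi_py is a name made up for this sketch; it mirrors the arithmetic of the OCaml code and does not call into it.

```python
import math


def approx_pi_py(n):
    """Pure-Python counterpart of the OCaml approx_pi above."""
    total = sum(1.0 / (i * i) for i in range(1, n + 1))
    return math.sqrt(total * 6.0)


approx = approx_pi_py(1000)
# The gap to pi shrinks roughly like 3 / (pi * n), hence "slowly convergent".
print(approx, math.pi - approx)
```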
There are several advantages in using pythonlib:
- The types of the arguments are automatically checked, and the arguments get converted between the appropriate Python and OCaml types.
- Calling the Python function with an incorrect number of arguments, or with improper argument names for keyword arguments, results in easy-to-understand runtime errors.
- Documentation gets automatically generated and attached to the Python function, including the name of the parameters, their types, and some user specified contents. This documentation is available when using completion in Jupyter.
pythonlib handles basic types such as int, float, string, list, etc., and is easy to extend to more complex types, e.g. by using ppx_python. Further examples can be found in the examples directory of the GitHub repo.
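To make the advantages listed above concrete, the following pure-Python sketch models the kind of checking and documentation a binding layer provides. It is not pythonlib's implementation or API: the bind helper, its parameter list format, and the error messages are all invented for illustration.

```python
def bind(name, params, fn, docstring=""):
    """Wrap fn so that argument count and types are checked before the call.

    params is a list of (param_name, python_type, param_doc) triples.
    """
    def wrapper(*args):
        if len(args) != len(params):
            raise TypeError(
                f"{name} expected {len(params)} positional argument(s), "
                f"got {len(args)}"
            )
        for value, (pname, ptype, _) in zip(args, params):
            if not isinstance(value, ptype):
                raise TypeError(
                    f"{name}: argument {pname!r} expected "
                    f"{ptype.__name__}, got {type(value).__name__}"
                )
        return fn(*args)

    # Build the documentation from the parameter specification.
    lines = [docstring, ""] + [
        f"  {pname}: {ptype.__name__}  {pdoc}" for pname, ptype, pdoc in params
    ]
    wrapper.__name__ = name
    wrapper.__doc__ = "\n".join(lines)
    return wrapper


double = bind("double", [("n", int, "the value n")], lambda n: 2 * n,
              docstring="doubles its argument")
print(double(21))      # prints 42
print(double.__doc__)  # docstring assembled from the parameter list
```

Calling double("x") or double(1, 2) in this model raises a TypeError naming the offending argument or the expected argument count.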
Running Python and OCaml in the same notebook
Finally, to take it one step further, it is even possible to mix and match OCaml and Python freely in the same notebook.
Jupyter supports a surprisingly large number of programming languages via different kernels. A couple of years back, a blog post “I Python, You R, We Julia” showed how to allow for cross-language integration in Jupyter rather than just relying on a single language per notebook.
In order to allow for evaluating OCaml expressions in a Python environment, we wrote some bindings for the OCaml toploop module, which is used by the OCaml Read-Eval-Print loop. We expose two main functions in these bindings:
- eval takes as input a string, parses it to a list of phrases and evaluates these phrases using Toploop.execute_phrase. OCaml exceptions are caught and converted to a Python SyntaxError.
- get takes as input one or two strings. When given two strings the second one represents some OCaml code, and the first one represents its expected type. This again parses and evaluates the string containing the OCaml code and converts the generated value using the provided type representation. If only one string is provided, the type representation is inferred by the OCaml compiler.
In the following example code, the call to toploop.eval results in some output being printed. The call to toploop.get returns a Python function that takes as input a list of pairs and returns some formatted version of this list.
from ocaml import toploop

toploop.eval('Printf.printf "hello from ocaml\n%!";;')
ocaml_fn = toploop.get(
    '(int * string) list -> string list',
    'List.map (fun (d, s) -> Printf.sprintf "%d: %s" (d+1) (String.uppercase_ascii s))'
)
print(ocaml_fn([(3141592, 'first-line'), (2718281, 'second-line')]))
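For reference, the function obtained from toploop.get is expected to behave like the following pure-Python function on ASCII input, which can be handy for checking the result without the ocaml module installed. The name format_pairs is made up for this sketch.

```python
def format_pairs(pairs):
    """Python counterpart of the OCaml function passed to toploop.get."""
    # String.uppercase_ascii only touches ASCII letters; str.upper agrees
    # with it on ASCII input such as the strings used here.
    return [f"{d + 1}: {s.upper()}" for d, s in pairs]


print(format_pairs([(3141592, "first-line"), (2718281, "second-line")]))
# prints ['3141593: FIRST-LINE', '2718282: SECOND-LINE']
```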
In order to make calling the OCaml bits easier when using Jupyter, we added some Jupyter custom magics so that one can write OCaml cells, e.g.:
%%ocaml
let rec fact x = if x = 0 then 1 else x * fact (x-1);;
Printf.printf "fact(10) = %d" (fact 10);;
One can also easily inline OCaml code in Python cells, e.g.:
# Returns a Python function wrapping the recursive factorial implementation defined above.
f = %ocaml fact
f(10) # Apply f to 10 in Python.
The docstring associated with each OCaml function contains its OCaml type, which appears when using Jupyter’s completion. Closures can also be passed to and returned by functions. For example, the OCaml map function can be made available to Python via the following snippet; a Python closure or function can be passed as the f argument.
map_fn = %ocaml fun (x, f) -> List.map f x
print(map_fn([1, 2, 3], lambda x: x*x))
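In pure Python, the two wrapped OCaml functions above correspond to the following definitions, shown only to make the expected results explicit. fact_py and map_fn_py are hypothetical names; they do not go through the OCaml toploop, and Python integers do not overflow the way OCaml's fixed-width ints can for large inputs.

```python
def fact_py(x):
    """Python counterpart of the OCaml fact function."""
    return 1 if x == 0 else x * fact_py(x - 1)


def map_fn_py(x, f):
    """Python counterpart of fun (x, f) -> List.map f x."""
    return [f(v) for v in x]


print(fact_py(10))                             # prints 3628800
print(map_fn_py([1, 2, 3], lambda v: v * v))   # prints [1, 4, 9]
```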
We’ve created a small pip package for the ocaml Python module. You can install it using pip install ocaml. Once this is done, the ocaml module can be imported from Python. You can even run this on Google Colab by using this notebook or in binder. Among several examples, the notebook includes a function computing the number of ways to place N queens on a checkerboard. Note that this package is not currently very well polished, but it should give some idea of what can be done through this Python-OCaml integration.
We also compared OCaml and Python implementations of this N-queens problem. This is far from a realistic benchmark, but the OCaml version still ends up being a lot faster than the Python one. Note that with the toploop module, the OCaml code is evaluated by compiling to bytecode, which is not optimal; switching to the opttoploop module, which generates native code, should make it even faster.
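For readers who want to reproduce this kind of comparison, here is one possible pure-Python solution counter for N-queens, using simple backtracking over rows. It is an illustrative sketch, not the code from the notebook, whose implementation and timings may differ.

```python
def count_queens(n):
    """Count the ways to place n non-attacking queens on an n x n board."""
    def place(row, cols, diag1, diag2):
        if row == n:
            return 1  # every row holds a queen: one complete placement
        count = 0
        for col in range(n):
            d1, d2 = row - col, row + col
            if col in cols or d1 in diag1 or d2 in diag2:
                continue  # square is attacked by an earlier queen
            count += place(row + 1, cols | {col}, diag1 | {d1}, diag2 | {d2})
        return count

    return place(0, frozenset(), frozenset(), frozenset())


print(count_queens(8))  # prints 92
```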
Next Steps
We have been successfully using both the Python-OCaml and OCaml-Python integration internally at Jane Street for a couple of months now. Making it easy for Python users to access OCaml services has been a big win for us.
Most of our bindings wrap OCaml functions that rely on Async for various I/O operations. We currently block on such calls. However, we plan on interfacing this with the newly introduced Python async/await syntax so that OCaml and Python tasks can be run concurrently.
A possible future use case would be to provide Python bindings for some of our core OCaml libraries, for example to parse and handle s-expressions with sexplib. Another interesting library could be Core Time_ns which is used to represent timestamps (as the timestamp representation in Python is hard to wrap your head around).
Finally, when mixing Python and OCaml in the same notebook, custom OCaml types are not handled when using the %ocaml magic. It is possible to work around this using a toploop.add_named_type function, but this is currently a bit brittle, so we certainly plan on improving this.