We recently released a version of our open source libraries with a much-anticipated change – Async_kernel, the heart of the Async concurrent programming library, now depends only on Core_kernel rather than on Core.
This sounds like a dull, technical change, and it kind of is. But it’s also part of a larger project to make our libraries more lightweight and portable, and so suitable for a wider array of users and applications.
We’ve actually been working on these issues for a while now, and this seems like a good time to review some of the changes we’ve made over the years, and what’s still to come.
Reorganizing for portability
Core has always had dependencies on Unix, including OCaml’s Unix library, as well as some other parts of the Unix environment, like the Unix timezone files. This has long been a problem for porting to Windows, but more recently, the issue has loomed for two other increasingly important platforms for OCaml: JavaScript and Mirage.
To help fix this problem, in 2013 we released a library called Core_kernel, which is the portable subset of Core that avoids Unixisms as well as things like threads that don’t match well with the JavaScript and Mirage back-ends.
In the same vein, we refactored Async, our concurrent programming library, into a set of layers (modeled on the design of the similar Lwt library) that both clarified the design and separated out the platform-specific bits. Async_kernel is the lowest level and most portable piece, hosting the basic data structures and abstractions. Async_unix adds a Unix-specific scheduler, and Async_extra builds further OS-specific functionality on top.
Until recently, the fly in this ointment was that Async_kernel still depended on Core, rather than Core_kernel, because only Core had a time library. Making Async_kernel only require Core_kernel was a bigger project than you might imagine. In the end it led us to change Timing_wheel, a core data structure for Async and several other critical libraries at Jane Street, to use an integer representation of time instead of the float-based one from Core.
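To make the idea of an integer time representation concrete, here is a minimal sketch. It is not the actual Timing_wheel or Core_kernel code: the module name `Time_ns_sketch` and all of its functions are hypothetical, and it assumes a 64-bit platform, where OCaml's 63-bit `int` can hold nanoseconds across a range of roughly 146 years in either direction.

```ocaml
(* Hypothetical sketch: time as an integer count of nanoseconds.
   None of these names come from Core_kernel or Timing_wheel. *)
module Time_ns_sketch = struct
  type t = int (* nanoseconds since some fixed epoch *)

  let of_ns (ns : int) : t = ns

  (* Advance a time by a span, also given in nanoseconds. *)
  let add (t : t) (span_ns : int) : t = t + span_ns

  (* Index of the fixed-width interval that contains [t].
     This is plain integer division on non-negative times. *)
  let interval_num (t : t) ~(width_ns : int) : int = t / width_ns
end

let () =
  let open Time_ns_sketch in
  let ms = 1_000_000 in
  let t = of_ns (5 * ms) in
  assert (interval_num t ~width_ns:ms = 5);
  (* One nanosecond before the next boundary is still interval 5. *)
  assert (interval_num (add t (ms - 1)) ~width_ns:ms = 5);
  assert (interval_num (add t ms) ~width_ns:ms = 6)
```

A structure that groups events into fixed-width time intervals only needs this kind of arithmetic, none of which touches Unix.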
Already, some experiments are underway to take advantage of this change, including some internal efforts to get Async working under JavaScript, and external efforts to get cohttp’s Async back-end to only depend on Async_kernel.
I’m hoping that yet more of this kind of work will follow.
Module Aliases
One long-running annoyance with OCaml is the lack of an effective namespace mechanism. For a long time, the only choice was OCaml’s packed modules, which let you take a collection of modules and merge them together into one mega-module. Some kind of namespace mechanism is essential at scale, and so we used packed modules throughout our libraries.
Unfortunately, packed modules have serious downsides, both in terms of compilation time and executable sizes. We’ve been talking to people about this and looking for a solution for a long time. You can check out this epic thread on the platform list if you want to see some of the ensuing conversation.
A solution to this problem finally landed in OCaml 4.02, in the form of module aliases. I’ll skip the detailed explanation (you can look here if you want to learn more), but the end result was great: our compilation times immediately went down by more than a factor of 3, and it gave us a path towards dropping packed modules altogether, thus reducing executable sizes and making incremental compilation massively more efficient.
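For a rough picture of what this looks like in code, here is a single-file sketch. The `Mylib_*` modules are hypothetical stand-ins for separately compiled files with long, prefixed names; in a real library each would be its own compilation unit, and the namespace module would contain nothing but aliases.

```ocaml
(* Stand-ins for separately compiled units with prefixed names.
   All names here are hypothetical. *)
module Mylib_string_util = struct
  let shout s = String.uppercase_ascii s ^ "!"
end

module Mylib_int_util = struct
  let double x = 2 * x
end

(* The "namespace" module: each line is a module alias, which gives
   the existing module a short name without copying its contents. *)
module Mylib = struct
  module String_util = Mylib_string_util
  module Int_util = Mylib_int_util
end

let () =
  assert (Mylib.String_util.shout "hi" = "HI!");
  assert (Mylib.Int_util.double 21 = 42)
```

Users write `Mylib.String_util` as they would with a packed module, but the underlying units stay separate.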
The work on dropping packed modules has already landed internally, and will hopefully make it to the external release in a few months. The benefit to executable size is significant, with typical executables dropping in size by a factor of 2, but there is more to do. OCaml doesn’t have aggressive dead code elimination, and that can lead to a lot of unnecessary stuff getting linked in. We’re looking at some improvements we can make to cut down the dependency tree, but better dead code elimination at the compiler would really help.
Sharing basic types
Interoperability between Core and other OCaml libraries is generally pretty good: Core uses the same basic types (e.g., string, list, array, option) as other OCaml code, and that makes it pretty easy to mix and match libraries.
That said, there are some pain points. For example, Core uses a Result type (essentially, type ('a,'b) result = Ok of 'a | Error of 'b) quite routinely, and lots of other libraries use very similar types. Unfortunately, these libraries each have their own incompatible definitions.
The solution is to break out a simple type that the different libraries can share. After some discussion with the people behind some of the other libraries in question, I made a pull request to the compiler to add a result type to the stdlib.
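Here is a small sketch of what sharing the type buys you, assuming a compiler where `result` is predefined (OCaml 4.03 or later; `int_of_string_opt` needs 4.05). The functions `parse_int` and `safe_div` are hypothetical stand-ins for code from two unrelated libraries.

```ocaml
(* With a shared definition, this type is already in scope:
   type ('a, 'b) result = Ok of 'a | Error of 'b *)

(* Imagine this comes from one library... *)
let parse_int (s : string) : (int, string) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

(* ...and this from another. *)
let safe_div (x : int) (y : int) : (int, string) result =
  if y = 0 then Error "division by zero" else Ok (x / y)

(* Because both use the same type, one bind operator chains them. *)
let ( >>= ) r f =
  match r with
  | Ok x -> f x
  | Error e -> Error e

let () =
  assert ((parse_int "84" >>= fun n -> safe_div n 2) = Ok 42);
  assert ((parse_int "84" >>= fun n -> safe_div n 0)
          = Error "division by zero");
  assert ((parse_int "x" >>= fun n -> safe_div n 2)
          = Error "not an integer: x")
```

If each library defined its own `Ok`/`Error` type, the calls to `>>=` would need conversion functions between the two.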
This is a small thing, but small things matter. I hope that by paying attention to this kind of small issue, we can help keep interoperability between Core and the rest of the OCaml ecosystem smooth.
Eliminating camlp4
One concern I’ve heard raised about Core and Jane Street’s other libraries is their reliance on camlp4. camlp4 is a somewhat divisive piece of infrastructure: it’s long been the only decent way to do metaprogramming in OCaml, and as such has been enormously valuable; but it’s also a complex and somewhat unloved piece of infrastructure that lots of people want to avoid.
camlp4 also makes tooling a lot more complicated, since there’s no single syntax to target. Dev tools like ocp-indent and the excellent merlin have some terrible hacks to support some of the most common camlp4 syntax extensions, but the situation is clearly untenable.
You do need camlp4 to build Core, but you don’t need camlp4 to use it, and in practice, that’s good enough for most use cases. But for people who want to avoid camlp4 entirely, it’s still a nuisance. Moreover, while you don’t need camlp4 to use Core, it is convenient. For example, a lot of Core’s idioms work best when you provide s-expression serializers for your types, and the sexplib syntax extension is an awfully convenient way to generate those functions.
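To show what is being automated, here is a hand-written sketch of roughly the kind of serializer such an extension derives. It assumes nothing beyond the standard library: the `sexp` type below is a minimal stand-in for the one in sexplib, and `point`, `sexp_of_point`, and `to_string` are hypothetical.

```ocaml
(* Minimal stand-in for an s-expression type; the real one is in sexplib. *)
type sexp = Atom of string | List of sexp list

type point = { x : int; y : int }

(* Roughly what a syntax extension would generate for [point]:
   one (field value) pair per record field. *)
let sexp_of_point { x; y } =
  List
    [ List [ Atom "x"; Atom (string_of_int x) ];
      List [ Atom "y"; Atom (string_of_int y) ] ]

(* Simple printer, ignoring quoting and escaping of atoms. *)
let rec to_string = function
  | Atom s -> s
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

let () =
  assert (to_string (sexp_of_point { x = 1; y = 2 }) = "((x 1) (y 2))")
```

Writing this by hand for every type is tedious and easy to get out of sync with the type definition, which is why generating it is so attractive.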
Our plan is to simply eliminate our dependency on camlp4 entirely over the next 6 months, by switching to using ppx and extension points, a new approach to metaprogramming in OCaml that, like module aliases, landed in 4.02. We’re currently rewriting all of our syntax extensions, and building tools to automatically migrate the code that depends on camlp4. People who want to continue to use the old camlp4 extensions are welcome to continue doing so, but we’re cutting our dependency on them.
Even at the end of all this, we don’t expect that Core and Async will suit everyone – that’s a high bar to clear for any software package. But we do hope that through these efforts, an ever wider set of developers will be able to take advantage of the work we’ve done.