(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop and take a look at what to expect for this release. This is part of a series of posts where I’ll describe the features that strike me as notable. This is part 2.)
OCaml has a bit of a namespace problem.
In particular, OCaml has no good way of organizing modules into packages. One sign of the problem is that you can’t build an executable that has two modules with the same module name. This is a pretty awkward restriction, and it gets unworkable pretty fast as your codebase gets bigger.
Other than just prefixing all of your module names with a package name (e.g., Core_kernel_list, Core_kernel_int, Core_kernel_array, etc.; it gets old fast), the only solution right now is something called packed modules. OCaml can pack a collection of individual modules into a single synthetic “packed” module. Importantly, different packs included in the same executable are allowed to contain modules of the same name.
In practice, a packed module is a lot like what you’d get if you named all of your modules distinctly, and then used a single module to pack together all your other modules, giving them shorter and more usable names in the process. Thus, for Core_kernel, we could name all our modules uniquely, and then provide a single renaming module to allow people to use those modules conveniently, like this:
module List = Core_kernel_list
module Array = Core_kernel_array
module Int = Core_kernel_int
...
And then user code could use these short names by opening the module:
open Core_kernel
let drop_zeros l = List.filter l ~f:(fun x -> x <> 0)
In the above, List refers to Core_kernel’s list, not the List module that ships with the compiler. The longer names would only show up within the Core_kernel package.
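The renaming pattern described above can be tried out in a single file. The following is a minimal sketch that depends only on the standard library; the names My_pkg, My_pkg_list and My_pkg_int are hypothetical stand-ins for a real package, and nested modules stand in for what would normally be separate files.

```ocaml
(* Hypothetical stand-ins for a package's uniquely named modules. In a real
   package each of these would live in its own file (e.g. my_pkg_list.ml). *)
module My_pkg_list = struct
  (* A labelled filter, loosely in the style of the snippet above.
     Here, List still refers to the standard library's List. *)
  let filter l ~f = List.filter f l
end

module My_pkg_int = struct
  let is_zero x = x = 0
end

(* The renaming module: short names that point at the long ones. *)
module My_pkg = struct
  module List = My_pkg_list
  module Int = My_pkg_int
end

(* User code: after the open, List means My_pkg_list,
   not the List module that ships with the compiler. *)
open My_pkg

let drop_zeros l = List.filter l ~f:(fun x -> not (Int.is_zero x))

let () = assert (drop_zeros [0; 1; 0; 2] = [1; 2])
```

Note that once My_pkg is opened, the standard library’s List is shadowed, so anything not re-exported by the package’s own list module is no longer reachable under that short name.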
Packed modules basically automate this process for you, with the one improvement that you get to use the short names within the package you’re building as well as outside of it.
We use packed modules extensively at Jane Street, and they’ve been a real help in organizing our large and complex codebase. But packs turn out to be highly problematic. In particular, they lead to three distinct problems:
- slow compilation of individual files
- large executable sizes
- coarse dependency tracking, leading to slow incremental rebuilds
The slow compilation of individual files comes from the cost of interacting with a large module like Core_kernel. Core_kernel is large because it effectively contains a full copy of every module in the Core_kernel package. That’s because a line like this:
module List = Core_kernel_list
doesn’t simply make Core_kernel.List an alias to Core_kernel_list; it makes a full copy of the module. Indeed, the above line is equivalent to the following:
module List = struct include Core_kernel_list end
Packed modules also increase your executable size, since OCaml includes code at the granularity of compilation units. Because a packed module is a single compilation unit, referring to even a single module of Core_kernel requires you to link all of Core_kernel into your executable.
The coarse dependency problem has to do with the fact that a packed module depends on all the modules that are included in it, and so once you depend on anything in the pack, you depend on everything there. For us, that means that changing a single line of the most obscure module in Core_kernel will cause us to have to rebuild essentially our entire tree.
Module aliases, along with a few related improvements to the compiler, let us work around all of these problems. In particular, in 4.02, the following statement
module List = Core_kernel_list
is in fact an alias rather than a copy. This means that opening Core_kernel only introduces a bunch of aliases, which does not require a lot of work from the compiler.
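One visible consequence of aliases being tracked as aliases is that, as far as I understand the 4.02 changes, a signature can now state directly that one module is another, rather than spelling out a copy of its contents. The following is a small single-file sketch of that idea; Impl, S and Pkg are hypothetical names chosen for illustration.

```ocaml
(* A hypothetical implementation module. *)
module Impl = struct
  type t = int
  let answer : t = 42
end

(* The signature records that Short *is* Impl, instead of repeating
   Impl's signature. This alias form in signatures needs OCaml 4.02+. *)
module type S = sig
  module Short = Impl
end

module Pkg : S = struct
  module Short = Impl
end

(* Because Pkg.Short is known to be Impl, their types are interchangeable. *)
let same : Impl.t = Pkg.Short.answer

let () = assert (same = 42)
```

The interface of a module like Pkg stays small no matter how large Impl grows, which is the property the package-as-aliases layout relies on.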
Executable size will be improved because we’ll be able to move to having a package be structured as a module containing a set of aliases, rather than as a pack. That means we no longer have a single large compilation unit for the entire package, and so, using some improved dependency handling in the compiler, we can link in only the modules that we actually use.
Finally, the dependency-choke-point problem will be fixed by having a tighter understanding of dependencies. The fact that I depend on Core_kernel, which contains a collection of aliases to many other modules like Core_kernel_list or Core_kernel_array, doesn’t mean I truly depend on all those modules. In particular, if I don’t use (and so don’t link in) Core_kernel_array, then I don’t need to recompile when Core_kernel_array changes.
Module aliases have other uses, in particular having to do with changes to the semantics of functors. But for us, the changes to compilation speed and executable size are the big story.