(OCaml 4.02 is entering a feature freeze, which makes it a good time to stop and take a look at what to expect for this release. This is the first of a few blog posts where I’ll describe the features that strike me as notable.)
OCaml’s metaprogramming story is kind of messy.
The good news is that OCaml has an effective metaprogramming system. It’s called camlp4, and before complaining about it, I want to be clear about how useful it is.
At Jane Street, syntax extensions like sexplib, pa_compare and binprot have made us more productive, allowing us to extend the language in ways that save us from having to write big piles of unmaintainable boilerplate.
But camlp4 has some serious warts, which mostly derive from where it sits in the OCaml pipeline. In particular, camlp4 is an alternate front-end to the compiler, with its own extensible parser that allows you to extend OCaml’s syntax in any way you like. The price of that flexibility is having two different parsers: camlp4’s parser has its own slightly different behavior, its own independent set of bugs, and its own excitingly obscure error messages.
camlp4’s separate parser might seem like a necessary evil. After all, syntax extensions require changing the syntax, and you obviously can’t change the syntax without a new parser.
Or can you? Lisp has a rich syntactic macro system but just one parser. Thus, in Lisp, all macros are AST-to-AST transformations. Lisp’s macro system can implement lots of different syntaxes within the world of s-expressions, since s-expressions are so general and flexible.
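To make "AST-to-AST transformation" concrete, here is a toy sketch in OCaml. It assumes a tiny made-up expression language (not OCaml’s real Parsetree, and not Lisp), where the parser only ever produces a generic `Call` node. A hypothetical macro named `double` gives meaning to one particular call shape by rewriting it into another tree of the same type; the parser never has to know the macro exists.

```ocaml
(* A toy expression AST; purely illustrative, not OCaml's Parsetree. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Call of string * expr list (* generic node the parser always produces *)

(* The "macro": rewrite calls to the hypothetical name "double" into an
   addition, recursing so that nested macro uses are expanded as well. *)
let rec expand (e : expr) : expr =
  match e with
  | Call ("double", [ arg ]) ->
      let arg = expand arg in
      Add (arg, arg)
  | Call (f, args) -> Call (f, List.map expand args)
  | Add (a, b) -> Add (expand a, expand b)
  | Int _ | Var _ -> e

(* Pretty-printer so we can see the tree before and after expansion. *)
let rec to_string = function
  | Int n -> string_of_int n
  | Var x -> x
  | Add (a, b) -> "(" ^ to_string a ^ " + " ^ to_string b ^ ")"
  | Call (f, args) ->
      f ^ "(" ^ String.concat ", " (List.map to_string args) ^ ")"

let () =
  let before = Call ("print", [ Call ("double", [ Var "x" ]) ]) in
  print_endline (to_string before);          (* print(double(x)) *)
  print_endline (to_string (expand before))  (* print((x + x)) *)
```

Note that input and output share the same `expr` type: the transformation is a plain function over trees, which is exactly what makes a single parser sufficient.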
OCaml’s syntax, on the other hand, is very specific and inflexible. Its parser accepts exactly OCaml’s syntax, and any deviation is flagged as a syntax error. Almost any interesting syntax extension will require parsing programs that are not syntactically valid OCaml.
That’s where extension points come in. Extension points are a collection of extensions to OCaml’s grammar that add a notation for annotations. With these annotations, OCaml’s syntax becomes general enough to accommodate many different syntax extensions. Indeed, Alain Frisch, the main author and advocate of this change, organized a survey of existing camlp4-based syntax extensions and made sure that extension points were rich enough to accommodate them.
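A minimal sketch of what the annotation notation looks like, assuming a stock compiler (4.02 or later) with no syntax rewriter attached. The attribute names other than `sexp` are hypothetical placeholders. The point is that all of these forms are ordinary, valid OCaml: the compiler parses them into the AST and ignores attributes it doesn’t recognize, leaving them for an AST-to-AST rewriter to interpret.

```ocaml
(* Item attribute ([@@...]): attaches to the whole type declaration. *)
type t = int * string [@@sexp]

(* Item attribute on a let-binding; "my_annotation" is a made-up name. *)
let pair : t = (42, "answer") [@@my_annotation]

(* Expression attribute ([@...]): attaches to the expression it follows. *)
let double x = (x * 2) [@my_hint]

(* Floating attribute ([@@@...]): a standalone structure item. *)
[@@@my_floating_note]

let () = Printf.printf "%d %s %d\n" (fst pair) (snd pair) (double 21)
```

With no rewriter in the pipeline, this compiles and runs as if the annotations weren’t there; a rewriter would see them as nodes in the AST it receives.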
The big advantages of this approach are that it simplifies development of the compiler (because you don’t need to maintain two independent implementations of the parser) and that development tools only have one syntax to target. One of the wins we hope to get from this is that IDE-like tools such as Merlin should be able to interact more easily with code that uses syntactic macros, like the codebase at Jane Street.
The downside, of course, is that to take advantage of this, you need to port your existing syntax extensions to work against the (now extended) OCaml AST. It also means that the concrete syntax used by existing syntax extensions will mostly need to change. No longer can we write:
type t = int * string with sexp
Instead, we’ll need to write something like:
type t = int * string [@@sexp]
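To give a feel for the boilerplate such an annotation saves, here is a hand-written sketch of roughly the kind of converter a sexp rewriter might generate for this type. Everything here is an assumption for illustration: the local `sexp` type only mirrors the shape of sexplib’s `Sexp.t` (atoms and lists), `sexp_of_t` follows sexplib’s naming convention, and real generated code differs in detail.

```ocaml
(* Local stand-in with the same shape as sexplib's Sexp.t, defined here
   so the example is self-contained (this is not sexplib itself). *)
type sexp = Atom of string | List of sexp list

(* Without a rewriter, the annotation is parsed and then ignored. *)
type t = int * string [@@sexp]

(* Hand-written approximation of what a rewriter could derive from [t]. *)
let sexp_of_t ((n, s) : t) : sexp = List [ Atom (string_of_int n); Atom s ]

(* Render an s-expression in the usual parenthesized form. *)
let rec to_string = function
  | Atom a -> a
  | List l -> "(" ^ String.concat " " (List.map to_string l) ^ ")"

let () = print_endline (to_string (sexp_of_t (42, "foo"))) (* (42 foo) *)
```

For a two-field tuple this is short, but the same mechanical pattern repeated across every record and variant in a large codebase is exactly the kind of code nobody wants to write or maintain by hand.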
That said, we expect this change to be worth the implied churn.
P.S. It was noted that I didn’t do a great job of showing the flexibility that extension points give you. If you want to learn more about this, this goes over the major use cases for syntax extensions that were considered in the design of the annotation syntax, and how they would be rendered in OCaml 4.02.