At Jane Street, we have always been heavy users of pre-processors, first with camlp4 and now ppx. Pre-processing makes the infrastructure a bit more complex, but it saves us a lot of time by taking care of a lot of tedious boilerplate code, and in some cases it makes the code a bit prettier.
All in all, our standard set has 19 rewriters:
These rewriters fall into 3 big categories:
- type driven code generators: ppx_sexp_conv, ppx_bin_prot, …
- inline tests and benchmarks: ppx_inline_test, ppx_expect, ppx_bench
- convenience: ppx_sexp_value, ppx_custom_printf, …
The first category is the one that definitely justifies the use of pre-processors, until we get something better in the language itself.
With such a high number of code transformations, there is an important question of how they compose with each other. For instance, what happens if one ppx generates code that is then rewritten by another ppx?
Since the switch from camlp4 to ppx a year ago, category 1 transformers were handled all at once as a whole-AST mapping pass by ppx_type_conv, while all the others were implemented as separate passes. With the previous list that means 13 passes: 7 of the rewriters are ppx_type_conv plugins sharing a single pass, and the remaining 12 each get their own. This means that the output depended on the order in which the various passes were applied.
Intuitively, one would think it’s not a big deal, given that it is quite rare for a ppx to produce code that would be rewritten by another ppx. Still we ran into several issues over time:
- Some ppx rewriters, such as ppx_inline_test that rewrites
[%%test ...] extensions, capture a pretty-print of their payload for debugging purposes. Depending on when ppx_inline_test is applied, the payload won’t be the same, as it might have been expanded by other ppx rewriters, which is confusing for users.
- A few ppx rewriters interpret the payload of a specific extension point as a
DSL. This is the case of ppx_sexp_value and
ppx_sexp_message. If another ppx rewrites the payload before they see it, the
result is unspecified. We had such an issue with ppx_here: inside their payload,
[%here] is interpreted by ppx_sexp_value and ppx_sexp_message and produces
"<filename>:<line>:<column>", while outside it is interpreted by ppx_here and produces a record of type
Initially we dealt with these issues by using a specific order in the default set of rewriters, but that’s more of a dirty hack than a real solution. Developers are often not aware of this and might end up using a wrong order when using a custom set of rewriters. Moreover, this worked because we have control over the order with Jenga, but in open-source packages using oasis, ocamlbuild and ocamlfind we have no control over the final ordering.
But apart from the semantic problems, there is an obvious performance problem: all the transformations are local, but still we are doing 13 passes over the entire AST. What a waste of CPU time!
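To make the ordering problem concrete, here is a minimal sketch on a toy AST. The `expr` type and the two passes are hypothetical stand-ins written for this illustration; they are not the real ppx_here or ppx_sexp_value implementations. Each pass maps over the whole tree, and the two passes do not commute.

```ocaml
(* Toy expression AST, just enough to model extension points. *)
type expr =
  | Str of string               (* string literal *)
  | Ext of string * expr list   (* [%name payload...] *)
  | Record of string            (* stand-in for a position record *)

(* Apply [f] to every node, children first: a whole-AST pass. *)
let rec map_bottom_up f e =
  match e with
  | Ext (name, args) -> f (Ext (name, List.map (map_bottom_up f) args))
  | Str _ | Record _ -> f e

(* Hypothetical "here" pass: rewrites [%here] into a position record. *)
let here_pass =
  map_bottom_up (function
    | Ext ("here", []) -> Record "position"
    | e -> e)

(* Hypothetical "sexp" pass: inside [%sexp ...], [%here] belongs to the
   DSL and becomes a string. *)
let sexp_pass =
  map_bottom_up (function
    | Ext ("sexp", [ Ext ("here", []) ]) -> Str "<filename>:<line>:<column>"
    | e -> e)

let input = Ext ("sexp", [ Ext ("here", []) ])

(* Same input, same passes, two different orders. *)
let sexp_first = here_pass (sexp_pass input)
let here_first = sexp_pass (here_pass input)
```

With the sexp pass first, the DSL sees `[%here]` and produces the string. With the here pass first, the payload has already been replaced by a record, so the sexp pass no longer recognises it and the extension is left in place.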
The different ways of composing ppx rewriters
Before jumping into the subject of this post, we recall a few of the various methods one can use to compose ppx rewriters.
Via separate process
The default method, which was adopted early by the community, is to define each transformation as a separate executable. To compose them, one just has to call all the executables one by one. The main advantage of this approach is that each transformation is a black box and can do whatever dirty hacks it wants.
This is what you get when you are using a ppx by just putting the package name in your build system without doing anything special.
Via a driver
Another approach, which we developed at Jane Street, is to link all the
transformations into a single executable. For this to work properly all
transformations must use the same framework. Technically they all register
themselves with ppx_driver via a call to
Ppx_driver is then responsible for composing them.
The second approach has several advantages: since ppx_driver has knowledge of all transformations, it can do extended checks such as making sure that all attributes have been interpreted. This helps detect typos, which in practice saves a lot of debugging time. But what really interests us in this post is that it can use more clever composition methods.
Code transformations using ppx_driver can still export a single executable compatible with the first method, which is why all Jane Street ppx rewriters can be used with both methods.
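As an illustration of the attribute check, here is a minimal sketch of how a driver that knows every declared attribute can report the ones nobody interprets. The `node` type, the `declared` list and `unused_attributes` are assumptions made up for this example; the real check in ppx_driver works on the OCaml AST and is more involved.

```ocaml
(* Toy model of an AST node carrying attributes such as [@@deriving ...]. *)
type node = {
  label : string;
  attributes : string list;
  children : node list;
}

(* Attribute names declared by the linked-in transformations.
   This list is a hypothetical example, not ppx_driver's real registry. *)
let declared = [ "deriving"; "default" ]

(* Return every attribute in the tree that no transformation declared,
   paired with the label of the node carrying it. *)
let rec unused_attributes node =
  let here =
    node.attributes
    |> List.filter (fun a -> not (List.mem a declared))
    |> List.map (fun a -> (node.label, a))
  in
  here @ List.concat (List.map unused_attributes node.children)

(* "derivng" is a typo for "deriving": no transformation would pick it up. *)
let tree =
  { label = "type t";
    attributes = [ "derivng" ];
    children =
      [ { label = "field x"; attributes = [ "default" ]; children = [] } ] }

let () =
  List.iter
    (fun (label, attr) ->
      Printf.printf "unused attribute %s on %s\n" attr label)
    (unused_attributes tree)
```

With separate executables, no single process knows the full set of declared names, so a misspelled attribute is silently ignored instead of being reported.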
ppx_driver has an ocamlbuild plugin to simplify building custom drivers.
Context free transformations
Given that all transformations are local, it was clear that they should be
defined as such; i.e. if all you want to do is turn
[%sexp "blah"] into
Sexp.Atom "blah", you don’t need to visit the whole AST yourself. You just
need to instruct whatever framework you are using that you want to rewrite
[%sexp ...] extension points.
Context-free extension expander
We started with this idea a few months ago by adding an API in ppx_core to
declare context-free extension expanders. For instance, this shows how you would
declare a ppx that interprets an extension
[%foo ...] inside expressions:
open Ppx_core.Std

let ext =
  Extension.declare "foo" Expression
    Ast_pattern.(...)
    (fun ~path ~loc <parsed-payload...> -> <expansion>)

let () = Ppx_driver.register "foo" ~extensions:[ext]
The Ast_pattern.(...) bit describes what the extension expects as its payload.
Since ppx_driver knows about all the local extension expanders, it can expand them all in one pass over the AST. Moreover it can detect ambiguities and error out in such cases.
There was a choice to make as to whether to rewrite the AST in a bottom-up or top-down manner. We chose top-down, so that extension expanders get to interpret their payload before anyone else and can therefore correctly implement a DSL.
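The single top-down pass can be sketched on a toy AST as follows. Everything here (`expr`, `registry`, `register`, `expand`, and the values produced for `[%here]`) is a hypothetical model written for this illustration, not the ppx_core or ppx_driver API.

```ocaml
type expr =
  | Str of string
  | Int of int
  | Ext of string * expr list   (* [%name payload...] *)
  | Tuple of expr list

(* An expander receives the raw, not yet expanded payload of its extension. *)
type expander = expr list -> expr

let registry : (string, expander) Hashtbl.t = Hashtbl.create 8

(* Registering the same name twice is an ambiguity, reported up front. *)
let register name expander =
  if Hashtbl.mem registry name then failwith ("ambiguous extension: " ^ name)
  else Hashtbl.add registry name expander

(* One top-down pass: expand the outermost extension first, then keep
   rewriting whatever the expander produced. *)
let rec expand e =
  match e with
  | Ext (name, payload) -> (
      match Hashtbl.find_opt registry name with
      | Some expander -> expand (expander payload)
      | None -> Ext (name, List.map expand payload))
  | Tuple es -> Tuple (List.map expand es)
  | Str _ | Int _ -> e

let () =
  (* Outside any DSL, [%here] becomes a (made-up) position value. *)
  register "here" (fun _ -> Tuple [ Str "file.ml"; Int 1; Int 0 ]);
  (* Inside [%sexp ...], the expander sees [%here] untouched. *)
  register "sexp" (function
    | [ Ext ("here", []) ] -> Str "<filename>:<line>:<column>"
    | payload -> Tuple payload)
```

Because the outer `[%sexp ...]` is expanded before its payload is visited, the result no longer depends on the order in which the expanders were registered.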
This solved most of the initial issues and reduced the number of passes to 7:
- all extension expanders
ppx_expect wasn’t initially defined as a context-free extension expander for technical reasons.
Making everything context-free
Recently we went even further and added a
Context_free module to
cover all of our transformations. It doesn’t support all possible rewritings but
supports enough to implement a lot of common ones:
- context-free extension expanders
- some specific support to implement type-driven code generators
- support for ppx rewriters interpreting a function application at
pre-processing time, such as ppx_custom_printf that interprets
With this we reduced the number of passes to only 2:
- context free transformations
ppx_js_style is still done in a separate pass for simplicity. It is run last to ensure we don’t generate code that doesn’t match our coding rules.
Now, in whatever order developers list their ppx rewriters in their build system, they will get the exact same output.
Seeing the exact passes
Ppx_driver got a new option to print the passes it will execute. For instance, with ppx-jane, which is a standard driver with all of the Jane Street ppx rewriters linked in (available in the ppx_jane package in opam):
$ ppx-jane -print-passes
<builtin:freshen-and-collect-attributes>
<bultin:context-free>
<builtin:check-unused-attributes>
<builtin:check-unused-extensions>

$ ppx-jane -print-passes -no-check
<bultin:context-free>
Safety checks are implemented as additional passes, which is why we see more than one pass by default.
No performance comparison was done when introducing context free extension
expanders, but we did some for the second stage, when we changed all ppx
rewriters to use
Context_free; processing a file with the resulting driver was
twice as fast (check passes included).
But how does this compare to the more traditional method of running each rewriter in a separate process? To find out, we ran a benchmark: we took one of the biggest ml files in core_kernel (src/command.ml) and compared the two methods. We put a type error on the first line to be sure the compiler stops just after pre-processing.
For reference, the following are the numbers for calling
ocamlfind ocamlc on the
file with no pre-processing:
$ time ocamlfind ocamlc -c command.ml
File "command.ml", line 1, characters 12-15:
Error: This expression has type char but an expression was expected of type int

real 0m0.022s
user 0m0.016s
sys  0m0.006s
To preprocess the file with ppx_jane as a single driver executable, one just has to pass one -ppx option, or one -pp option, given that ppx_driver can be used either as a -ppx or as a -pp:
# via -ppx
$ time ocamlfind ocamlc \
  -ppx 'ppx-jane -as-ppx -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null

real 0m0.095s
user 0m0.074s
sys  0m0.020s

# via -pp
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null

real 0m0.091s
user 0m0.066s
sys  0m0.024s

# via -pp, with checks disabled
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-check -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null

real 0m0.070s
user 0m0.051s
sys  0m0.018s

# via -pp, without merging passes
$ time ocamlfind ocamlc \
  -pp 'ppx-jane -dump-ast -no-merge -inline-test-lib core -inline-test-drop -bench-drop' \
  -c command.ml 2> /dev/null

real 0m0.229s
user 0m0.206s
sys  0m0.022s
Using the other method turned out to be quite painful. Since the various ppx rewriters cannot share command line arguments, these had to be specified more than once:
$ time ocamlfind ocamlc -package ppx_jane \
  -ppxopt "ppx_inline_test,-inline-test-lib blah -inline-test-drop" \
  -ppxopt "ppx_bench,-inline-test-lib blah -bench-drop" \
  -ppxopt "ppx_expect,-inline-test-lib blah" \
  -c command.ml 2> /dev/null

real 0m0.339s
user 0m0.233s
sys  0m0.098s
Unsurprisingly, the single-pass, single-executable method is a lot faster: about 0.09s of real time against 0.34s for the separate processes.
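To put rough numbers on the comparison, the "real" timings above can be turned into pre-processing overheads by subtracting the run without pre-processing. This is only a back-of-the-envelope sketch: it assumes that subtracting the baseline isolates the pre-processing cost, and it relies on the single run shown for each configuration.

```ocaml
(* Wall-clock ("real") times from the runs above, in milliseconds. *)
let baseline_ms = 22    (* ocamlfind ocamlc alone, no pre-processing *)
let driver_ms = 91      (* single driver via -pp *)
let no_merge_ms = 229   (* single driver via -pp, passes not merged *)
let separate_ms = 339   (* one process per rewriter *)

(* Time attributable to pre-processing: total minus the baseline. *)
let overhead total_ms = total_ms - baseline_ms

let ratio a b = float_of_int a /. float_of_int b

let () =
  Printf.printf "driver overhead:   %d ms\n" (overhead driver_ms);
  Printf.printf "separate overhead: %d ms\n" (overhead separate_ms);
  Printf.printf "separate / driver: %.1fx\n"
    (ratio (overhead separate_ms) (overhead driver_ms));
  Printf.printf "no-merge / driver: %.1fx\n"
    (ratio (overhead no_merge_ms) (overhead driver_ms))
```

On these figures the separate processes spend 317 ms on pre-processing against 69 ms for the merged driver, a factor of about 4.6, and merging the passes alone accounts for a factor of 3 (207 ms against 69 ms).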
This code is available on GitHub. The context-free extension point API is already available in opam. The newer Context_free API is only in the git repositories of ppx_core and ppx_driver. You can try them out by using our development opam repository. You should have a look at this if you care about how your rewriters are composed or about compilation speed.
- ppx_js_style is not currently released; it is an internal ppx that we use to enforce our coding standards.