Expect tests are a technique I’ve written about before, but until recently, it’s been a little on the theoretical side. That’s because it’s been hard to take these ideas out for a spin due to lack of tooling outside of Jane Street’s walls.
That’s changed now that Dune has gotten good support for expect tests. Given that, I thought this would be a nice time to demonstrate how expect tests can be useful in some ways you might not expect; in particular, as a way of doing exploratory programming.
Preliminaries
The basic idea of an expect test is simple: expect tests let you generate output that is then captured and included in the source file. To try this out, let’s first create a jbuild file for our little experiment.
(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio))
  (inline_tests)
  (preprocess (pps (ppx_jane)))
 ))
Note that you’ll have to opam install base, stdio and ppx_jane for any of this to work. The inclusion of the (inline_tests) declaration is important here, as is the preprocessor line.
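Concretely, that installation is a single command at the shell:

opam install base stdio ppx_jane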
Now, we can write a simple .ml file that uses the expect test framework to generate some output.
open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!"
We can then run this test and automatically capture the results by running dune (which is still, confusingly, called jbuilder at the command line):

jbuilder runtest --auto-promote
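If you’d rather review the changes before accepting them, you can drop --auto-promote. In that case (at least with the jbuilder/dune versions I’m assuming here), the test run fails and shows a diff against a generated .corrected file, which you can then accept in a separate step:

jbuilder runtest
jbuilder promote

The --auto-promote flag just collapses those two steps into one.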
You’ll now see the file change to the following as soon as the build is complete. (This is all more fun if your editor is set to auto-refresh.)
open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!";
  [%expect {| Hello Expect World! |}]
Smashing some HTML
Now let’s get to the exploratory programming part.
We’ll demonstrate a classic exploratory programming task: munging an HTML file to get some useful data. In particular, let’s say we want to find internal links on the opensource.janestreet.com site. We’re going to use lambdasoup, which is a great library for transforming HTML files.
After installing lambdasoup via opam, we need to update our jbuild file accordingly. We should also install and include expect_test_helpers_kernel, a library that provides some useful tools for building expect tests.
(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests)
  (preprocess (pps (ppx_jane)))
 ))
Now, we can write a little function for extracting links from an HTML file, using lambdasoup.
open! Base
open! Stdio
open! Expect_test_helpers_kernel

let get_hrefs soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")
We can test this out by writing an expect test against a little example.
let%expect_test "soup" =
  let example = {|
    <html><body>
      <a href="http://janestreet.com"> A link! </a>
    </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)]
Note that we use print_s from expect_test_helpers_kernel to format the s-expression, and the %sexp syntax extension to generate the s-expression to print. If we run jbuilder again, the output will be inserted into the file for us.
let%expect_test "soup" =
  let example = {|
    <html><body>
      <a href="http://janestreet.com"> A link! </a>
    </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
At this point, it might occur to us to wonder what would happen if we had an <a> element with no href. Well, we can just try that out.
let%expect_test "soup" =
  let example = {|
    <html><body>
      <a href="http://janestreet.com"> A link! </a>
      <a> A broken link! </a>
    </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
Rerunning the test demonstrates that our code throws an exception in this case.
let%expect_test "soup" =
  let example = {|
    <html><body>
      <a href="http://janestreet.com"> A link! </a>
      <a> A broken link! </a>
    </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| DID NOT REACH THIS PROGRAM POINT |}];
  [%expect {|
    (* expect_test_collector: This test expectation appears to contain a backtrace.
       This is strongly discouraged as backtraces are fragile.
       Please change this test to not include a backtrace. *)
    ("A top-level expression in [let%expect] raised -- consider using [show_raise]"
     (Failure "Soup.R.attribute: None")
     (backtrace (
       "Raised at file \"pervasives.ml\", line 32, characters 17-33"
       "Called from file \"src/list.ml\", line 326, characters 13-17"
       "Called from file \"test.ml\", line 17, characters 14-44"
       "Called from file \"src/expect_test_helpers_kernel.ml\", line 475, characters 6-11"))) |}]
We can fix this easily enough by changing the selector we use to only look for <a> nodes with an href, as follows.
let get_hrefs soup =
  Soup.select "a[href]" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")
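An equivalent fix, by the way, would be to keep the plain "a" selector and switch from Soup.R.attribute to lambdasoup's option-returning Soup.attribute, dropping the misses with List.filter_map. Here's a sketch of that version, under a hypothetical name (it's not what we'll use below):

let get_hrefs_opt soup =
  Soup.select "a" soup
  |> Soup.to_list
  (* Soup.attribute returns an option, so <a> nodes with no href are skipped *)
  |> List.filter_map ~f:(Soup.attribute "href")

We'll stick with the selector change, though.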
And now, rerunning jbuilder will show that we get reasonable output once again.
let%expect_test "soup" =
  let example = {|
    <html><body>
      <a href="http://janestreet.com"> A link! </a>
      <a> A broken link! </a>
    </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
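One more aside before we move on: the error message we hit earlier suggested show_raise, which is the function expect_test_helpers_kernel provides for when an exception is behavior you actually want to capture rather than a bug. With the original "a" selector, a sketch along these lines would have recorded the failure instead of letting it escape, and running with --auto-promote would fill in the exception's sexp as the expected output:

let%expect_test "broken link raises" =
  let example = {| <html><body> <a> A broken link! </a> </body></html> |} in
  (* the same pipeline as the original get_hrefs, which raises on an <a> with no href *)
  show_raise (fun () ->
    Soup.parse example
    |> Soup.select "a"
    |> Soup.to_list
    |> List.map ~f:(Soup.R.attribute "href"))

Here, though, the exception really was just a bug, so the selector fix above is the right call.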
Adding some real data
What if we want to apply this to some real data? Let’s grab the current contents of opensource.janestreet.com from the web and save it to a file called opensource.html. If we want our test to be able to read from this file, we need to add it as an explicit dependency, so we’ll adjust the jbuild file accordingly.
(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests ((deps (opensource.html))))
  (preprocess (pps (ppx_jane)))
 ))
Now, we can add a new test to see what our function does on opensource.html.
let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)]
Again, if we run the test, the file will be updated to include the output.
let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)];
  [%expect {|
    (https://www.janestreet.com/ad-cookie-policy
     https://opensource.janestreet.com/
     https://github.com/janestreet
     https://ocaml.janestreet.com/ocaml-core/latest/doc/index.html
     https://github.com/janestreet
     https://github.com/ocaml/dune
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://www.janestreet.com/technology/
     https://blog.janestreet.com/
     https://opensource.janestreet.com/contribute
     https://janestreet.com/) |}]
Now, we only wanted to extract the links that were actually on opensource.janestreet.com, and we got a bunch of other irrelevant links. To fix this, we need to analyze the URIs, so we’ll install the uri package from opam and add it to our jbuild.
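Assuming nothing else changes, the jbuild just needs uri added to the libraries list, roughly like this:

(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup uri))
  (inline_tests ((deps (opensource.html))))
  (preprocess (pps (ppx_jane)))
 ))

With that in place, we can change the code as follows.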
let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let internal_links =
    get_hrefs soup
    |> List.filter ~f:(fun uri ->
      let uri = Uri.of_string uri in
      match Uri.host uri with
      | None -> false
      | Some host -> String.(=) host "opensource.janestreet.com")
  in
  print_s [%sexp (internal_links : string list)];
  [%expect {|
    (https://opensource.janestreet.com/
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://opensource.janestreet.com/contribute) |}]
Which gives us what we were looking for.
What’s nice about this approach is that we’ve been able to do this all in a way that’s both lightweight and repeatable. We can take the code we’ve written, commit it to the repo we’re working on, and anyone else can try to extend our examples. What’s more, once the logic we want is finished, it might make sense to leave in these little experiments as regression tests, which will help make sure that we don’t break things as we start refactoring and reorganizing the code later.