Repeatable exploratory programming

Expect tests are a technique I’ve written about before, but until recently, the discussion has been a little on the theoretical side. That’s because a lack of tooling outside of Jane Street’s walls made it hard to take these ideas out for a spin.

That’s changed now that Dune has good support for expect tests. Given that, I thought this would be a nice time to demonstrate how expect tests can be useful in some ways you might not expect; in particular, as a way of doing exploratory programming.

Preliminaries

The basic idea of an expect test is simple: expect tests let you generate output that is then captured and included in the source file. To try this out, let’s first create a jbuild file for our little experiment.

(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio))
  (inline_tests)
  (preprocess (pps (ppx_jane)))
))

Note that you’ll have to opam install base, stdio and ppx_jane for any of this to work. The inclusion of the (inline_tests) declaration is important here, as is the preprocessor line.

Now, we can write a simple .ml file that uses the expect test framework to generate some output.

open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!"

We can then run this test and automatically capture the results by running dune (which is still, confusingly, called jbuilder at the command line).

jbuilder runtest --auto-promote

You’ll now see the file change to the following as soon as the build is complete. (This is all more fun if your editor is set to auto-refresh.)

open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!";
  [%expect {| Hello Expect World! |}]
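
To make the mechanism a bit more concrete, here’s a toy model of what is going on: run some code, capture its output, and compare it against the text recorded in the source. This is only a sketch using the standard library. The names outcome and check_expect are made up for illustration, and the real framework’s whitespace handling and file rewriting are more involved than the simple trimming assumed here.

```ocaml
(* Toy model of an expect test. [outcome] and [check_expect] are hypothetical
   names for this sketch; they are not part of the expect test framework. *)
type outcome =
  | Matches
  | Needs_promotion of string (* the output that would be written back *)

(* Run [body] against a fresh buffer and compare what it wrote with the
   recorded expectation. Assumption: surrounding whitespace is ignored. *)
let check_expect ~expected body =
  let buf = Buffer.create 64 in
  body buf;
  let actual = String.trim (Buffer.contents buf) in
  if String.equal actual (String.trim expected)
  then Matches
  else Needs_promotion actual

let () =
  let body buf = Buffer.add_string buf "Hello Expect World!\n" in
  let report label = function
    | Matches -> Printf.printf "%s: matches\n" label
    | Needs_promotion out -> Printf.printf "%s: would promote %S\n" label out
  in
  (* First run: nothing is recorded yet, so there is output to promote. *)
  report "first run" (check_expect ~expected:"" body);
  (* Second run: the promoted output is now the expectation. *)
  report "second run" (check_expect ~expected:" Hello Expect World! " body)
```

The two calls mirror the two states of the file above: before promotion there is a correction to write back, and after promotion the test simply passes.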

Smashing some HTML

Now let’s get to the exploratory programming part.

We’ll demonstrate a classic exploratory programming task: munging an HTML file to get some useful data. In particular, let’s say we want to find internal links on the opensource.janestreet.com site. We’re going to use lambdasoup, which is a great library for transforming HTML files.

After installing lambdasoup via opam, we need to update our jbuild file accordingly. We should also install and include support for a library called expect_test_helpers_kernel, which provides some useful tools for building expect tests.

(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests)
  (preprocess (pps (ppx_jane)))
))

Now, we can write a little function for extracting links from an HTML file, using lambdasoup.

open! Base
open! Stdio
open! Expect_test_helpers_kernel

let get_hrefs soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")

We can test this out by writing an expect test against a little example.

let%expect_test "soup" =
  let example = {|
     <html><body>
       <a href="http://janestreet.com"> A link! </a>
     </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)]

Note that we use print_s from expect_test_helpers_kernel to format the s-expression, and the %sexp syntax extension to generate the s-expression to print. As before, if we rerun jbuilder, the output will be inserted into the file for us.

let%expect_test "soup" =
  let example = {|
     <html><body>
       <a href="http://janestreet.com"> A link! </a>
     </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
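
If you’re curious what the %sexp extension is doing for us here, the following is a rough hand-written equivalent for this particular type. It’s a sketch that assumes only Base and Stdio; hrefs_to_sexp is a name invented for the example, and the code the preprocessor actually generates may differ in detail.

```ocaml
open! Base
open! Stdio

(* Hand-written stand-in for [%sexp (hrefs : string list)]: the converter for
   [string list] is built from the converters for [list] and [string].
   [hrefs_to_sexp] is a hypothetical name used only in this sketch. *)
let hrefs_to_sexp (hrefs : string list) : Sexp.t =
  List.sexp_of_t String.sexp_of_t hrefs

let () =
  (* [Sexp.to_string] gives a compact, single-line rendering. *)
  print_endline (Sexp.to_string (hrefs_to_sexp [ "a"; "b" ]))
```

The syntax extension saves us from spelling out the converter by hand, which matters more as the types we want to print get bigger.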

At this point, it might occur to us to wonder what would happen if we had an <a> element with no href. Well, we can just try that out.

let%expect_test "soup" =
  let example = {|
     <html><body>
       <a href="http://janestreet.com"> A link! </a>
       <a> A broken link! </a>
     </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]

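Rerunning the test demonstrates that our code raises an exception in this case: the existing expectation is marked as never reached, and a new trailing block captures the exception.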

let%expect_test "soup" =
  let example = {|
     <html><body>
       <a href="http://janestreet.com"> A link! </a>
       <a> A broken link! </a>
     </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| DID NOT REACH THIS PROGRAM POINT |}];
  [%expect {|
    (* expect_test_collector: This test expectation appears to contain a backtrace.
       This is strongly discouraged as backtraces are fragile.
       Please change this test to not include a backtrace. *)

    ("A top-level expression in [let%expect] raised -- consider using [show_raise]"
     (Failure "Soup.R.attribute: None")
     (backtrace (
       "Raised at file \"pervasives.ml\", line 32, characters 17-33"
       "Called from file \"src/list.ml\", line 326, characters 13-17"
       "Called from file \"test.ml\", line 17, characters 14-44"
       "Called from file \"src/expect_test_helpers_kernel.ml\", line 475, characters 6-11"))) |}]

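The message above suggests show_raise. Another way to keep exploring without a backtrace ending up in the test is to turn the exception into a value. The following is a sketch of that idea using Or_error.try_with from Base. get_hrefs_or_error is a hypothetical wrapper, not something we’ve defined so far, and get_hrefs is repeated only so that the snippet stands alone.

```ocaml
open! Base

(* The exception-raising version from above, repeated so this sketch is
   self-contained. *)
let get_hrefs soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")

(* Hypothetical wrapper: catch any exception raised by [get_hrefs] and return
   it as an [Error] instead of letting it escape the test. *)
let get_hrefs_or_error soup : string list Or_error.t =
  Or_error.try_with (fun () -> get_hrefs soup)
```

Inside an expect test, the result could then be printed with print_s [%sexp (result : string list Or_error.t)], so that the failure shows up as ordinary output.
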
We can fix this easily enough by changing the selector we use to only look for <a> nodes with an href, as follows.

let get_hrefs soup =
  Soup.select "a[href]" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")

And now, rerunning jbuilder will show that we get reasonable output once again.

let%expect_test "soup" =
  let example = {|
     <html><body>
       <a href="http://janestreet.com"> A link! </a>
       <a> A broken link! </a>
     </body></html> |}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
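
Tightening the selector is one fix. Another option, sketched here as an alternative rather than as the approach used in the rest of this post, is to keep the broad selector and use the option-returning Soup.attribute, dropping the elements that have no href. The name get_hrefs_opt is made up for the example.

```ocaml
open! Base

(* Alternative sketch: select every <a> element, then keep only the ones
   that actually carry an href. [Soup.attribute] returns [None] where
   [Soup.R.attribute] would raise. [get_hrefs_opt] is a hypothetical name. *)
let get_hrefs_opt soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.filter_map ~f:(Soup.attribute "href")
```

On the example above, both versions should agree; the difference is only in where the missing-attribute case gets handled.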

Adding some real data

What if we want to apply this to some real data? Let’s grab the current contents of opensource.janestreet.com from the web and save it to a file called opensource.html. If we want our test to be able to read from this file, we need to add it as an explicit dependency, so we’ll adjust the jbuild file accordingly.

(jbuild_version 1)

(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests ((deps (opensource.html))))
  (preprocess (pps (ppx_jane)))
))

Now, we can add a new test, to see what our function does on opensource.html.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)]

Again, if we run the test, the file will be updated to include the output.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)];
  [%expect {|
    (https://www.janestreet.com/ad-cookie-policy
     https://opensource.janestreet.com/
     https://github.com/janestreet
     https://ocaml.janestreet.com/ocaml-core/latest/doc/index.html
     https://github.com/janestreet
     https://github.com/ocaml/dune
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://www.janestreet.com/technology/
     https://blog.janestreet.com/
     https://opensource.janestreet.com/contribute
     https://janestreet.com/) |}]

Now, we only wanted to extract the links that were actually on opensource.janestreet.com, and we got a bunch of other irrelevant links. To fix this, we need to analyze the URIs, so we’ll install the uri package from opam and add it to our jbuild, at which point we can change the code as follows.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let internal_links =
    get_hrefs soup
    |> List.filter ~f:(fun uri ->
        let uri = Uri.of_string uri in
        match Uri.host uri with
        | None -> false
        | Some host -> String.(=) host "opensource.janestreet.com")
  in
  print_s [%sexp (internal_links : string list)];
  [%expect {|
    (https://opensource.janestreet.com/
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://opensource.janestreet.com/contribute) |}]

This gives us what we were looking for.
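
Once an experiment like this settles down, the filtering logic can be pulled out of the test into ordinary functions. Here’s a minimal sketch of what that might look like, assuming the same uri package; is_on_host and internal_links are hypothetical names chosen for the example.

```ocaml
open! Base

(* [is_on_host] is a hypothetical helper. Links with no host component (for
   example, relative links such as "/contribute") are treated as not
   matching, mirroring the [None -> false] case in the test above. *)
let is_on_host ~host link =
  match Uri.host (Uri.of_string link) with
  | None -> false
  | Some h -> String.equal h host

(* Keep only the links that point at [host]. *)
let internal_links ~host hrefs = List.filter hrefs ~f:(is_on_host ~host)
```

Note that this sketch keeps the same treatment of relative links as the test does, so whether that is the behavior we want is itself something an expect test could be used to explore.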

What’s nice about this approach is that we’ve been able to do this all in a way that’s both lightweight and repeatable. We can take the code we’ve written, commit it to the repo we’re working on, and anyone else can try to extend our examples. What’s more, once the logic we want is finished, it might make sense to leave in these little experiments as regression tests, which will help make sure that we don’t break things as we start refactoring and reorganizing the code later.