Self Adjusting DOM and Diffable Data

In my last post, I gave some simple examples showing how you could use self adjusting computations, or SAC, as embodied by our Incremental library, to incrementalize the computation of virtual dom nodes. In this post, I’d like to discuss how we can extend this approach to more realistic scales, and some of the extensions to Incremental itself that are required to get there.

Along the way, we’ll discuss the way that diffable data structures relate to self adjusting computations, and how we can use ordinary pure data types and functional updates to those data types as the basis for building efficient incremental computations.

First, let’s bring back the code of the example from that post, which is a simple component that displayes the state of a machine in a datacenter.

module Machine = struct
  module Model = struct
    type display_mode = Temperature | Uptime
    type t = { temperature: float;
               uptime: Time.Span.t;
               selected: bool;
               mode: display_mode;
             }
      [@@deriving fields]
  end

  let view model =
    let mode        = Incr.map model ~f:Model.mode        in
    let temperature = Incr.map model ~f:Model.temperature in
    let selected    = Incr.map model ~f:Model.selected    in
    let uptime      = Incr.map model ~f:Model.uptime      in
    let text =
      match%bind mode with
      | Temperature ->
        let%map x = temperature in sprintf "%F degrees" x
      | Uptime ->
        let%map x = uptime in Time.Span.to_string x
    in
    let%map text = text and selected = selected in
    Vdom.text
      (if selected then [Vdom.class_ "selected"] else [])
      text
end

It’s worth mentioning that in the service of exercising the machinery, I’ve over-incrementalized this example. It would very likely be cheaper to just generate the necessary vdom in its entirety every time the model changes. That said, the above is still a good example of how to use Incremental, and it’s worth understanding how it works.

The first thing that is done by view is to project out different parts of the model into separate incrementals. The key observation is that, even though model changes, if Model.mode model doesn’t change, then the corresponding incremental will cut off computation there, preventing dependent computations from refiring. So, whenever the model changes we’ll check each field in the model and whether it changed before launching dependent computations.

By default, this cutoff happens based on physical equality; but one can also specify more expansive semantic notions of equality when that’s important.

Mapping over maps, incrementally

The projection approach works well for simple fixed-size records, but it doesn’t work when it comes to more complex models with variable sized components. To see why, let’s consider what happens if we add one more layer, an outer model consisting of a collection of named machines. We do this using OCaml’s Map data structure, which is essentially a functional dictionary.

module Model = struct
  type t = Machine.Model.t String.Map.t
end

How do we construct an incremental view for this? We can no longer simply project out all of the individual machine models into a fixed set of incrementals, since the number of machines in the map is not known in advance.

What we need here is a way of efficiently mapping over an incremental map. (The naming collision between the Incr.map function and the Map.t data structure is an unforutnate source of confusion here.) Here’s a signature that captures what we need:

val Incr.Map.map
  :  ('k,'v,'cmp) Map.t Incr.t
  -> f:('v Incr.t -> 'w Incr.t)
  -> ('k,'w,'cmp) Map.t Incr.t

If you’re not used to reading OCaml signatures, this can take a moment to digest. First, note that ('k,'v,'cmp) Map.t denotes a map whose key is of type 'k, whose data is of type 'v. (You can ignore the 'cmp parameter.)

So, what does this function do? A good rule of thumb for incremental is that if you want to understand the meaning of an incremental function (ignoring performance), you can just drop all the incremental bits. If we do that for the above type signature, we get this.

val Map.map
  :  ('k,'v,'cmp) Map.t
  -> f:('v -> 'w)
  -> ('k,'w,'cmp) Map.t

This type correctly suggests that the function is, as the name suggests, a map, which is to say, a function that transforms the elements of a container, thereby producing a new instance of that container.

That tells us what Incr.Map.map is supposed to compute, but what about the performance of that computation? In particular, what is the structure of the incremental graph this generates?

The following picture illustrates the desired incremental graph. The idea is that there is a path in the graph for every key/data pair in the map. When a given key is updated, an update should be passed down the corresponding incremental path. When a key is added or removed, then paths should be added or removed accordingly. And, rather than recomputing map' from scratch on every update, we want to do the corresponding update, insert or delete into map'.

This function is enough to build an efficient version of our view. In particular, we can write our view function as follows:

let view (model : Model.t Incr.t) : Vdom.t Incr.t =
  let%map machine_vdoms = Incr.Map.map model ~f:Machine.view in
  Vdom.node "div" [] (Map.data machine_vdoms)

Note that at the end, we do need to construct a complete list of all the displayed machines, and that does have to be reconstructed in its entirety on every change. That’s because the virtual dom API uses lists, which can’t be updated incrementally in an efficient way. If the virtual dom were modified to take a tree-like structure for nodes, this could be handled efficiently.

That said, I don’t think this is a serious practical problem, since in most UIs, the cost of allocating such nodes is a small part of the overall cost of reconstructing the virtual dom.

Diffable data and incrementality

In order to implement Incr.Map.map efficiently, we need some special support both from our Map datatype and from incremental itself. From Map, we need a way to efficiently figure out a set of updates, inserts and deletes that will take us from one map to another. Happily, the Map module in Core_kernel contains a function that can help.

val Map.symmetric_diff
  :  ('k, 'v, 'cmp) Map.t
  -> ('k, 'v, 'cmp) Map.t
  -> data_equal:('v -> 'v -> bool)
  -> ('k * [ `Left of 'v | `Right of 'v | `Unequal of 'v * 'v ]) Sequence.t

Here, the Left case corresponds to data that’s only in the first map, but not in the second, and so is a deletion. Similarly, the Right case corresponds to an addition, and the Unequal case corresponds to an update to the value.

A key property of Map.symmetric_diff is that it takes advantage of physical equality of nodes by short-circuiting. This means that if you take a map, make some modifications to it, and then compute the symmetric diff between the original map and the new one you just created, the computational effort is proportional to the work that was required to generate the second map in the first place. In other words, you can efficiently read back the set of changes applied to a map.

With Map.symmetric_diff, we have a way to efficiently determine what has changed in the map, and thereby which incrementals need to be updated, added or removed. This diffing is reminiscent of both the diffing inherent in the virtual dom approach, and the enter/update/exit pattern that characterizes the D3 visualization library.

Implementing Incr.Map.map

Map.symmetric_diff isn’t quite enough to build Incr.Map.map. We also need some extra support from Incremental itself for controlling the pattern of updates so that even though the node containing the input map has many dependent nodes, we only fire the nodes that are impliciated by the symmetric diff. Indeed, we’re still working on providing first class support for this kind of primitive within Incremental.

Even without those hooks, we can still take some steps in the right direction, the first of which is to use mutable state to keep the old map to diff against. Essentially, we want a function with this signature:

val Incr.diff_map :
  'a Incr.t -> f:(old:('a * 'b) option -> 'a -> 'b) -> 'b Incr.t

Here, the function f gets access to the old input and old output of this same function when computing the new one. We can implement diff_map straightforwardly enough:

let diff_map i ~f =
  let old = ref None in
  let%map a = i in
  let b = f ~old:!old a in
  old := Some (a,b);
  b

Using diff_map, we can implement a simpler form of Incr.Map.map which takes a function that transforms the elements of the map directly, rather than incrementally. It would have this signature:

val Incr.Map.simple_map
  :  ('a, 'b, 'c) Map.t Incr.t
  -> f:('b -> 'd)
  -> ('a, 'd, 'c) Map.t Incr.t

And here’s the implementation.

let simple_map m ~f =
  diff_map m ~f:(fun ~old m ->
      match old with
      | None -> Map.map m ~f
      | Some (old_in,old_out) ->
        let diff = Map.symmetric_diff ~data_equal:phys_equal old_in m in
        Sequence.fold diff ~init:old_out ~f:(fun acc (key,change) ->
            match change with
            | `Left _ -> Map.remove acc key
            | `Right d | `Unequal (_,d) -> Map.add acc ~key ~data:(f d)
          )
    )

The implementation is simple enough: when the map changes, we diff the old input and the new input. By folding over that diff, we can efficiently update the old output to produce the new output.

Implementing the full Incr.Map.map is more complicated, because the fact that f is a function that consumes and produces incrementals means that you need to construct a more complicated incremental graph – effectively, you need to split, transform and merge the results.

We can bridge the difference between Incr.Map.simple_map and Incr.Map.map with the following two function.

val Incr.Map.split :
  ('a,'b,'cmp) Map.t Incr.t -> ('a,'b Incr.t,'cmp) Map.t Incr.t
val Incr.Map.join :
  ('a,'b Incr.t,'cmp) Map.t Incr.t -> ('a,'b,'cmp) Map.t Incr.t

Incr.Map.split takes an incremental map and produces an incremental map whose data is itself incremental. The idea is that the outer incremental updates only when a key is added or removed from the input map. Changes to the input map that change data of an existing key instead lead to an update to the corresponding inner incremental. Incr.Map.join is simply the inverse operation, taking an incremental map full of incrementals, and producing a simple incremental map.

Using these two functions together, we can convert our simple_map into a full implementation of Incr.Map.map, as follows.

let map m ~f =
  Incr.Map.join (simple_map ~f (Incr.Map.split m))

While there are hacks to approximate them using Incr.map and Incr.bind, the ordinary incremental interface isn’t enough to build a version of Incr.Map.join and Incr.Map.split with the right performance characteristics. For that reason, we’ve started work on an Expert interface within Incremental that lets you create incremental nodes with precise control over their dependencies and when they refire.

Why virtual-dom?

Almost everyone that I’ve discussed these ideas with, both inside of Jane Street and beyond, have asked the question: given that you have Incremental, why bother with using a virtual dom at all? After all, Incremental provides a mechanism for tracking which parts of the computation have changed. WHy should one have two layers that are doing something so similar?

We’ve made things even more duplicative by diffing not just the virtual dom but also our own internal data structures as a way of creating more efficient incremental computations on top of ordinary functional data types.

But despite the seeming redundancy of this approach, I think it has real value. That’s because, although the optimizations provided by Incremental are valuable, programming with Incremental is a worse experience than doing ordinary functional programming with immutable values. The big win of virtual dom is that it lets you do most of your programming in a typical functional style with reasonable performance. You only need to resort to using Incremental when the performance provided by this simple approach isn’t sufficient.

But diffability isn’t just useful on the output of your incremental computation. It’s also helpful on the inputs, and that’s where the diffability of Core_kernel’s maps comes in handy. That way, we can write our code for updating our model in an ordinary functional style, but still use incremental to efficiently scaffold our view functions on top.

Summing up

We have some beginning versions of all of this infrastructure internally at this point. In particular, we’ve built fast versions of Incr.Map.map and friends, and used this for creating some simple dynamic web applications. The results so far are pretty promising. By paying some careful attention to how the input data changes and incrementalizing accordingly, we can build applications that display thousands of rows of data of which hundreds of cells are changing every second with relatively low latency, able to respond to every update in just a handful of milliseconds.

I doubt our performance is as good as more widely used and likely better optimized frameworks like React and Mercury. But the thing that has struck me about all of this work is how easy and natural it has been. With only a small amount of effort, we’ve been able to leverage existing tools, from Incremental to Async to virtual-dom, to write highly responsive applications in a simple and natural style, and one that fits naturally within our existing OCaml libraries and tools.

Another interesting question is how this all relates to the use of FRP for UIs. I’ve written a bit about the difference between FRP and SAC, but the short version is that FRP is more about modeling temporal processes, and SAC is purely about optimization.

For my perspective, SAC fits more cleanly into the goal of simply rendering virtual-dom, and I can’t see as much value in having time-oriented operations as a first-class construct. And the fact that SAC is focused purely on optimization allows it to do a better job of that. In particular, the dynamism of operations like bind makes it possible to precisely control the dependency structure of the system, and do things like quiesce parts of the UI that are not in view. This is harder to do in an FRP system without introducing memory leaks.

All of which makes me think that self adjusting computation is a useful idiom for UI programming, and perhaps one that should be adopted more widely.