GADTs allow one to statically enforce stronger program invariants than are otherwise possible in a Hindley-Milner style type system. This post retells the story of how to “roll your own” GADTs using an explicit type of equality constraints. More interestingly, we discuss a particularly versatile definition of type equality in Haskell that can now be transcribed into OCaml due to the recent addition of first class modules.

## GADTs

The acronym GADT stands for Generalized Algebraic DataType. One definition of a GADT is a parameterized algebraic datatype in which type parameters may vary from one constructor to the next. Consider the following example (adapted from the Haskell Wiki’s GADT page, which is also a good starting point for papers on this sort of thing):

```
data Term x where
Lit :: Int -> Term Int
Pair :: Term a -> Term b -> Term (a, b)
Fst :: Term (a, b) -> Term a
Snd :: Term (a, b) -> Term b
```

This type is like a variant type in OCaml except that (1) the constructors’ type
signatures are given explicitly as opposed to the standard compact BNF
grammar-like syntax for algebraic datatypes, and (2) the constructors do not
uniformly return `Term x`

. Instead, the type parameter `x`

in the return type
varies from constructor to constructor.

In this particular example, GADTs are used to ensure that well-typed values of
type `Term a`

always represent values of type `a`

. In other words, by embedding
the type system of the defined language (`Term`

in this example) into the type
system of the defining language (Haskell in this example), we are guaranteed
that type errors in the defined language are caught by the type checker of the
defining language. The following idealized interpreter session shows this
behavior.

```
ghci> :t (\x -> Pair (Lit 5) (Fst x))
forall a b. Term (a, b) -> Term (Int, a)
ghci> Fst (Lit 5)
Error: ... expected type "(a, b)" but inferred type "Int" ...
```

Pattern matching rules for GADTs are also stronger than those for vanilla datatypes in that the type checker may discover additional constraints about a type that hold in the context of a particular pattern matching branch. Consider the following evaluator:

```
eval :: Term a -> a
eval (Lit i) = i
eval (Pair x y) = (eval x, eval y)
eval (Fst xy) = x where (x, y) = eval xy
eval (Snd xy) = y where (x, y) = eval xy
```

In the second clause of the definition of `eval`

, the type checker learns from
the type of the constructor `Pair`

that `a = (b, c)`

for some `b`

and `c`

. This
information is used to justify returning a value of type `(b, c)`

where a value
of type `a`

was expected.

The statically typed evaluator is a well-worn example of GADTs. A perhaps less-worn example is type-preserving rewrite rules.

```
simplify :: Term a -> Term a
simplify (Fst (Pair x _)) = simplify x
simplify (Snd (Pair _ y)) = simplify y
simplify other = other
```

```
ghci> eval (Fst (Pair (Lit 5) (Lit 8)))
5
ghci> simplify (Fst (Pair (Lit 5) (Lit 8)))
(Lit 5)
```

Regarding other applications of GADTs, a future post will recount how to write programs exhibiting “intensional polymorphism” using a GADT of type representations a la Stephanie Weirich and then show a nice way to represent a significant subset of OCaml types this way.

Popular language extensions allow one to program directly with GADTs in Haskell as shown here. However, plain Haskell allows us to “roll our own” GADTs with a little extra work.

## Encoding GADTs with type equality

The above description of pattern matching with GADTs hints at a useful fact: the extra power afforded by GADTs has to do with tracking and applying additional type equality constraints. What proper GADTs do for us automatically, we may do for ourselves if we have a way to manipulate type equalities.

To this end, assume we have a type `Equal a b`

that stands for the proposition
that `a`

and `b`

are equal types. Furthermore, assume that this type supports
the following operations.

```
refl :: Equal a a
symm :: Equal a b -> Equal b a
trans :: Equal a b -> Equal b c -> Equal a c
lift :: Equal a b -> Equal (f a) (f b)
coerce :: Equal a b -> a -> b
```

Viewed through the propositions-as-types lens, the types of `refl`

, `symm`

, and
`trans`

say that `Equal`

is an equivalence relation, while the type of `lift`

says that every type constructor `f`

preserves `Equal`

.

Finally the `coerce`

function says what we can *do* with values of type
`Equal a b`

, namely coerce values from type `a`

into an equivalent type `b`

.
Since `a`

and `b`

are equal, we expect such a coercion to behave like the
identity function. In particular, `coerce`

should in no way inspect its second
argument.

The above definition of `Term`

may then be encoded in terms of `Equal`

as
follows:

```
data Term x
= Lit (Equal x Int) Int
| forall a b. Pair (Equal x (a, b)) (Term a) (Term b)
| forall b. Fst (Term (x, b))
| forall a. Snd (Term (a, x))
```

The last three constructors here make use of existentially quantified type
variables (somewhat confusingly introduced with the keyword `forall`

). This
definition yields constructors with the following types

```
Lit :: Equal x Int -> Int -> Term x
Pair :: Equal x (a, b) -> Term a -> Term b -> Term x
Fst :: Term (a, b) -> Term a
Snd :: Term (a, b) -> Term b
```

Partially applying Lit and Pair constructors to `refl`

yields more constructor
functions with more natural types.

```
num :: Int -> Term Int
pair :: Term a -> Term b -> Term (a, b)
num = Lit refl
pair = Pair refl
```

Note that these types are the same as those declared for constructors of the
GADT version of `Term`

.

When pattern matching, we will find it useful to explicitly manipulate equality
constraints in the form of values of type `Equal a b`

for particular types `a`

and `b`

. The `eval`

example becomes

```
eval :: Term a -> a
eval (Lit eq i) = coerce (symm eq) i
eval (Pair eq a b) = coerce (symm eq) (eval a, eval b)
eval (Fst a) = x where (x, y) = eval a
eval (Snd a) = y where (x, y) = eval a
```

The use of equality values here makes precise the informal reasoning given above
for why a GADT-aware type checker accepted the previous version of `eval`

. The
burden on the programmer is non-trivial, but we may be willing to pay the price
in order to squeeze more assurances from the type system.

## User-defined type equality

How might one implement `Equal`

? Amazingly, there is a GADT-free definition for
`Equal`

involving polymorphism over type constructors rather than over types,
namely

```
data Equal a b = Coerce (forall f. f a -> f b)
```

As a logical formula, this definition says that `a`

equals `b`

if every property
`f`

that holds of `a`

also holds of `b`

. The type variable `f`

in this
definition ranges over type constructors (like OCaml’s `option`

and `list`

)
rather than types (like OCaml’s `int`

and `string`

).

Reflexivity and transitivity are easy to prove with this definition.

```
refl :: Equal a a
refl = Coerce id
trans :: Equal a b -> Equal b c -> Equal a c
trans (Coerce f) (Coerce g) = Coerce (g . f)
```

(Note that the infix dot is Haskell’s function composition operator). Symmetry
is more difficult, so we leave it for later. Coercion may be defined by
instantiating `f`

to the identity type constructor.

```
newtype Id a = Id { unId :: a }
coerce :: Equal a b -> a -> b
coerce (Coerce f) = unId . f . Id
```

The idiom used in the definition of `Id`

gives us both an injection function
`Id :: a -> Id a`

as well as a projection function `unId :: Id a -> a`

. The
keyword `newtype`

means that this type definition is treated as a type synonym
at runtime, so that `Id`

and `unId`

are both implemented as the identity
function and therefore serve only to guide the typechecker. The definition of
`lift`

is a variation on this theme.

```
newtype Compose f1 f2 a = Compose { unCompose :: f1 (f2 a) }
lift :: Equal a b -> Equal (f a) (f b)
lift (Coerce f) = Coerce (unCompose . f . Compose)
```

Though the definition of `Equal`

seems very asymmetric, one may define `symm`

by
instantiating the argument coercion to the property `f c = Equal c a`

, which
must hold of `b`

since it holds of `a`

(via `refl`

).

```
newtype FlipEqual a c = Flip { unFlip : Equal c a }
symm :: Equal a b -> Equal b a
symm (Coerce f) = (unFlip . f . Flip) refl
```

## Transcription into OCaml

How might we transcribe this implementation of `Equal`

into OCaml? The most
difficult aspect seems to be the universal quantification over a type
constructor rather than over a regular type. Parameterization over a type
constructor is something that can be done in OCaml only via a functor.
Fortunately, OCaml 3.12’s new first class modules allow us to embed modules and
functors into the value world.

```
module type Equal = sig
type fst
type snd
module Coerce :
functor (F : sig type 'a t end) -> sig
val f : fst F.t -> snd F.t
end
end
type ('a, 'b) equal = (module Equal with type fst = 'a and type snd = 'b)
```

After making this initial step, I found transcribing the remaining Haskell definitions into OCaml is a useful exercise in learning to program with first class modules. Following Stephen’s lead in an earlier post, I will give the reader a chance to work out the remaining definitions on his/her own before posting my solution.

## Comparison with previous implementations

Defining type equality in OCaml is not a new idea. Oleg Kiselyov has a clever implementation that only requires explaining away a single unreachable exception-raising expression. Similarly, a simple pair-of-coercions implementation of type equality features in one of the examples of first class modules in the OCaml 3.12 reference manual. So what value is added by importing the implementation used in pure Haskell?

The answer lies in the `lift`

combinator, which is not supported by previous
OCaml implementations of type equality. Without it, consider what one would have
to do to coerce an `'a list`

into a `'b list`

given a value of type
`('a, 'b) Equal.t`

. It seems clear that one must essentially map an `'a`

to `'b`

coercion across the list, therefore copying the spine. The implementation
presented here, however, allows one to do the coercion without ever inspecting
the list.

For lists, this may not be so bad. But in general, the occurrence of the type
parameter `'a`

may be arbitrarily deep in the definition of a type constructor,
and the deeper the occurrence, the more costly the traversal.