GADTs allow one to statically enforce stronger program invariants than are otherwise possible in a Hindley-Milner style type system. This post retells the story of how to “roll your own” GADTs using an explicit type of equality constraints. More interestingly, we discuss a particularly versatile definition of type equality in Haskell that can now be transcribed into OCaml due to the recent addition of first class modules.
GADTs
The acronym GADT stands for Generalized Algebraic DataType. One definition of a GADT is a parameterized algebraic datatype in which type parameters may vary from one constructor to the next. Consider the following example (adapted from the Haskell Wiki’s GADT page, which is also a good starting point for papers on this sort of thing):
data Term x where
Lit :: Int -> Term Int
Pair :: Term a -> Term b -> Term (a, b)
Fst :: Term (a, b) -> Term a
Snd :: Term (a, b) -> Term b
This type is like a variant type in OCaml except that (1) the constructors’ type
signatures are given explicitly as opposed to the standard compact BNF
grammar-like syntax for algebraic datatypes, and (2) the constructors do not
uniformly return Term x
. Instead, the type parameter x
in the return type
varies from constructor to constructor.
In this particular example, GADTs are used to ensure that well-typed values of
type Term a
always represent values of type a
. In other words, by embedding
the type system of the defined language (Term
in this example) into the type
system of the defining language (Haskell in this example), we are guaranteed
that type errors in the defined language are caught by the type checker of the
defining language. The following idealized interpreter session shows this
behavior.
ghci> :t (\x -> Pair (Lit 5) (Fst x))
forall a b. Term (a, b) -> Term (Int, a)
ghci> Fst (Lit 5)
Error: ... expected type "(a, b)" but inferred type "Int" ...
Pattern matching rules for GADTs are also stronger than those for vanilla datatypes in that the type checker may discover additional constraints about a type that hold in the context of a particular pattern matching branch. Consider the following evaluator:
eval :: Term a -> a
eval (Lit i) = i
eval (Pair x y) = (eval x, eval y)
eval (Fst xy) = x where (x, y) = eval xy
eval (Snd xy) = y where (x, y) = eval xy
In the second clause of the definition of eval
, the type checker learns from
the type of the constructor Pair
that a = (b, c)
for some b
and c
. This
information is used to justify returning a value of type (b, c)
where a value
of type a
was expected.
The statically typed evaluator is a well-worn example of GADTs. A perhaps less-worn example is type-preserving rewrite rules.
simplify :: Term a -> Term a
simplify (Fst (Pair x _)) = simplify x
simplify (Snd (Pair _ y)) = simplify y
simplify other = other
ghci> eval (Fst (Pair (Lit 5) (Lit 8)))
5
ghci> simplify (Fst (Pair (Lit 5) (Lit 8)))
(Lit 5)
Regarding other applications of GADTs, a future post will recount how to write programs exhibiting “intensional polymorphism” using a GADT of type representations a la Stephanie Weirich and then show a nice way to represent a significant subset of OCaml types this way.
Popular language extensions allow one to program directly with GADTs in Haskell as shown here. However, plain Haskell allows us to “roll our own” GADTs with a little extra work.
Encoding GADTs with type equality
The above description of pattern matching with GADTs hints at a useful fact: the extra power afforded by GADTs has to do with tracking and applying additional type equality constraints. What proper GADTs do for us automatically, we may do for ourselves if we have a way to manipulate type equalities.
To this end, assume we have a type Equal a b
that stands for the proposition
that a
and b
are equal types. Furthermore, assume that this type supports
the following operations.
refl :: Equal a a
symm :: Equal a b -> Equal b a
trans :: Equal a b -> Equal b c -> Equal a c
lift :: Equal a b -> Equal (f a) (f b)
coerce :: Equal a b -> a -> b
Viewed through the propositions-as-types lens, the types of refl
, symm
, and
trans
say that Equal
is an equivalence relation, while the type of lift
says that every type constructor f
preserves Equal
.
Finally the coerce
function says what we can do with values of type
Equal a b
, namely coerce values from type a
into an equivalent type b
.
Since a
and b
are equal, we expect such a coercion to behave like the
identity function. In particular, coerce
should in no way inspect its second
argument.
The above definition of Term
may then be encoded in terms of Equal
as
follows:
data Term x
= Lit (Equal x Int) Int
| forall a b. Pair (Equal x (a, b)) (Term a) (Term b)
| forall b. Fst (Term (x, b))
| forall a. Snd (Term (a, x))
The last three constructors here make use of existentially quantified type
variables (somewhat confusingly introduced with the keyword forall
). This
definition yields constructors with the following types
Lit :: Equal x Int -> Int -> Term x
Pair :: Equal x (a, b) -> Term a -> Term b -> Term x
Fst :: Term (a, b) -> Term a
Snd :: Term (a, b) -> Term b
Partially applying Lit and Pair constructors to refl
yields more constructor
functions with more natural types.
num :: Int -> Term Int
pair :: Term a -> Term b -> Term (a, b)
num = Lit refl
pair = Pair refl
Note that these types are the same as those declared for constructors of the
GADT version of Term
.
When pattern matching, we will find it useful to explicitly manipulate equality
constraints in the form of values of type Equal a b
for particular types a
and b
. The eval
example becomes
eval :: Term a -> a
eval (Lit eq i) = coerce (symm eq) i
eval (Pair eq a b) = coerce (symm eq) (eval a, eval b)
eval (Fst a) = x where (x, y) = eval a
eval (Snd a) = y where (x, y) = eval a
The use of equality values here makes precise the informal reasoning given above
for why a GADT-aware type checker accepted the previous version of eval
. The
burden on the programmer is non-trivial, but we may be willing to pay the price
in order to squeeze more assurances from the type system.
User-defined type equality
How might one implement Equal
? Amazingly, there is a GADT-free definition for
Equal
involving polymorphism over type constructors rather than over types,
namely
data Equal a b = Coerce (forall f. f a -> f b)
As a logical formula, this definition says that a
equals b
if every property
f
that holds of a
also holds of b
. The type variable f
in this
definition ranges over type constructors (like OCaml’s option
and list
)
rather than types (like OCaml’s int
and string
).
Reflexivity and transitivity are easy to prove with this definition.
refl :: Equal a a
refl = Coerce id
trans :: Equal a b -> Equal b c -> Equal a c
trans (Coerce f) (Coerce g) = Coerce (g . f)
(Note that the infix dot is Haskell’s function composition operator). Symmetry
is more difficult, so we leave it for later. Coercion may be defined by
instantiating f
to the identity type constructor.
newtype Id a = Id { unId :: a }
coerce :: Equal a b -> a -> b
coerce (Coerce f) = unId . f . Id
The idiom used in the definition of Id
gives us both an injection function
Id :: a -> Id a
as well as a projection function unId :: Id a -> a
. The
keyword newtype
means that this type definition is treated as a type synonym
at runtime, so that Id
and unId
are both implemented as the identity
function and therefore serve only to guide the typechecker. The definition of
lift
is a variation on this theme.
newtype Compose f1 f2 a = Compose { unCompose :: f1 (f2 a) }
lift :: Equal a b -> Equal (f a) (f b)
lift (Coerce f) = Coerce (unCompose . f . Compose)
Though the definition of Equal
seems very asymmetric, one may define symm
by
instantiating the argument coercion to the property f c = Equal c a
, which
must hold of b
since it holds of a
(via refl
).
newtype FlipEqual a c = Flip { unFlip : Equal c a }
symm :: Equal a b -> Equal b a
symm (Coerce f) = (unFlip . f . Flip) refl
Transcription into OCaml
How might we transcribe this implementation of Equal
into OCaml? The most
difficult aspect seems to be the universal quantification over a type
constructor rather than over a regular type. Parameterization over a type
constructor is something that can be done in OCaml only via a functor.
Fortunately, OCaml 3.12’s new first class modules allow us to embed modules and
functors into the value world.
module type Equal = sig
type fst
type snd
module Coerce :
functor (F : sig type 'a t end) -> sig
val f : fst F.t -> snd F.t
end
end
type ('a, 'b) equal = (module Equal with type fst = 'a and type snd = 'b)
After making this initial step, I found transcribing the remaining Haskell definitions into OCaml is a useful exercise in learning to program with first class modules. Following Stephen’s lead in an earlier post, I will give the reader a chance to work out the remaining definitions on his/her own before posting my solution.
Comparison with previous implementations
Defining type equality in OCaml is not a new idea. Oleg Kiselyov has a clever implementation that only requires explaining away a single unreachable exception-raising expression. Similarly, a simple pair-of-coercions implementation of type equality features in one of the examples of first class modules in the OCaml 3.12 reference manual. So what value is added by importing the implementation used in pure Haskell?
The answer lies in the lift
combinator, which is not supported by previous
OCaml implementations of type equality. Without it, consider what one would have
to do to coerce an 'a list
into a 'b list
given a value of type
('a, 'b) Equal.t
. It seems clear that one must essentially map an 'a
to 'b
coercion across the list, therefore copying the spine. The implementation
presented here, however, allows one to do the coercion without ever inspecting
the list.
For lists, this may not be so bad. But in general, the occurrence of the type
parameter 'a
may be arbitrarily deep in the definition of a type constructor,
and the deeper the occurrence, the more costly the traversal.