Implementing the Mini-Language Arith in Elm
In my quest to better understand type theory, I’ve started reading Benjamin Pierce’s excellent book, Types and Programing Languages. Early on he introduces a little language — arith
. While it is an untyped language, it is a superb vehicle for illustrating many fundamental notions and tools, e.g., structural induction, abstract syntax trees, etc. What I would like to describe here is a parser and interpreter for this language written in Elm. Writing code for these is a little exercise that helps to understand better some of the ideas in the book. Pierce gives a link to parser and interpreter code in OCaml. But of course one learns more if one does it oneself. The code for the Elm implementation is on GitHub.
Note (May 6, 2020): I’ve added an article on type-checking Arith
.
The Grammar of Arith
The language has a very simple grammar sufficient to express some of the structure of the natural numbers. Simple foundations for the natural numbers were formulated by Peano (1879). A key feature of his formulation was the successor function succ: given a natural number n, the next natural number is succ n.
To describe the grammar of arith
, we give rules for forming its terms: (1) the symbols true
, false
, and 0 are terms; (2) if t
is a term, then so are succ t
, pred t
, and iszero
t; (3) If s
t
and u
are terms, then so is if t then s else u
. Here is an example of a “sentence” in arith
, as a human would read it or write it:
S = if iszero succ 0 then succ succ succ 0 else false
The real structure of the sentence S is given by the tree T pictured in the figure on the left. As we shall see shortly, one can construct a function parse
that maps sentences s
to one of two possible values. The first, in the case where the sentence "makes sense,” isOk t
, where t
is a representation of the tree corresponding to s
. The second value, in the case where parse
does not understand s
, is Err str
, where str
is a string representing an error message. There is also a function stringValue
, which maps trees to strings. These two functions are, roughly speaking inverses of each other. Therefore the string and tree representations of a term of arith
are equivalent. Note, by the way, that there is no need for parentheses in the s-representation. The grammar is sufficient to give it an unambiguous meaning.
The tree created by parse
is called an abstract syntax tree, or AST, for short. Jumping ahead a bit, let’s test the functions parse
and stringValue
using the Elm repl. We start with a string input
, apply parse
to it to produce an AST, then apply stringValue
to the AST to recover the string input
:
> input = "if iszero succ 0 then succ succ succ 0 else false"> parse input
Ok (IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F)
: Result (List Parser.DeadEnd) Term> ast = IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F> stringValue ast
"if iszero succ 0 then succ succ succ 0 else False" : String
If the input is nonsense, then the parser returns an error value:
> parse "if 0 then pred else succ"
Err [...]
: Result (List Parser.DeadEnd) Term
The AST
We said that ast
is the abstract syntax tree corresponding to the input sentence. We picture a representation of this tree on the left; it is a two-dimensional structure. But what we see in the terminal, is a “linear,” one-dimensional structure, with one word after another, albeit including parentheses. Both are representations of an Elm value. Every Elm value has a type, so we must formulate a definition of the type of AST’s, or what is the same thing, of terms. This is what we do next.
The Type of a Term
Define a custom type (aka algebraic data type):
type Term
= T
| F
| Zero
| Succ Term
| Pred Term
| IsZero Term
| IfExpr Term Term Term
Notice that definition of Term
is recursive, since it refers to itself. Note also the very close, essentially one-to-one correspondence between the definition of Term
and the rules of the grammar.
A Parser
Our next task is to construct a parser—a function parse
that will, if successful, map strings to values of type Term
:
parse : String -> Result (List Parser.DeadEnd) Term
parse str = Parser.run term str
Note the type for the function value, Result error value
. It enables us to write functions which may succeed or fail, which in the case at hand means parse successfully or fail to parse. The function Parser.run
(from the elm/parser library) “runs” a Parser
on its input. Thus the real task is to understand the parser term
, defined below. (We will explain things here modulo more detail to be found in th elm/parser library and in the code on GitHub).
term : Parser Term
term =
succeed identity
|. spaces
|= oneOf
[ true
, false
, zero
, succ
, pred
, iszero
, lazy (\_ -> ifExpr)
]
The parser term
first “eats” any spaces it encounters, then uses the oneOf
parser to try a list of parsers, each one in turn. If it succeeds with a component parser, it stops and returns the value parsed. If it succeeds with no component parser it returns an error value. Let us examine the component parsers. Those like the true
parser are easy to understand:
true : Parser Term
true =
succeed T
|. symbol "true"
If the string “true” can be eaten by the parser, the value T : Term
is returned. Parsers like succ
are slightly more complicated:
succ : Parser Term
succ =
symbol "succ"
|> andThen (\_ -> term)
|> map (\t -> Succ t)
The idea is that this parser first tries to eat the string “succ”. If successful, the parser then runs the parserterm
to recognize the termt
in suc t
. If that operation is successful, the term recognized is wrapped with the Term
Succ
by applying the function \t -> Succ t
. To really understand what is going on here, one needs to work out all of the types, making sure that they fit together. For example, consulting the elm/parser library, we find that
andThen : (a -> Parser b) -> Parser a -> Parser b
Thus andThen
gives (roughly speaking), a way to follow one parser with another, hence a way to sequence parsers where the second parser can receive information from the first. In more detail, using andThen
, one can combine a Parser a
with a function a -> Parser b
to produce value of type Parser b
. In the case above, the first parser (Parser a
), is symbol "succ"
and the second is term
. We use the function \t -> term
because no argument to term
is needed.
Finally, there is the parser for inputs like if true then false else succ 0
. It uses a parser pipeline which operates as follows: (a) recognize the symbol “if”, then a term; (b) eat spaces, then the symbol “then”, then a term; c) eat spaces, the symbol “else”, then a term. The effect of the parser (|.)
is to run the parser on its right, but to ignore the value parsed. The effect of the parser (|=)
is similar, except that the value parsed is kept. In the end, all the “kept” values are fed as arguments to the function IfExpr
to construct a value of type Term
.
ifExpr : Parser Term
ifExpr =
succeed IfExpr
|. symbol "if"
|= term
|. spaces
|. symbol "then"
|= term
|. spaces
|. symbol "else"
|= term
The use of lazy
as in . lazy (_ -> ifExpr)
is explained by the following remark (cf Czapliki):
In Elm, you can only define a value in terms of itself it is behind a function call. So
lazy
helps us define these self-referential parsers. (andThen
can be used for this as well!)
An Evaluator
In this section, we discuss two functions, eval : Term -> Value
and evalString : String -> Value
. The second, which is the interpreter for arith
is (essentially) the composition eval o parse
, so the real work is in defining eval
.
To define eval
, we must first decide what is the type of the values it computes:
type Value
= Numeric Int
| Boolean Bool
| Error String
The definition accounts for the fact that there could be either numeric values, like succ succ 0
, otherwise known as 2
, or the Boolean values True
or False
. It also accounts for the fact that evaluation may fail. For example, the string
input = "if succ 0 then 0 else succ 0"
can be parsed into a valid AST, namely
ast = IfExpr (Succ Zero) Zero (Succ Zero)
However, evaluation fails, because the first argument of an if-expression must be a Boolean:
> eval ast
Error ("If-then-else: expecting boolean value") : Value
Below is the definition of eval
. It depends on (and expresses) some reasonable (but here unwritten) semantics for the language arith
. The function eval
is a big case
statement, with one branch for each part of the definition of Term
.
eval : Term -> Value
eval t =
case t of
Zero ->
Numeric 0 Succ a ->
case eval a of
Numeric b ->
Numeric (b + 1) _ ->
Error "succ expects numeric value" Pred a ->
case eval a of
Numeric b ->
if b == 0 then
Numeric 0
else
Numeric (b - 1) _ ->
Error "pred expects numeric value" F ->
Boolean False T ->
Boolean True IsZero a ->
case eval a of
Numeric b ->
Boolean (b == 0) _ ->
Error "iszero expects a numeric value" IfExpr a b c ->
case eval a of
Boolean v ->
case v of
True ->
eval b False ->
eval c _ ->
Error "If-then-else: expecting boolean value"
Comments
- The reason that the
eval
function can produce values of typeError String
is that the languagearith
is not typed. Thus sentences such as “if succ iszero succ 0 then 0 else succ 0” can be compiled into a valid AST but cannot be evaluated without error. - Julian Antonielli has some very nice slides describing a type checker for
arith
. His implementation is in Haskell. I plan to give it a try in Elm. Among the references at the end of Julian’s slides is a link to a YouTube video on “typechecking from scratch.” - A good exercise is to rig up a function to compute the depth of the AST produced by
parse
. Another good exercise is to work out a definition ofstringValue
. You will find code for both of these in the GitHub repo for the Elm implementation ofarith
.
Definition of depth
- The depth of a term like
true
,false
,0
is 1. - The depth of the term
succ t
is 1 + deptht
, and similarly forpred
andiszero
. - The depth of
if a then b else c
is 1 + max(depth a, depth b, depth c).