# Implementing the Mini-Language Arith in Elm

In my quest to better understand type theory, I’ve started reading Benjamin Pierce’s excellent book, *Types and Programing Languages*. Early on he introduces a little language — `arith`

. While it is an untyped language, it is a superb vehicle for illustrating many fundamental notions and tools, e.g., structural induction, abstract syntax trees, etc. What I would like to describe here is a parser and interpreter for this language written in Elm. Writing code for these is a little exercise that helps to understand better some of the ideas in the book. Pierce gives a link to parser and interpreter code in OCaml. But of course one learns more if one does it oneself. The code for the Elm implementation is on GitHub.

**Note **(May 6, 2020): I’ve added an article on type-checking `Arith`

.

## The Grammar of Arith

The language has a very simple grammar sufficient to express some of the structure of the natural numbers. Simple foundations for the natural numbers were formulated by Peano (1879). A key feature of his formulation was the successor function *succ*: given a natural number *n, *the next natural number is *succ n*.

To describe the grammar of `arith`

, we give rules for forming its *terms*: (1) the symbols `true`

, `false`

, and 0 are terms; (2) if `t`

is a term, then so are `succ t`

, `pred t`

, and `iszero `

t; (3) If `s`

`t`

and `u`

are terms, then so is `if t then s else u`

. Here is an example of a “sentence” in `arith`

, as a human would read it or write it:

` S = if iszero succ 0 then succ succ succ 0 else false`

The real structure of the sentence S is given by the tree T pictured in the figure on the left. As we shall see shortly, one can construct a function `parse`

that maps sentences `s`

to one of two possible values. The first, in the case where the sentence "makes sense,” is`Ok t`

, where `t`

is a representation of the tree corresponding to `s`

. The second value, in the case where `parse`

does not understand `s`

, is `Err str`

, where `str`

is a string representing an error message. There is also a function `stringValue`

, which maps trees to strings. These two functions are, roughly speaking inverses of each other. Therefore the string and tree representations of a term of `arith`

are equivalent. Note, by the way, that there is no need for parentheses in the s-representation. The grammar is sufficient to give it an unambiguous meaning.

The tree created by `parse`

is called an abstract syntax tree, or AST, for short. Jumping ahead a bit, let’s test the functions `parse`

and `stringValue`

using the Elm repl. We start with a string `input`

, apply `parse`

to it to produce an AST, then apply `stringValue`

to the AST to recover the string `input`

:

> input = "if iszero succ 0 then succ succ succ 0 else false"> parse input

Ok (IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F)

: Result (List Parser.DeadEnd) Term> ast = IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F> stringValue ast

"if iszero succ 0 then succ succ succ 0 else False" : String

If the input is nonsense, then the parser returns an error value:

`> parse "if 0 then pred else succ"`

Err [...]

: Result (List Parser.DeadEnd) Term

## The AST

We said that `ast`

is the abstract syntax tree corresponding to the input sentence. We picture a representation of this tree on the left; it is a two-dimensional structure. But what we see in the terminal, is a “linear,” one-dimensional structure, with one word after another, albeit including parentheses. Both are representations of an Elm value. Every Elm value has a type, so we must formulate a definition of the type of AST’s, or what is the same thing, of terms. This is what we do next.

## The Type of a Term

Define a custom type (aka algebraic data type):

`type Term`

= T

| F

| Zero

| Succ Term

| Pred Term

| IsZero Term

| IfExpr Term Term Term

Notice that definition of `Term`

is recursive, since it refers to itself. Note also the very close, essentially one-to-one correspondence between the definition of `Term`

and the rules of the grammar.

## A Parser

Our next task is to construct a parser—a function `parse`

that will, if successful, map strings to values of type `Term`

:

`parse : String -> Result (List Parser.DeadEnd) Term`

parse str = Parser.run term str

Note the type for the function value, `Result error value`

. It enables us to write functions which may succeed or fail, which in the case at hand means parse successfully or fail to parse. The function `Parser.run`

(from the elm/parser library) “runs” a `Parser`

on its input. Thus the real task is to understand the parser `term`

, defined below. (We will explain things here modulo more detail to be found in th elm/parser library and in the code on GitHub).

`term : Parser Term`

term =

succeed identity

|. spaces

|= oneOf

[ true

, false

, zero

, succ

, pred

, iszero

, lazy (\_ -> ifExpr)

]

The parser `term`

first “eats” any spaces it encounters, then uses the `oneOf`

parser to try a list of parsers, each one in turn. If it succeeds with a component parser, it stops and returns the value parsed. If it succeeds with no component parser it returns an error value. Let us examine the component parsers. Those like the `true`

parser are easy to understand:

`true : Parser Term`

true =

succeed T

|. symbol "true"

If the string “true” can be eaten by the parser, the value `T : Term`

is returned. Parsers like `succ`

are slightly more complicated:

`succ : Parser Term`

succ =

symbol "succ"

|> andThen (\_ -> term)

|> map (\t -> Succ t)

The idea is that this parser first tries to eat the string “succ”. If successful, the parser then runs the parser`term`

to recognize the term`t`

in `suc t`

. If that operation is successful, the term recognized is wrapped with the `Term`

`Succ`

by applying the function `\t -> Succ t`

. To really understand what is going on here, one needs to work out all of the types, making sure that they fit together. For example, consulting the elm/parser library, we find that

**andThen** : (a -> Parser b) -> Parser a -> Parser b

Thus `andThen`

gives (roughly speaking), a way to follow one parser with another, hence a way to sequence parsers where the second parser can receive information from the first. In more detail, using `andThen`

, one can combine a `Parser a`

with a function `a -> Parser b`

to produce value of type `Parser b`

. In the case above, the first parser (`Parser a`

), is `symbol "succ"`

and the second is `term`

. We use the function `\t -> term`

because no argument to `term`

is needed.

Finally, there is the parser for inputs like `if true then false else succ 0`

. It uses a parser pipeline which operates as follows: (a) recognize the symbol “if”, then a term; (b) eat spaces, then the symbol “then”, then a term; c) eat spaces, the symbol “else”, then a term. The effect of the parser `(|.)`

is to run the parser on its right, but to ignore the value parsed. The effect of the parser `(|=)`

is similar, except that the value parsed is kept. In the end, all the “kept” values are fed as arguments to the function `IfExpr`

to construct a value of type `Term`

.

`ifExpr : Parser Term`

ifExpr =

succeed IfExpr

|. symbol "if"

|= term

|. spaces

|. symbol "then"

|= term

|. spaces

|. symbol "else"

|= term

The use of `lazy`

as in . `lazy (_ -> ifExpr)`

is explained by the following remark (cf Czapliki):

In Elm, you can only define a value in terms of itself it is behind a function call. So

`lazy`

helps us define these self-referential parsers. (`andThen`

can be used for this as well!)

## An Evaluator

In this section, we discuss two functions, `eval : Term -> Value`

and `evalString : String -> Value`

. The second, which is the interpreter for `arith`

is (essentially) the composition `eval o parse`

, so the real work is in defining `eval`

.

To define `eval`

, we must first decide what is the type of the values it computes:

`type Value`

= Numeric Int

| Boolean Bool

| Error String

The definition accounts for the fact that there could be either numeric values, like `succ succ 0`

, otherwise known as `2`

, or the Boolean values `True`

or `False`

. It also accounts for the fact that evaluation may fail. For example, the string

`input = "if succ 0 then 0 else succ 0"`

can be parsed into a valid AST, namely

`ast = IfExpr (Succ Zero) Zero (Succ Zero)`

However, evaluation fails, because the first argument of an if-expression must be a Boolean:

`> eval ast`

Error ("If-then-else: expecting boolean value") : Value

Below is the definition of `eval`

. It depends on (and expresses) some reasonable (but here unwritten) semantics for the language `arith`

. The function `eval`

is a big `case`

statement, with one branch for each part of the definition of `Term`

.

eval : Term -> Value

eval t =

case t of

Zero ->

Numeric 0 Succ a ->

case eval a of

Numeric b ->

Numeric (b + 1) _ ->

Error "succ expects numeric value" Pred a ->

case eval a of

Numeric b ->

if b == 0 then

Numeric 0

else

Numeric (b - 1) _ ->

Error "pred expects numeric value" F ->

Boolean False T ->

Boolean True IsZero a ->

case eval a of

Numeric b ->

Boolean (b == 0) _ ->

Error "iszero expects a numeric value" IfExpr a b c ->

case eval a of

Boolean v ->

case v of

True ->

eval b False ->

eval c _ ->

Error "If-then-else: expecting boolean value"

## Comments

- The reason that the
`eval`

function can produce values of type`Error String`

is that the language`arith`

is not typed. Thus sentences such as “if succ iszero succ 0 then 0 else succ 0” can be compiled into a valid AST but cannot be evaluated without error. - Julian Antonielli has some very nice slides describing a type checker for
`arith`

. His implementation is in Haskell. I plan to give it a try in Elm. Among the references at the end of Julian’s slides is a link to a YouTube video on “typechecking from scratch.” - A good exercise is to rig up a function to compute the depth of the AST produced by
`parse`

. Another good exercise is to work out a definition of`stringValue`

. You will find code for both of these in the GitHub repo for the Elm implementation of`arith`

.

## Definition of depth

- The depth of a term like
`true`

,`false`

,`0`

is 1. - The depth of the term
`succ t`

is 1 + depth`t`

, and similarly for`pred`

and`iszero`

. - The depth of
`if a then b else c`

is 1 + max(depth a, depth b, depth c).