# Implementing the Mini-Language Arith in Elm

In my quest to better understand type theory, I’ve started reading Benjamin Pierce’s excellent book, Types and Programing Languages. Early on he introduces a little language — `arith`. While it is an untyped language, it is a superb vehicle for illustrating many fundamental notions and tools, e.g., structural induction, abstract syntax trees, etc. What I would like to describe here is a parser and interpreter for this language written in Elm. Writing code for these is a little exercise that helps to understand better some of the ideas in the book. Pierce gives a link to parser and interpreter code in OCaml. But of course one learns more if one does it oneself. The code for the Elm implementation is on .

Note (May 6, 2020): I’ve added an article on `Arith`.

## The Grammar of Arith

The language has a very simple grammar sufficient to express some of the structure of the natural numbers. Simple foundations for the natural numbers were formulated by . A key feature of his formulation was the successor function succ: given a natural number n, the next natural number is succ n.

To describe the grammar of `arith`, we give rules for forming its terms: (1) the symbols `true`, `false`, and 0 are terms; (2) if `t` is a term, then so are `succ t`, `pred t`, and `iszero `t; (3) If `s` `t` and `u` are terms, then so is `if t then s else u`. Here is an example of a “sentence” in `arith`, as a human would read it or write it:

`   S = if iszero succ 0 then succ succ succ 0 else false`

The real structure of the sentence S is given by the tree T pictured in the figure on the left. As we shall see shortly, one can construct a function `parse` that maps sentences `s` to one of two possible values. The first, in the case where the sentence "makes sense,” is`Ok t`, where `t` is a representation of the tree corresponding to `s`. The second value, in the case where `parse` does not understand `s`, is `Err str`, where `str` is a string representing an error message. There is also a function `stringValue`, which maps trees to strings. These two functions are, roughly speaking inverses of each other. Therefore the string and tree representations of a term of `arith` are equivalent. Note, by the way, that there is no need for parentheses in the s-representation. The grammar is sufficient to give it an unambiguous meaning.

The tree created by `parse` is called an abstract syntax tree, or AST, for short. Jumping ahead a bit, let’s test the functions `parse` and `stringValue` using the Elm repl. We start with a string `input`, apply `parse` to it to produce an AST, then apply `stringValue` to the AST to recover the string `input` :

`> input = "if iszero succ 0 then succ succ succ 0 else false"> parse input  Ok (IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F)  : Result (List Parser.DeadEnd) Term> ast = IfExpr (IsZero (Succ Zero)) (Succ (Succ (Succ Zero))) F> stringValue ast"if iszero succ 0 then succ succ succ 0 else False" : String`

If the input is nonsense, then the parser returns an error value:

`> parse "if 0 then pred else succ"  Err [...]  : Result (List Parser.DeadEnd) Term`

## The AST

We said that `ast` is the abstract syntax tree corresponding to the input sentence. We picture a representation of this tree on the left; it is a two-dimensional structure. But what we see in the terminal, is a “linear,” one-dimensional structure, with one word after another, albeit including parentheses. Both are representations of an Elm value. Every Elm value has a type, so we must formulate a definition of the type of AST’s, or what is the same thing, of terms. This is what we do next.

## The Type of a Term

Define a custom type (aka algebraic data type):

`type Term    = T    | F    | Zero    | Succ Term    | Pred Term    | IsZero Term    | IfExpr Term Term Term`

Notice that definition of `Term` is recursive, since it refers to itself. Note also the very close, essentially one-to-one correspondence between the definition of `Term` and the rules of the grammar.

## A Parser

Our next task is to construct a parser—a function `parse`that will, if successful, map strings to values of type `Term`:

`parse : String -> Result (List Parser.DeadEnd) Termparse str = Parser.run term str`

Note the type for the function value, `Result error value`. It enables us to write functions which may succeed or fail, which in the case at hand means parse successfully or fail to parse. The function `Parser.run` (from the ) “runs” a `Parser` on its input. Thus the real task is to understand the parser `term`, defined below. (We will explain things here modulo more detail to be found in th and in the code on ).

`term : Parser Termterm =    succeed identity        |. spaces        |= oneOf            [ true            , false            , zero            , succ            , pred            , iszero            , lazy (\_ -> ifExpr)            ]`

The parser `term` first “eats” any spaces it encounters, then uses the `oneOf` parser to try a list of parsers, each one in turn. If it succeeds with a component parser, it stops and returns the value parsed. If it succeeds with no component parser it returns an error value. Let us examine the component parsers. Those like the `true` parser are easy to understand:

`true : Parser Termtrue =    succeed T        |. symbol "true"`

If the string “true” can be eaten by the parser, the value `T : Term` is returned. Parsers like `succ` are slightly more complicated:

`succ : Parser Termsucc =    symbol "succ"        |> andThen (\_ -> term)        |> map (\t -> Succ t)`

The idea is that this parser first tries to eat the string “succ”. If successful, the parser then runs the parser`term` to recognize the term`t` in `suc t`. If that operation is successful, the term recognized is wrapped with the `Term` `Succ`by applying the function `\t -> Succ t`. To really understand what is going on here, one needs to work out all of the types, making sure that they fit together. For example, consulting the , we find that

`andThen : (a -> Parser b) -> Parser a -> Parser b`

Thus `andThen` gives (roughly speaking), a way to follow one parser with another, hence a way to sequence parsers where the second parser can receive information from the first. In more detail, using `andThen`, one can combine a `Parser a` with a function `a -> Parser b` to produce value of type `Parser b`. In the case above, the first parser (`Parser a`), is `symbol "succ"` and the second is `term` . We use the function `\t -> term` because no argument to `term` is needed.

Finally, there is the parser for inputs like `if true then false else succ 0`. It uses a parser pipeline which operates as follows: (a) recognize the symbol “if”, then a term; (b) eat spaces, then the symbol “then”, then a term; c) eat spaces, the symbol “else”, then a term. The effect of the parser `(|.)` is to run the parser on its right, but to ignore the value parsed. The effect of the parser `(|=)` is similar, except that the value parsed is kept. In the end, all the “kept” values are fed as arguments to the function `IfExpr` to construct a value of type `Term`.

`ifExpr : Parser TermifExpr =    succeed IfExpr        |. symbol "if"        |= term        |. spaces        |. symbol "then"        |= term        |. spaces        |. symbol "else"        |= term`

The use of `lazy` as in . `lazy (_ -> ifExpr)` is explained by the following remark (cf i):

In Elm, you can only define a value in terms of itself it is behind a function call. So `lazy` helps us define these self-referential parsers. (`andThen` can be used for this as well!)

## An Evaluator

In this section, we discuss two functions, `eval : Term -> Value` and `evalString : String -> Value`. The second, which is the interpreter for `arith` is (essentially) the composition `eval o parse`, so the real work is in defining `eval`.

To define `eval`, we must first decide what is the type of the values it computes:

`type Value    = Numeric Int    | Boolean Bool    | Error String`

The definition accounts for the fact that there could be either numeric values, like `succ succ 0`, otherwise known as `2`, or the Boolean values `True` or `False`. It also accounts for the fact that evaluation may fail. For example, the string

`input = "if succ 0 then 0 else succ 0"`

can be parsed into a valid AST, namely

`ast = IfExpr (Succ Zero) Zero (Succ Zero)`

However, evaluation fails, because the first argument of an if-expression must be a Boolean:

`> eval astError ("If-then-else: expecting boolean value") : Value`

Below is the definition of `eval`. It depends on (and expresses) some reasonable (but here unwritten) semantics for the language `arith`. The function `eval`is a big `case` statement, with one branch for each part of the definition of `Term`.

`eval : Term -> Valueeval t =    case t of        Zero ->            Numeric 0        Succ a ->            case eval a of                Numeric b ->                    Numeric (b + 1)                _ ->                    Error "succ expects numeric value"        Pred a ->            case eval a of                Numeric b ->                    if b == 0 then                       Numeric 0                     else                       Numeric (b - 1)                _ ->                    Error "pred expects numeric value"        F ->            Boolean False        T ->            Boolean True        IsZero a ->            case eval a of                Numeric b ->                    Boolean (b == 0)                _ ->                    Error "iszero expects a numeric value"       IfExpr a b c ->            case eval a of                Boolean v ->                    case v of                        True ->                            eval b                        False ->                            eval c                _ ->                     Error "If-then-else: expecting boolean value"`

1. The reason that the `eval` function can produce values of type `Error String` is that the language `arith` is not typed. Thus sentences such as “if succ iszero succ 0 then 0 else succ 0” can be compiled into a valid AST but cannot be evaluated without error.
2. Julian Antonielli has some very nice for `arith`. His implementation is in Haskell. I plan to give it a try in Elm. Among the references at the end of Julian’s slides is a link to a YouTube video on “.”
3. A good exercise is to rig up a function to compute the depth of the AST produced by `parse`. Another good exercise is to work out a definition of `stringValue`. You will find code for both of these in the for the Elm implementation of `arith`.

## Definition of depth

1. The depth of a term like `true`, `false`, `0` is 1.
2. The depth of the term `succ t` is 1 + depth `t`, and similarly for `pred` and `iszero`.
3. The depth of `if a then b else c` is 1 + max(depth a, depth b, depth c).

## More from James Carlson

jxxcarlson on elm slack, http://jxxcarlson.github.io