This post is about trees — the computer science kind, as illustrated in Figure 1 below. Our goal is to see how trees can be derived from text and, conversely, how trees can be turned back into text. Along the way, we will see how to display trees graphically, as below.

Let’s establish some terminology. In the Figure, we see blue dots — nodes—labeled by integers and some more text. The node labeled 1 is the root of the tree. It has two children, the subtrees with roots labeled 3 and 4. We say that the node labeled 1 is the parent of the nodes labeled 3 and 4. Etc. Our tree is a rose tree because a node can have any number of children. Nodes without children are called leaf nodes. If all nodes have either zero or two children, the tree is called binary tree .

A tree can be represented as outline, where indentation expresses the parent-child relationship. Here is the outline corresponding to the tree of our figure:

1 Home
2 Important things (according to Cicero)
3 Library
4 Garden
5 Pond
6 Grasses
7 Flowers

In addition to outlines for essays, as above, trees can be used to represent genealogies, the “tree of life,” which expresses the relationship among species both living and extinct, the file system of a computer, etc.

Representing Trees in Elm

We will use the package zwilias/elm-rosetree for our work. Consider a tree whose nodes are of type a. The root of such a tree has a label of type a and has zero, one, two, or many children. We express the node + children nature of trees like this:

type Tree a = Tree a (List (Tree a))

By way of example, the tree pictured in the Figure 3 below is given by

Tree 1 [Tree 2 [Tree 3 [],Tree 4 []]]

The root node is labeled by the integer 1; the list of children of this node has just one element, the tree Tree 2 [Tree 3 [], Tree []]. Its root is labeled 2, and its children are given by two-element list with elements Tree 3 [] and Tree 4 []. The latter two nodes are leaves; they have no children.

Figure 3

What we would like is a function

Build.fromString : node -> (String -> node) -> String -> Result Error (Tree node)
forestFromString defaultNode makeNode renderNode str = ...

that operates like this:

> Build.fromString "?" identity "1\n 2\n  3\n  4"
Ok (Tree "1" [Tree "2" [Tree "3" [],Tree "4" []]])

You can find the code for these on GitHub; the code is also published in jxxcarlson/elm-tree-builder as an Elm package.

Blocks

To build a tree from an outline, we first transform it into list of blocks, where

type alias Block = { indent : Int, content: String }

While content is given by a single line, here by a single line, one can also have multi-line blocks. Here we have

blocks = [  {indent = 0, content = "1"}
, {indent = 1, content = "2" }
, {indent = 1, content = "3" }
, {indent = 2 , content = "4"}
]

Zippers

To transform the list of blocks into a tree, we use the notion of a zipper, which is a tree with a focus: the subtree which we currently interested in. In module Tree.Zipper of zwilias/elm-rose-tree you will find functions for moving the focus, e.g., parent and lastChild. Using Tree and Tree.Zipper, one can rig up a function for attaching a new tree at the focus of a given zipper:

attachAtFocus : Tree a -> Zipper a -> Zipper a

The function call attachAtFocus t z replaces the subtree of z defined by the focus with the tree t. This function, combined with the moves just mentioned, provide the ingredients for the fromBlocksfunction.

Now to the heart of the matter. Our goal is to grow the tree beginning with the root, then attach nodes in preorder — 1, 2, 3, … as shown below. Each time we move to a lower level of the tree, we move the focus of the zipper using the lastChild function. Moving down corresponds to an increase in indentation level. No move of focus is needed when we stay at the same level, so no move is needed after attaching node 4 to the tree before we attach node 5. But for node 6, we move up a level, corresponding to a decrease in indentation level. For this we apply the parent function. It can happen that one has to move the focus back several levels.

You can find the code for building the tree right here. We use the kind of functional loop described in this article.

Rendering and Testing

In module Tree.Render one finds the function

toString : Int -> (a -> String) -> Tree a -> String
toString quantum renderNode tree =

where renderNode tells how to convert a label of type a to a string. If a is of type String, the identity function will do. The toString function is useful for testing. Take an outline and use it to build a tree. Convert the tree back into a string. You should end up with the original string. We call this an identity test, since the composite of fromStringand toString quantum renderNode is the identity function.

To take this a step further, in module Tree.Random, there are functions

generateOutline : Int -> Int -> String
generateOutline maxEntries seed = ...

and

generate : Int -> Int -> Result Error (Tree String)
generate maxNodes seed = ...

which can be used to generate random outlines and random trees. Thus one can run “identity tests” as described above on large numbers of random outlines and trees, e.g., 18.5 seconds for tests of trees with 1000 nodes.

jxxcarlson on elm slack, http://jxxcarlson.github.io