Building Trees with Elm
This post is about trees — the computer science kind, as illustrated in Figure 1 below. Our goal is to see how trees can be derived from text and, conversely, how trees can be turned back into text. Along the way, we will see how to display trees graphically, as below.
Let’s establish some terminology. In the Figure, we see blue dots — nodes—labeled by integers and some more text. The node labeled 1 is the root of the tree. It has two children, the subtrees with roots labeled 3 and 4. We say that the node labeled 1 is the parent of the nodes labeled 3 and 4. Etc. Our tree is a rose tree because a node can have any number of children. Nodes without children are called leaf nodes. If all nodes have either zero or two children, the tree is called binary tree .
A tree can be represented as outline, where indentation expresses the parent-child relationship. Here is the outline corresponding to the tree of our figure:
1 Home
2 Important things (according to Cicero)
3 Library
4 Garden
5 Pond
6 Grasses
7 Flowers Figure 2
In addition to outlines for essays, as above, trees can be used to represent genealogies, the “tree of life,” which expresses the relationship among species both living and extinct, the file system of a computer, etc.
Representing Trees in Elm
We will use the package zwilias/elm-rosetree for our work. Consider a tree whose nodes are of type a
. The root of such a tree has a label of type a
and has zero, one, two, or many children. We express the node + children nature of trees like this:
type Tree a = Tree a (List (Tree a))
By way of example, the tree pictured in the Figure 3 below is given by
Tree 1 [Tree 2 [Tree 3 [],Tree 4 []]]
The root node is labeled by the integer 1; the list of children of this node has just one element, the tree Tree 2 [Tree 3 [], Tree []]
. Its root is labeled 2, and its children are given by two-element list with elements Tree 3 []
and Tree 4 []
. The latter two nodes are leaves; they have no children.
What we would like is a function
Build.fromString : node -> (String -> node) -> String -> Result Error (Tree node)
forestFromString defaultNode makeNode renderNode str = ...
that operates like this:
> Build.fromString "?" identity "1\n 2\n 3\n 4"
Ok (Tree "1" [Tree "2" [Tree "3" [],Tree "4" []]])
You can find the code for these on GitHub; the code is also published in jxxcarlson/elm-tree-builder as an Elm package.
Blocks
To build a tree from an outline, we first transform it into list of blocks, where
type alias Block = { indent : Int, content: String }
While content
is given by a single line, here by a single line, one can also have multi-line blocks. Here we have
blocks = [ {indent = 0, content = "1"}
, {indent = 1, content = "2" }
, {indent = 1, content = "3" }
, {indent = 2 , content = "4"}
]
Zippers
To transform the list of blocks into a tree, we use the notion of a zipper, which is a tree with a focus: the subtree which we currently interested in. In module Tree.Zipper of zwilias/elm-rose-tree you will find functions for moving the focus, e.g., parent and lastChild. Using Tree
and Tree.Zipper
, one can rig up a function for attaching a new tree at the focus of a given zipper:
attachAtFocus : Tree a -> Zipper a -> Zipper a
The function call attachAtFocus t z
replaces the subtree of z
defined by the focus with the tree t
. This function, combined with the moves just mentioned, provide the ingredients for the fromBlocks
function.
Now to the heart of the matter. Our goal is to grow the tree beginning with the root, then attach nodes in preorder — 1, 2, 3, … as shown below. Each time we move to a lower level of the tree, we move the focus of the zipper using the lastChild
function. Moving down corresponds to an increase in indentation level. No move of focus is needed when we stay at the same level, so no move is needed after attaching node 4 to the tree before we attach node 5. But for node 6, we move up a level, corresponding to a decrease in indentation level. For this we apply the parent
function. It can happen that one has to move the focus back several levels.
You can find the code for building the tree right here. We use the kind of functional loop described in this article.
Rendering and Testing
In module Tree.Render one finds the function
toString : Int -> (a -> String) -> Tree a -> String
toString quantum renderNode tree =
where renderNode
tells how to convert a label of type a
to a string. If a
is of type String, the identity function will do. The toString
function is useful for testing. Take an outline and use it to build a tree. Convert the tree back into a string. You should end up with the original string. We call this an identity test, since the composite of fromString
and toString quantum renderNode
is the identity function.
To take this a step further, in module Tree.Random
, there are functions
generateOutline : Int -> Int -> String
generateOutline maxEntries seed = ...
and
generate : Int -> Int -> Result Error (Tree String)
generate maxNodes seed = ...
which can be used to generate random outlines and random trees. Thus one can run “identity tests” as described above on large numbers of random outlines and trees, e.g., 18.5 seconds for tests of trees with 1000 nodes.