A Pure Elm Markdown Parser
Yes, another Markdown parser! This time, in pure Elm, the results of which you can see in this demo. The idea was to make a Markdown parser that could also handle mathematical text, as well as some convenient extensions — strike-through text, verbatim and poetry blocks, and tables. There may be a few more extensions to come, but I am trying to be somewhat conservative in this regard. The API is quite small, consisting essentially of the function
Elm.Markdown.toHtml : Option -> String -> Html msg
which one might apply like this:
Elm.Markdown.toHtml ExtendedMath "Pythagoras said $a^2 + b^2 = c^2$"
The first argument is of type Option = Standard | Extended | ExtendedMath. The second argument, a string, can of course be as long as you want, e.g., a whole document.
See jxxcarlson/elm-markdown for the code and documentation.
The Parser
As usual, it is the parser that takes the most care to build. Fortunately, the combinators in the elm/parser library, combined with the general expressiveness of Elm, are up to the job. The parser follows the strategy recommended by the CommonMark group, or rather, I tried to follow this strategy as best I could. The idea is to first parse the text line by line into a tree of blocks: headings, paragraphs, list items, etc. The content of the blocks is unparsed at this point. My approach was to parse the text into a list of elements of the form (Level, Block)
where
type Block = Block BlockType Level Content
and where type alias Level = Int. Think of the document as a kind of outline like this:
Introduction
Biology
   Plants
      Flowering
      Non-flowering
   Animals
      Furry
      Non-furry
Chemistry
   Organic
   Inorganic
The level of an outline element is the number of leading spaces divided by three (integer division).
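As a minimal sketch of that rule, the level of a line can be computed as follows. The module and function names here are illustrative only and are not taken from the library; the sketch also assumes that indentation consists of spaces, since String.trimLeft strips any leading whitespace and a tab would count as a single character.

```elm
module OutlineLevel exposing (level)

-- Hypothetical helper, not part of jxxcarlson/elm-markdown.


{-| Level of an outline line: leading whitespace characters divided by three,
using integer division.
-}
level : String -> Int
level line =
    let
        -- Number of characters removed by trimming the left side.
        leadingSpaces =
            String.length line - String.length (String.trimLeft line)
    in
    leadingSpaces // 3
```

With this definition, a line indented by six spaces, such as "      Flowering", has level 2, and an unindented line has level 0.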
Consider now a list of things of the form (Level, Whatever), where the level of each element is, relative to the element just before it, either the same, smaller, or larger by exactly one. Call such a list annotated. Then outlines define annotated lists, and vice versa. Annotated lists also define a corresponding rose tree. These correspondences make it easy to transform a Markdown document into a tree of blocks. Mapping a parser for inline elements over the tree yields a tree for which it is easy to write a suitable rendering function.
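The condition on levels can be stated as a small predicate. This is only a sketch to make the definition concrete; the name isAnnotated is hypothetical and does not come from the library.

```elm
module Annotated exposing (isAnnotated)

-- Hypothetical helper, not part of jxxcarlson/elm-markdown.


{-| True when every element's level is at most one greater than the level of
the element before it (equal and smaller levels are always allowed).
-}
isAnnotated : List ( Int, a ) -> Bool
isAnnotated items =
    let
        levels =
            List.map Tuple.first items
    in
    case levels of
        [] ->
            True

        _ :: rest ->
            -- Pair each level with its successor; map2 stops at the shorter list.
            List.map2 (\prev next -> next <= prev + 1) levels rest
                |> List.all identity
```

For example, the levels 0, 0, 1, 2, 1 form an annotated list, while 0, 2 do not, because the second element skips a level.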
Digging a little deeper
A few more words about parsing into blocks. For this I used a finite state machine defined by
type FSM = FSM State (List Block) Register
The “real state” of the machine is
type State
    = Start
    | InBlock Block
    | Error
The (List Block) part accumulates the list of annotated blocks, while the Register is used to manage information on section numbers and also a stack that is used for parsing tables. It is quite a flexible setup: easy to extend and to modify. One runs the machine using
runFSM : Option -> String -> FSM
runFSM option str =
    let
        folder : String -> FSM -> FSM
        folder =
            \line fsm -> nextState option line fsm
    in
    List.foldl folder initialFSM (splitIntoLines str)
Of course, the real work is the construction of
nextState : Option -> String -> FSM -> FSM
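To give a feel for what a function of this shape does, here is a deliberately tiny toy machine. It is not the library's nextState: it has no Option argument, no Register, and only one kind of block (a paragraph, ended by a blank line). All types and names in it are simplified stand-ins chosen for illustration.

```elm
module ToyFSM exposing (Block(..), run)

-- Toy illustration only; the types below are simplified stand-ins.


type Block
    = Paragraph (List String)


type State
    = Start
    | InBlock Block


type FSM
    = FSM State (List Block)


{-| Consume one line and return the next machine.
Lines of the open block and finished blocks are both kept in reverse order.
-}
nextState : String -> FSM -> FSM
nextState line (FSM state blocks) =
    case ( state, String.trim line == "" ) of
        ( Start, True ) ->
            -- Blank line outside a block: nothing to do.
            FSM Start blocks

        ( Start, False ) ->
            -- First line of a new paragraph.
            FSM (InBlock (Paragraph [ line ])) blocks

        ( InBlock block, True ) ->
            -- Blank line closes the current block.
            FSM Start (close block :: blocks)

        ( InBlock (Paragraph ls), False ) ->
            -- Another line of the current paragraph.
            FSM (InBlock (Paragraph (line :: ls))) blocks


{-| Restore the original line order of a finished block.
-}
close : Block -> Block
close (Paragraph ls) =
    Paragraph (List.reverse ls)


{-| Fold the machine over the lines, then flush any block still open.
-}
run : String -> List Block
run str =
    case List.foldl nextState (FSM Start []) (String.lines str) of
        FSM (InBlock block) blocks ->
            List.reverse (close block :: blocks)

        FSM Start blocks ->
            List.reverse blocks
```

The fold has the same shape as in runFSM: each line is fed to the transition function together with the machine built so far. A real transition function would also have to recognize headings, list items, tables, and so on, and track indentation levels.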
From Annotated Lists to Rose Trees
To get a tree from the annotated list, one uses the jxxcarlson/htree library, which exposes
fromList : a -> (a -> Int) -> List a -> Tree a
The first argument is the root node label, the second is a function that maps node labels to integers (levels), and the last is the annotated list. This library relies on a zipper in the zwilias/elm-rosetree library.
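To illustrate the correspondence between annotated lists and rose trees without depending on any package, here is a self-contained sketch using plain recursion. It is not how the htree library works (that one uses a zipper, as noted above); the Tree type and the function names are assumptions made for this example, and the input is a list of (level, label) pairs rather than labels plus a level function.

```elm
module AnnotatedTree exposing (Tree(..), fromAnnotated)

-- Illustrative sketch; not the implementation used by jxxcarlson/htree.


type Tree a
    = Node a (List (Tree a))


{-| Build a tree with the given root label from an annotated list whose
top-level elements have level 0.
-}
fromAnnotated : a -> List ( Int, a ) -> Tree a
fromAnnotated root items =
    Node root (Tuple.first (forest 0 items))


{-| Read all consecutive subtrees at the given level.
Returns those subtrees together with the unconsumed rest of the list.
-}
forest : Int -> List ( Int, a ) -> ( List (Tree a), List ( Int, a ) )
forest lvl items =
    case items of
        [] ->
            ( [], [] )

        ( l, label ) :: rest ->
            if l < lvl then
                -- This element belongs to an ancestor level; stop here.
                ( [], items )

            else
                let
                    -- Everything deeper that follows becomes a child.
                    ( children, afterChildren ) =
                        forest (lvl + 1) rest

                    -- Then continue with siblings at the same level.
                    ( siblings, remaining ) =
                        forest lvl afterChildren
                in
                ( Node label children :: siblings, remaining )
```

For the annotated list [ (0, "Biology"), (1, "Plants"), (1, "Animals"), (0, "Chemistry") ] and root label "root", this yields a root with two children, "Biology" (which has children "Plants" and "Animals") and "Chemistry". The recursion is not tail-recursive, so for very long documents a zipper-based or accumulator-based approach would be the safer choice.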
Compliance with CommonMark, Plans
I’d like for the toHtml Standard function to satisfy the CommonMark spec. It is definitely not there yet, albeit quite serviceable in its current form. This is a goal towards which I will work, given time and resources. I would also like to use MathJax 3 for rendering math, rather than the current 2.7.5 version. MathJax 3 is much faster than 2.7.5, and I am hoping that using it will eliminate the “flashing” that one sees when live editing a document that has math.
The current version of the library renders strings to Html msg. I plan functions to render to String representing (a) HTML, (b) LaTeX. The latter is (a) for the heck of it, (b) to provide a way to generate PDF output.
NOTE: no JavaScript whatsoever is needed for the Standard and Extended options. Of course, you will need it for ExtendedMath.