Tarring files with Elm

Nov 29, 2018

[Updated June 4, 2019]

On November 14, 2018, Evan Czaplicki released two new Elm packages, Bytes (elm/bytes) and File (elm/file). Both have a simple, elegant interface and will open many new possibilities for Elm developers. One such possibility is the package jxxcarlson/elm-tar, which is the subject of this post. The API exposes one module, Tar, with functions createArchive and extractArchive. They do exactly what their names suggest. Data can be either binary or text, and such data can be transferred between Elm and the outside world using the elm/file package. This functionality was needed for exporting LaTeX files and image files in the MiniLatex app hosted on knode.io. But of course there will be many other, less niche uses for the new packages.

Creating a tar archive

To create a tar archive, one uses the function

createArchive : List ( MetaData, Data ) -> Bytes

as in this example:

-- Using module Tar and File
metadata1 = { defaultMetadata | filename = "one.txt" }
metadata2 = { defaultMetadata | filename = "two.txt" }

data1 = ( metadata1, StringData "One" )
data2 = ( metadata2, StringData "Two" )

bytes = createArchive [ data1, data2 ]

File.Download.bytes "myArchive.tar" "application/x-tar" bytes

The API exposes two types, Data and MetaData, the first of which is used to discriminate between text and binary data:

type Data = StringData String | BinaryData Bytes

The second is a record of information required by the spec for tar:

type alias MetaData =
    { filename : String
    , mode : Mode
    , ownerID : Int
    , groupID : Int
    , fileSize : Int
    , lastModificationTime : Int
    , linkIndicator : Link
    , linkedFileName : String
    , userName : String
    , groupName : String
    , fileNamePrefix : String
    }

As a shortcut, one can rig up a MetaData value using defaultMetadata, modifying whatever fields one wants: meta = { defaultMetadata | filename = "foo.txt" }.

Extracting data from a tar archive

Imagine that you have received a tar archive as a Bytes value using HTTP or File.toBytes. The data can be extracted from the Bytes value using:

extractArchive : Bytes -> List ( MetaData, Data )
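
As an illustration of working with the result, here is a minimal sketch. It assumes that the Tar module exposes Data(..), MetaData and extractArchive as described in this post; the module name and the functions textFiles and describe are hypothetical names introduced only for this example.

```elm
module ExtractExample exposing (describe, textFiles)

import Bytes exposing (Bytes)
import Tar exposing (Data(..), MetaData, extractArchive)


-- Keep only the text entries of an archive, as ( filename, contents ) pairs.
-- (Hypothetical helper, not part of the package.)
textFiles : Bytes -> List ( String, String )
textFiles archive =
    extractArchive archive
        |> List.filterMap
            (\( metadata, data ) ->
                case data of
                    StringData text ->
                        Just ( metadata.filename, text )

                    BinaryData _ ->
                        Nothing
            )


-- One summary line per entry, for example to display in a view.
-- (Hypothetical helper, not part of the package.)
describe : ( MetaData, Data ) -> String
describe ( metadata, data ) =
    case data of
        StringData text ->
            metadata.filename
                ++ " (text, "
                ++ String.fromInt (String.length text)
                ++ " characters)"

        BinaryData bytes ->
            metadata.filename
                ++ " (binary, "
                ++ String.fromInt (Bytes.width bytes)
                ++ " bytes)"
```

Pattern matching on Data is what lets the caller decide separately how to treat text and binary entries.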

Archiving arbitrary data

Now suppose we want to tar both text and binary data. To make a silly example, first use the function Hex.toBytes from the package jxxcarlson/hex:

content1 =
      Hex.toBytes 
        "B0C1D2E4F4"   
        |> Maybe.withDefault (encode (Bytes.Encode.unsignedInt8 0))

Imagine that we have constructed a companion metadata1 value as we did previously, and also imagine that we have some string data in content2 and content3, with companion metadata values for each of these. We can tar all this content as follows:

tarArchive =
    Tar.encodeFiles
        [ ( metadata1, BinaryData content1 )
        , ( metadata2, StringData content2 )
        , ( metadata3, StringData content3 )
        ]
        |> encode

File.Download.bytes "tarArchive.tar" "application/x-tar" tarArchive

A Demo App

For a demo app, see the source code for the tar package. There you will find /examples/Main.elm which you can compile using elm make to create an app residing in index.html. Click on index.html to run the app. Behind the scenes, it loads two images from given URLs, creates bytes values for them, then downloads a tar archive with the two (uncompressed) images. You should be able to click on the downloaded archive, test.tar, or use tar xvf test.tar to untar the files. Pop quiz: what are the images?

The Development Process

I used the description of the tar file format on Wikipedia to write the encoders. Each file is encoded as a 512-byte file record (the header) with information such as filename, permissions, last modification date, etc. There is also an 8-byte checksum field, which is computed by adding the bytes of the file record, where the checksum field is initially filled with a sequence of eight blanks (ASCII encoded). The eight blanks are then replaced by the checksum. The file record is followed by the data, which must be padded with nulls so that the padded data consists of a multiple of 512 bytes. Call the header plus the padded data a tarred file. A tar archive consists of a sequence of tarred files placed end-to-end, followed by two 512-byte blocks of nulls.
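
The layout rules above can be sketched with elm/bytes. This is a minimal sketch under stated assumptions, not the package's actual code: the module name and the functions paddingLength, nulls, endOfArchive and sumBytes are hypothetical names. sumBytes adds the unsigned byte values of a Bytes value, which is the sum the checksum needs once the checksum field of the header has been filled with blanks.

```elm
module TarLayoutSketch exposing (endOfArchive, nulls, paddingLength, sumBytes)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode exposing (Encoder)


blockSize : Int
blockSize =
    512


-- Number of null bytes needed to round a data length up to a multiple of 512.
paddingLength : Int -> Int
paddingLength dataLength =
    modBy blockSize (blockSize - modBy blockSize dataLength)


-- An encoder that writes n null bytes.
nulls : Int -> Encoder
nulls n =
    Encode.sequence (List.repeat n (Encode.unsignedInt8 0))


-- The end-of-archive marker: two 512-byte blocks of nulls.
endOfArchive : Encoder
endOfArchive =
    nulls (2 * blockSize)


-- Sum of the unsigned byte values of a Bytes value.
sumBytes : Bytes -> Int
sumBytes bytes =
    Decode.decode (Decode.loop ( Bytes.width bytes, 0 ) sumStep) bytes
        |> Maybe.withDefault 0


-- One step of the loop: read a byte and add it to the running total.
sumStep : ( Int, Int ) -> Decoder (Step ( Int, Int ) Int)
sumStep ( remaining, total ) =
    if remaining <= 0 then
        Decode.succeed (Done total)

    else
        Decode.map
            (\byte -> Loop ( remaining - 1, total + byte ))
            Decode.unsignedInt8
```

For instance, paddingLength 3 is 509, so a three-byte file occupies one 512-byte header followed by one 512-byte block of data and padding.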

Here is the encoder for text strings:

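encodeTextFile : MetaData -> String -> Encode.Encoder
encodeTextFile metadata contents =
    let
        -- Record the size of the contents in the header.
        -- Note: String.length equals the byte count only for ASCII text;
        -- Encode.getStringWidth gives the UTF-8 width in general.
        fileRecord =
            { metadata | fileSize = String.length contents }
    in
    Encode.sequence
        [ encodeFileRecord fileRecord
        , Encode.string (padContents contents)
        ]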

The Encode.sequence function is used to pack bytes end-to-end. It is used repeatedly in the definition of encodeFileRecord to build up the required sequence of bytes.
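
The definition of encodeFileRecord is not shown in this post, so the following is only a sketch of the idea, assuming ASCII field values. The module name and the functions field, nameAndMode and encodedWidth are hypothetical names; the widths used (100 bytes for the file name, 8 for the mode) are those of the tar header.

```elm
module FieldSketch exposing (encodedWidth, field, nameAndMode)

import Bytes
import Bytes.Encode as Encode exposing (Encoder)


-- Hypothetical helper: write an ASCII string into a fixed-width field,
-- truncating it if necessary and filling the rest with nulls.
field : Int -> String -> Encoder
field width value =
    let
        truncated =
            String.left width value

        padding =
            width - Encode.getStringWidth truncated
    in
    Encode.sequence
        (Encode.string truncated :: List.repeat padding (Encode.unsignedInt8 0))


-- Two fields packed end-to-end: 100 bytes for the name, 8 for the mode.
nameAndMode : Encoder
nameAndMode =
    Encode.sequence
        [ field 100 "one.txt"
        , field 8 "0000644"
        ]


-- Number of bytes an encoder produces, handy for checking field widths.
encodedWidth : Encoder -> Int
encodedWidth encoder =
    Bytes.width (Encode.encode encoder)
```

Because each field has a fixed width, the widths simply add up: encodedWidth nameAndMode is 108. A full file record built this way must come to exactly 512 bytes.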

It wasn’t easy (for me) to get the encoder to work — it is an all-or-nothing matter. To help, I wrote another package, jxxcarlson/hex, to create Bytes values and to convert Bytes values to strings of hexadecimal digits so that I could look at them. Here is an example:

$ elm repl
> import Hex exposing(..)
> import Bytes.Encode as Encode exposing(encode)

> encode (Encode.string "Hello") |> Hex.fromBytes
"48656C6C6F" : String

> Hex.toBytes "FF66" |> Maybe.map Hex.fromBytes
Just "FF66" : Maybe String

Although this helped in the initial stages, the tar archives created were still invalid. I eventually had to resort to experimental science, making a tar archive as described above, downloading it, examining it with a hex editor, and comparing it, again, with an archive created with tar cvf. That way I could spot the differences between a valid tar archive and the one I made with Elm. After some detective work, which included generous use of pencil and paper, I was able to resolve the differences to create a valid archive using pure Elm.