Tarring files with Elm
[Updated June 4, 2019]
On November 14, 2018, Evan Czaplicki released two new Elm packages, Bytes (elm/bytes) and File (elm/file). Both have a simple, elegant interface and will open many new possibilities for Elm developers. One such is jxxcarlson/elm-tar, which is the subject of this post. The API exposes one module, Tar, with functions createArchive and extractArchive. They do exactly what their names suggest. Data can be either binary or text, and such data can be transferred between Elm and the outside world using the elm/file package. This functionality was needed for exporting LaTeX files and image files in the MiniLatex app hosted on knode.io. But of course there will be many other, less niche uses for the new packages.
Creating a tar archive
To create a tar archive, one uses the function
createArchive : List ( MetaData, Data ) -> Bytes
as in this example:
-- Using module Tar and File

metadata1 = { defaultMetadata | filename = "one.txt" }
metadata2 = { defaultMetadata | filename = "two.txt" }

data1 = ( metadata1, StringData "One" )
data2 = ( metadata2, StringData "Two" )

bytes = createArchive [ data1, data2 ]

File.Download.bytes "myArchive.tar" "application/x-tar" bytes
The API exposes two types, Data and MetaData, the first of which is used to discriminate between text and binary data:
type Data = StringData String | BinaryData Bytes
The second is a record of information required by the spec for tar:
type alias MetaData =
    { filename : String
    , mode : Mode
    , ownerID : Int
    , groupID : Int
    , fileSize : Int
    , lastModificationTime : Int
    , linkIndicator : Link
    , linkedFileName : String
    , userName : String
    , groupName : String
    , fileNamePrefix : String
    }
As a shortcut, one can rig up a MetaData value using defaultMetadata, modifying whatever fields one wants: meta = { defaultMetadata | filename = "foo.txt" }.
Extracting data from a tar archive
Imagine that you have received a tar archive as a Bytes value, either over HTTP or via File.toBytes. The data can be extracted from the Bytes value using:
extractArchive : Bytes -> List ( MetaData, Data )
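As a sketch of how the extracted list might be used, the following turns an archive into one-line descriptions of its entries. It assumes that the Tar module exposes MetaData and Data, together with the constructors StringData and BinaryData, as in the signatures shown in this post. The names describe and listing are made up for this illustration and are not part of the package.

```elm
module Listing exposing (listing)

import Bytes exposing (Bytes)
import Tar exposing (Data(..), MetaData)


-- Summarize one archive entry: its filename plus the kind and size of its data.
-- (Hypothetical helper, not part of jxxcarlson/elm-tar.)
describe : ( MetaData, Data ) -> String
describe ( metadata, data ) =
    case data of
        StringData text ->
            metadata.filename ++ " (text, " ++ String.fromInt (String.length text) ++ " characters)"

        BinaryData bytes ->
            metadata.filename ++ " (binary, " ++ String.fromInt (Bytes.width bytes) ++ " bytes)"


-- Extract the archive and describe each entry in order.
listing : Bytes -> List String
listing archive =
    List.map describe (Tar.extractArchive archive)
```

Pattern matching on Data is what lets the caller treat text and binary entries differently, for instance showing text in the page while offering binary entries as downloads.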
Archiving arbitrary data
Now suppose we want to tar both text and binary data. To make a silly example, first use the function Hex.toBytes from the package jxxcarlson/hex:
content1 =
    Hex.toBytes "B0C1D2E4F4"
        |> Maybe.withDefault (encode (Bytes.Encode.unsignedInt8 0))
Imagine that we have constructed a companion metadata1 value as we did previously, and also imagine that we have some string data in content2 and content3, with companion metadata values for each of these. We can tar all this content as follows:
tarArchive =
    Tar.encodeFiles
        [ ( metadata1, BinaryData content1 )
        , ( metadata2, StringData content2 )
        , ( metadata3, StringData content3 )
        ]
        |> encode

File.Download.bytes "tarArchive.tar" "application/x-tar" tarArchive
A Demo App
For a demo app, see the source code for the tar package. There you will find /examples/Main.elm, which you can compile using elm make to create an app residing in index.html. Click on index.html to run the app. Behind the scenes, it loads two images from given URLs, creates Bytes values for them, then downloads a tar archive with the two (uncompressed) images. You should be able to click on the downloaded archive, test.tar, or use tar xvf test.tar to untar the files. Pop quiz: what are the images?
The Development Process
I used the description of the tar file format on Wikipedia to write the encoders. Each file is encoded as a 512-byte file record (the header) with information such as filename, permissions, last modification date, etc. There is also an 8-byte checksum field, which is computed by adding the bytes of the file record, with the checksum field itself initially filled with eight blanks (ASCII spaces). The eight blanks are then replaced by the checksum. The file record is followed by the data, which must be padded with nulls so that the padded data consists of a multiple of 512 bytes. Call the header plus the padded data a tarred file. A tar archive consists of a sequence of tarred files placed end-to-end, followed by two 512-byte blocks of nulls.
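The padding and checksum rules are easier to see in code. What follows is a minimal sketch using only elm/bytes, not the code from the tar package. The names paddedLength, padTo512 and checksum are invented for this illustration, and checksum assumes it is handed a header whose checksum field has already been filled with blanks.

```elm
module TarSketch exposing (checksum, padTo512, paddedLength)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode
import Bytes.Encode as Encode


-- Smallest multiple of 512 that is greater than or equal to n.
paddedLength : Int -> Int
paddedLength n =
    512 * ((n + 511) // 512)


-- Encode a string followed by enough null bytes to fill whole 512-byte blocks.
-- The width is measured in UTF-8 bytes, not in characters.
padTo512 : String -> Encode.Encoder
padTo512 contents =
    let
        width =
            Encode.getStringWidth contents

        nulls =
            List.repeat (paddedLength width - width) (Encode.unsignedInt8 0)
    in
    Encode.sequence (Encode.string contents :: nulls)


-- Add up every byte of a header, each read as an unsigned 8-bit integer.
-- Assumes the checksum field of the header currently holds blanks.
checksum : Bytes -> Int
checksum header =
    let
        step ( remaining, total ) =
            if remaining <= 0 then
                Decode.succeed (Decode.Done total)

            else
                Decode.unsignedInt8
                    |> Decode.map (\byte -> Decode.Loop ( remaining - 1, total + byte ))
    in
    Decode.decode (Decode.loop ( Bytes.width header, 0 ) step) header
        |> Maybe.withDefault 0
```

For example, paddedLength 513 is 1024, so 513 bytes of data occupy two blocks in the archive. Measuring the width with Encode.getStringWidth rather than String.length matters as soon as the text contains non-ASCII characters.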
Here is the encoder for text strings:
encodeTextFile : MetaData -> String -> Encode.Encoder
encodeTextFile metadata contents =
    let
        -- Record the length of the contents in the metadata before encoding the header
        fileRecord =
            { metadata | fileSize = String.length contents }
    in
    Encode.sequence
        [ encodeFileRecord fileRecord
        , Encode.string (padContents contents)
        ]
The Encode.sequence function is used to pack bytes end-to-end. It is used repeatedly in the definition of encodeFileRecord to build up the required sequence of bytes.
It wasn’t easy (for me) to get the encoder to work: it is an all-or-nothing matter. To help, I wrote another package, jxxcarlson/hex, to create Bytes values and to convert Bytes values to strings of hexadecimal digits so that I could look at them. Here is an example:
$ elm repl
> import Hex exposing(..)
> import Bytes.Encode as Encode exposing(encode)
> encode (Encode.string "Hello") |> Hex.fromBytes
"48656C6C6F" : String
> Hex.toBytes "FF66" |> Maybe.map Hex.fromBytes
Just "FF66" : Maybe String
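For readers curious how such a conversion can be written with elm/bytes, here is a minimal sketch of the Bytes-to-hex direction. It is not the implementation in jxxcarlson/hex; the module name HexSketch and the functions hexDigit, byteToHex and toHex are invented for this illustration.

```elm
module HexSketch exposing (toHex)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode


-- The hexadecimal digit for a value in the range 0..15.
hexDigit : Int -> String
hexDigit n =
    String.slice n (n + 1) "0123456789ABCDEF"


-- Two hexadecimal digits for a byte in the range 0..255.
byteToHex : Int -> String
byteToHex byte =
    hexDigit (byte // 16) ++ hexDigit (modBy 16 byte)


-- Read the bytes one at a time, collect their hex forms, then join them.
toHex : Bytes -> String
toHex bytes =
    let
        step ( remaining, acc ) =
            if remaining <= 0 then
                Decode.succeed (Decode.Done (String.concat (List.reverse acc)))

            else
                Decode.unsignedInt8
                    |> Decode.map (\byte -> Decode.Loop ( remaining - 1, byteToHex byte :: acc ))
    in
    Decode.decode (Decode.loop ( Bytes.width bytes, [] ) step) bytes
        |> Maybe.withDefault ""
```

Applied to the encoding of "Hello", toHex gives "48656C6C6F", matching the REPL session above.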
Although this helped in the initial stages, the tar archives created were still invalid. I eventually had to resort to experimental science, making a tar archive as described above, downloading it, examining it with a hex editor, and comparing it, again, with an archive created with tar cvf. That way I could spot the differences between a valid tar archive and the one I made with Elm. After some detective work, which included generous use of pencil and paper, I was able to resolve the differences to create a valid archive using pure Elm.