Let's write DBMS in Haskell: project management with Cabal

8 Mar 2015

The last term is over and I have a lot of time to spend on writing my master's thesis and doing other things. I've decided that improving my FP and Haskell skills is what I'd like to do, and the best way to do that is to write a couple of small but rather serious projects. We had some interesting ones at university, but, to be honest, my architecture decisions were sometimes not very good; by the end of a term the projects were surrounded by all kinds of workarounds, and it was too late to rewrite them from scratch. Now I have no hard deadlines or requirements, so why not write these projects properly?

The projects I'm talking about are a database management system (like SQLite) and a virtual machine implementation with JIT compilation and some simple optimizations (like HotSpot). I wrote the former more than a year ago and the latter a couple of months ago. I'll start with the first one because it was written in Haskell, so it will be easier to fix my previous mistakes. Another reason is that I'm starting to forget things about DBMSs and it would be great to refresh them.

I will store my code in a git repository and, at the top of each post, link to the state of the project at the time of writing. Current specifications and requirements are here; in short, it will be a simple database supporting three fixed-size types (ints, doubles and varchars), B-tree indexes, cross joins and a 'vacuum' operation. I will cover the internal architecture in more detail in later posts; for now I'm going to talk about project management with Cabal.

Cabal is the recommended[1] build tool for Haskell. Its name stands for "Common Architecture for Building Applications and Libraries". It is split into two parts: the cabal-install package, which contains the cabal executable, and the Cabal library, which contains more advanced machinery for complex builds. Make sure you have the former installed before proceeding with the instructions below.

Project configuration

Creating a new Cabal project is simple: create a new directory, cd into it, and run

$ cabal init

and answer some questions about your project. A couple of files will be generated: <your-project>.cabal, a declarative definition of your project's build, and Setup.hs, a build script. Here is what the generated .cabal file might look like:

-- Initial my-project.cabal generated by cabal init.  For further
-- documentation, see http://haskell.org/cabal/users-guide/

name:                my-project
version:             0.1.0.0
synopsis:            My precious project
-- description:
license:             MIT
license-file:        LICENSE
author:              Nikolay Obedin
maintainer:          dancingrobot84@gmail.com
-- copyright:
-- category:
build-type:          Simple
-- extra-source-files:
cabal-version:       >=1.10

executable my-project
  main-is:             Main.hs
  -- other-modules:
  -- other-extensions:
  build-depends:       base >=4.7 && <4.8
  -- hs-source-dirs:

-- ... other sections

The syntax is pretty straightforward: indentation is used to distinguish entries and their contents, and "--" starts a comment. Global properties are at the top of the file, and they are pretty simple too: the name of your project, its version, a short description, the license, the category on Hackage, etc. One interesting field is build-type, which tells Cabal what type of build is going to be used: Simple is for builds configured via the .cabal file, Custom is for more complex builds using Setup.hs[2]. I'll stick with the Simple build type for now.
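For reference, here is a minimal sketch of the Setup.hs that usually accompanies the Simple build type. The exact file produced by cabal init may differ slightly between versions, so treat this as an illustration rather than the canonical output:

```haskell
-- Setup.hs for build-type: Simple.
-- defaultMain comes from the Cabal library; it takes all build
-- information from the .cabal file, so nothing is customized here.
import Distribution.Simple (defaultMain)

main :: IO ()
main = defaultMain
```

With the Custom build type this is the file you would edit to hook your own steps into the build.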

A set of sections follows the global properties. Each section belongs to one of four types: library, executable, test suite or benchmark. I'm going to take a look at the first three of them.

Library section and common settings

library
  other-modules:    Foo.Bar
  exposed-modules:  Foo.Baz
  hs-source-dirs:   src
  default-extensions:
        CPP, ForeignFunctionInterface
  build-tools:      alex, happy
  build-depends:
        base
      , array

A library is a collection of Haskell modules that can be reused by other libraries and applications. A project may contain at most one library, and its name matches the name of the project. Each module of a library should be listed in either the other-modules entry or the exposed-modules entry, depending on its visibility to the end user -- the latter are visible and the former aren't. If you chose library in cabal init, then all your existing modules have been automatically added to exposed-modules.

The rest of the settings in this example are common to all types of sections. hs-source-dirs is a comma-separated list of directories where Cabal will search for source files. default-extensions is a list of language extensions enabled by default for all source files of the section.

build-tools is a list of programs used to preprocess specific files before compiling them. Note that, for some reason, these executables ARE NOT installed automatically[3]; you have to install them manually. In this example, alex processes files with the .x extension and happy processes files with the .y extension, generating a lexer and a parser respectively.

build-depends is a list of packages your project depends on. Each package may be constrained[4], but be careful: inadequate constraints may leave Cabal unable to install your dependencies. Usually, I use Stackage to constrain dependencies for me while leaving them unconstrained in build-depends. However, this approach is useful only if you're developing an internal library or application -- if you're going to publish it on Hackage, then you should set sane constraints on your dependencies, but, I guess, by that time you will know these things better than me :)

Executable and Test-Suite sections

executable my-program
  hs-source-dirs:   src
  build-depends:    base
  main-is:          Main.hs

An executable is a standalone program. Its name is not required to match the package's name, and a project may have many executables. The only thing it requires is an entry point, which is the main :: IO () function. The source file containing this function should be specified in the main-is entry.
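A minimal sketch of what the Main.hs named in main-is could contain. The greeting helper and its text are made up for illustration; the only thing Cabal actually needs is the main :: IO () entry point:

```haskell
-- src/Main.hs: entry point for the "my-program" executable.

-- Hypothetical pure helper, kept apart from IO so it is easy to test.
greeting :: String -> String
greeting name = "Hello from " ++ name ++ "!"

-- Cabal builds the executable around this function.
main :: IO ()
main = putStrLn (greeting "my-program")
```

Since hs-source-dirs is src in the section above, Cabal would look for this file at src/Main.hs.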

test-suite tests
  type:             exitcode-stdio-1.0
  main-is:          Main.hs
  hs-source-dirs:   test
  build-depends:
        base
      , hspec

A project may also contain many test suite sections, and each of them should use one of the supported testing interfaces. The interface used by a particular test suite is defined in its type field. Cabal supports two testing interfaces out of the box: exitcode-stdio and detailed. I prefer the first one because it is simpler -- it just compiles and runs your test application and checks its exit code: if it's non-zero, then the test has failed. The only required field for exitcode-stdio is main-is, which means exactly the same thing as in an executable section -- it is the source file of your test program.
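To make the exit-code contract concrete, here is a small sketch of a test program that relies on base only (it deliberately does not use hspec, which the section above lists as a dependency). The checks themselves are hypothetical placeholders; a real suite would exercise the project's own modules:

```haskell
-- test/Main.hs: a test program for the exitcode-stdio-1.0 interface.
import Control.Monad (unless)
import System.Exit (exitFailure)

-- Hypothetical checks: a name paired with whether the property held.
checks :: [(String, Bool)]
checks =
  [ ("addition is commutative", 1 + 2 == (2 + 1 :: Int))
  , ("reverse twice is identity", reverse (reverse "cabal") == "cabal")
  ]

-- Names of the checks that did not hold.
failedChecks :: [(String, Bool)] -> [String]
failedChecks cs = [name | (name, ok) <- cs, not ok]

main :: IO ()
main = do
  let failed = failedChecks checks
  mapM_ (putStrLn . ("FAILED: " ++)) failed
  -- A non-zero exit code is what tells Cabal that the suite failed.
  unless (null failed) exitFailure
```

If every check holds, main returns normally, the process exits with code zero, and Cabal reports the suite as passed.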

Installing dependencies, building and running

Now that the project is configured, it is time to build it. But first you need to install the dependencies stated in the build-depends fields. I strongly recommend that you use Cabal sandboxes to avoid possible conflicts. To create a new sandbox, run:

$ cabal sandbox init

and then install dependencies:

$ cabal install --dependencies-only

If you're using build-tools, install them manually:

$ cabal install alex happy <any-other-build-tool>

Now you can build your project:

$ cabal configure && cabal build

Then you can test and run it:

$ cabal test && cabal run

If you want to try something in ghci you can start it with all dependencies of your project available by running:

$ cabal repl

These are the commands you're going to use rather frequently; the rest of available commands and flags can be found by running cabal --help.

In the end

Haskell is a great language and Cabal is a sane build tool if it's used properly. Use sandboxes and careful version constraints, and do not install many packages in the global scope. Remember that Cabal is not a package manager, and you'll be fine.