Let's write a DBMS in Haskell: project management with Cabal
The last term is over and I've got a lot of time to spend on writing my master's thesis and doing other stuff. I've decided that I'd like to improve my FP and Haskell skills, and the best way to do this is to write a couple of small but rather serious projects. We had some interesting ones at university, but, to be honest, my architectural decisions were not always good, and by the end of the term the projects were surrounded by all kinds of workarounds and it was too late to rewrite them from scratch. Now I have no hard deadlines or requirements, so why not write these projects properly?
The projects I'm talking about are a database management system (like SQLite) and a virtual machine implementation with JIT compilation and some simple optimizations (like HotSpot). I wrote the former more than a year ago and the latter a couple of months ago. I'll start with the first one because it was written in Haskell, so it will be easier to fix my previous mistakes. Another reason is that I'm starting to forget things about DBMSs and it would be great to refresh my memory.
I will store my code in a git repository and put a link to the state of the project at the time of writing at the top of each post. The current specification and requirements are here; in short, it will be a simple database supporting three fixed-size types (ints, doubles and varchars), B-tree indexes, cross joins and a 'vacuum' operation. I will cover the internal architecture in later posts; now I'm going to talk about project management with Cabal.
Cabal is the recommended[1] build tool for Haskell. Its name stands for
"Common Architecture for Building Applications and Libraries". It is split into
two parts: the cabal-install package, which provides the cabal executable, and
the Cabal library, which implements the build system itself and can be used
directly from Setup.hs for complex builds. Make sure you have the former
installed in order to follow the instructions below.
Project configuration
Creating a new Cabal project is simple: create a new directory, cd into it and run
$ cabal init
and answer some questions about your project. Among the generated files are
<your-project>.cabal, a declarative definition of your project's build, and
Setup.hs, a build script. Here is what the generated .cabal file may look
like:
-- Initial my-project.cabal generated by cabal init.  For further
-- documentation, see http://haskell.org/cabal/users-guide/
name:                my-project
version:             0.1.0.0
synopsis:            My precious project
-- description:
license:             MIT
license-file:        LICENSE
author:              Nikolay Obedin
maintainer:          dancingrobot84@gmail.com
-- copyright:
-- category:
build-type:          Simple
-- extra-source-files:
cabal-version:       >=1.10
executable my-project
  main-is:             Main.hs
  -- other-modules:
  -- other-extensions:
  build-depends:       base >=4.7 && <4.8
  -- hs-source-dirs:
-- ... other sections
The syntax is pretty straightforward: indentation is used to separate entries
from their contents, and "--" at the start of a line comments that line out
(Cabal does not support comments at the end of a line). Global properties are
at the top of the file, and they are pretty simple too: the name of your
project, its version, a short description, license, category on Hackage, etc.
One interesting field is build-type, which tells Cabal what kind of build is
going to be used: Simple is for builds configured entirely via the .cabal file,
Custom is for more complex builds driven by Setup.hs[2]. I'll stick with the
Simple build type for now.
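For comparison, the Setup.hs that cabal init generates for a Simple build is essentially a two-line program: it hands control to defaultMain from the Cabal library's Distribution.Simple module, which reads the .cabal file and implements the standard configure, build and install steps. A Custom build would replace defaultMain with something like defaultMainWithHooks and its own hooks. The type signature and comments below are my additions; the generated file may omit them.

```haskell
-- Setup.hs for build-type: Simple.
-- defaultMain reads the package description from the .cabal file
-- and provides the usual configure/build/install/test commands.
import Distribution.Simple

main :: IO ()
main = defaultMain
```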
The global properties are followed by a set of sections. Each section belongs to one of four types: library, executable, test suite or benchmark. I'm going to look at the first three of them.
Library section and common settings
library
  other-modules:    Foo.Bar
  exposed-modules:  Foo.Baz
  hs-source-dirs:   src
  default-extensions:
        CPP, ForeignFunctionInterface
  build-tools:      alex, happy
  build-depends:
        base
      , array
A library is a collection of Haskell modules that can be reused by other
libraries and applications. A project may contain at most one library, and its
name matches the name of the project. Each module of a library must be listed
either in other-modules or in exposed-modules, depending on whether it should
be visible to end users: the latter are visible and the former are not. If you
chose library in cabal init, all your existing modules were automatically added
to exposed-modules.
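To make the exposed/hidden distinction concrete, here is a hypothetical src/Foo/Baz.hs; the module name comes from the example above, but its contents are invented for illustration. Because Foo.Baz is in exposed-modules, code depending on the library can import it, whereas a module listed only in other-modules, like Foo.Bar, can be imported by other modules of the same library but not from outside. The module's export list applies the same idea one level down, to individual names.

```haskell
-- src/Foo/Baz.hs (hs-source-dirs: src). Hypothetical contents.
-- Foo.Baz is listed in exposed-modules, so packages that depend
-- on this library may import it.
module Foo.Baz
  ( greet -- the only name visible to importers of Foo.Baz
  ) where

-- Not in the export list, so it stays private to this module.
exclaim :: String -> String
exclaim s = s ++ "!"

greet :: String -> String
greet name = exclaim ("Hello, " ++ name)
```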
The rest of the settings in this example are common to all section types.
hs-source-dirs is a list of directories where Cabal will search for source
files. default-extensions is a list of language extensions enabled by default
for all source files of the section.
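As a small illustration of what default-extensions buys you, here is a sketch of a source file that relies on CPP. With CPP in default-extensions, the LANGUAGE pragma would be unnecessary in files built by Cabal; it is kept here only so the file also works standalone, for example with runghc. The compilerInfo name is hypothetical; __GLASGOW_HASKELL__ is the standard macro GHC defines when CPP is enabled (710 for GHC 7.10).

```haskell
{-# LANGUAGE CPP #-}
-- With "default-extensions: CPP" in the .cabal file the pragma above
-- is redundant for files built by Cabal; it is kept so this file also
-- works standalone.
module Main (main) where

-- __GLASGOW_HASKELL__ is defined by GHC whenever CPP is enabled;
-- GHC 7.10 sets it to 710.
compilerInfo :: String
#if __GLASGOW_HASKELL__ >= 710
compilerInfo = "compiled with GHC 7.10 or newer"
#else
compilerInfo = "compiled with GHC older than 7.10"
#endif

main :: IO ()
main = putStrLn compilerInfo
```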
build-tools is a list of programs used to preprocess specific files before they
are compiled. Note that, for some reason, these executables ARE NOT installed
automatically[3]; you have to install them manually. In this example, alex
processes files with the .x extension to generate a lexer, and happy processes
files with the .y extension to generate a parser.
build-depends is a list of packages your project depends on. Each package may
be given a version constraint[4], but be careful: overly strict or inconsistent
constraints can leave Cabal unable to find a way to install your dependencies.
Usually I use Stackage to constrain dependencies for me while leaving them
unconstrained in build-depends. However, this approach only works if you're
developing an internal library or application; if you're going to publish it on
Hackage, you should set sane constraints on your dependencies, but I guess by
that time you will know these things better than me :)
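To see what a range like the generated base >=4.7 && <4.8 actually admits, here is a toy model using Data.Version from base. It is not Cabal's own implementation (that lives in the Cabal library's Distribution.Version), and satisfiesBaseRange is a hypothetical name; it only mirrors the component-by-component comparison. Since base ships with GHC (base 4.7 came with GHC 7.8), this particular range effectively pins the project to a single compiler series, which is exactly the kind of constraint that can make dependencies uninstallable on a newer GHC.

```haskell
module Main (main) where

import Data.Version (Version, makeVersion, showVersion)

-- Toy model of the range "base >=4.7 && <4.8". Versions are compared
-- component by component, so 4.7.0.2 is inside the range while 4.8
-- and anything after it are outside.
satisfiesBaseRange :: Version -> Bool
satisfiesBaseRange v = v >= makeVersion [4, 7] && v < makeVersion [4, 8]

main :: IO ()
main = mapM_ report [[4, 6, 0, 1], [4, 7], [4, 7, 0, 2], [4, 8, 0, 0]]
  where
    report branch =
      let v = makeVersion branch
      in putStrLn (showVersion v ++ " satisfies range: " ++ show (satisfiesBaseRange v))
```

Running it should report True only for 4.7 and 4.7.0.2.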
Executable and Test-Suite sections
executable my-program
  hs-source-dirs:   src
  build-depends:    base
  main-is:          Main.hs
An executable is a standalone program. Its name doesn't have to match the
package name, and a project may have many executables. The only thing it
requires is an entry point, the main :: IO () function. The source file that
defines this function is specified in the main-is field.
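A minimal src/Main.hs for the my-program section above might look like the sketch below. The greeting logic is invented for illustration; the only part Cabal cares about is that the file named in main-is defines main :: IO (). Keeping the pure logic in a separate function also makes it easy to reuse from a test suite.

```haskell
-- src/Main.hs: the file named in main-is. Hypothetical contents.
module Main (main) where

import System.Environment (getArgs)

-- Pure part, kept separate from IO so it is easy to test.
greeting :: [String] -> String
greeting []    = "Hello, world!"
greeting names = "Hello, " ++ unwords names ++ "!"

-- Entry point required by Cabal for an executable section.
main :: IO ()
main = getArgs >>= putStrLn . greeting
```

Arguments after -- are passed to the program, so cabal run my-program -- Alice Bob should print Hello, Alice Bob!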
test-suite tests
  type:             exitcode-stdio-1.0
  main-is:          Main.hs
  hs-source-dirs:   test
  build-depends:
        base
      , hspec
A project may also contain many test-suite sections, and each of them must use
one of the supported testing interfaces. The interface used by a particular
test suite is set in its type field. Cabal supports two testing interfaces out
of the box: exitcode-stdio-1.0 and detailed-0.9. I prefer the first one because
it is simpler: it compiles and runs your test program and checks its exit code;
if the code is non-zero, the test has failed. The only required field for
exitcode-stdio-1.0 is main-is, which means exactly the same as in the
executable section: the source file of your test program.
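Here is a sketch of a test/Main.hs that follows the exitcode-stdio-1.0 contract: print what failed and exit with a non-zero code, otherwise exit normally. The section above depends on hspec, whose hspec function already exits with a failure code when a spec fails; to keep this sketch dependency-free it uses only base, and the double function is a hypothetical stand-in for code that would normally be imported from your library.

```haskell
-- test/Main.hs for a test suite with type: exitcode-stdio-1.0.
-- Cabal only looks at the exit code: 0 means pass, anything else fail.
module Main (main) where

import Control.Monad (unless)
import System.Exit (exitFailure)

-- Hypothetical function under test; a real project would import it
-- from the library instead of defining it here.
double :: Int -> Int
double = (* 2)

-- Each case pairs a description with a Bool that should be True.
cases :: [(String, Bool)]
cases =
  [ ("double 0 == 0", double 0 == 0)
  , ("double 21 == 42", double 21 == 42)
  , ("double (-3) == -6", double (-3) == -6)
  ]

main :: IO ()
main = do
  let failures = [name | (name, ok) <- cases, not ok]
  mapM_ (putStrLn . ("FAILED: " ++)) failures
  -- exitFailure makes cabal test report this suite as failed.
  unless (null failures) exitFailure
  putStrLn ("All " ++ show (length cases) ++ " tests passed")
```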
Installing dependencies, building and running
Now that the project is configured, it is time to build it. But first you need
to install the dependencies listed in the build-depends fields. I strongly
recommend using Cabal sandboxes to avoid conflicts between packages. To create a
new sandbox, run:
$ cabal sandbox init
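and then install the dependencies (--enable-tests makes Cabal install the
dependencies of the test suites, such as hspec, as well):
$ cabal install --dependencies-only --enable-tests
If you're using build-tools, install them manually:
$ cabal install alex happy <any-other-build-tool>
Now you can build your project, again with tests enabled so that the test
suites are built too:
$ cabal configure --enable-tests && cabal build
Then you can test and run it:
$ cabal test && cabal run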
If you want to try something in GHCi, you can start it with all of your
project's dependencies available by running:
$ cabal repl
These are the commands you're going to use most frequently; the rest of the
available commands and flags can be found by running cabal --help.
In the end
Haskell is a great language, and Cabal is a sane build tool if it's used properly. Use sandboxes and careful version constraints, and don't install many packages globally. Remember that Cabal is not a package manager, and you'll be fine.