This is the fourth part of my blog series on creating a fully functional Puppet stack with Docker. The previous post on installing Puppetmaster, Foreman and PuppetDB with Puppet can be found here.

Crane

Crane is described on Github as follows:

Crane is a tool to orchestrate Docker containers. It works by reading in some configuration (JSON or YAML) which describes how to obtain images and how to run containers. This simplifies setting up a development environment a lot as you don't have to bring up every container manually, remembering all the arguments you need to pass. By storing the configuration next to the data and the app(s) in a repository, you can easily share the whole environment.

If you've ever used a Vagrantfile then this must sound familiar. It allows us to specify our Docker infrastructure with all the necessary parameters in a single definition file.

Why not FIG?

I started out with FIG because it had recently been adopted by the Docker team, but I quickly ran into trouble using it. For some reason I could not use underscores in my Docker names, which resulted in some really ugly camel-cased names. There were a few other issues as well that drove me to look elsewhere. These issues might have been my own fault, though, and I would recommend that others take a look at FIG and judge for themselves.

Additional benefits of Crane that made me never look back at FIG are:

  • Written in Go instead of Python. It ships as a single self-contained executable that you simply place somewhere in your PATH; installation could not be simpler (see the sketch after this list).
  • Crane is aware of (cascading) dependencies between containers and will start them in the right order, or let you know when a dependency is missing.
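
For illustration, installing Crane boils down to downloading the binary and putting it on your PATH. The URL and file name below are placeholders; pick the release that matches your platform from the project's GitHub releases page:

curl -L -o crane https://github.com/michaelsauter/crane/releases/download/vX.Y.Z/crane_linux_amd64
chmod +x crane
sudo mv crane /usr/local/bin/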

Container definitions

The following Crane definition file is included in the repository:

---
containers:

  puppetdb:
    image: iverberk/puppetdb:packer
    run:
      volumes-from: ["puppetdb_datastore"]
      publish: ["8080:8080"]
      expose: ["8081"]
      cmd: ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
      hostname: puppetdb.localdomain
      detach: true

  puppetdb_datastore:
    image: iverberk/puppetdb:packer
    run:
      volume: ["/var/lib/pgsql/data"]
      cmd: ["true"]
      detach: true

  foreman:
    image: iverberk/foreman:packer
    run:
      volumes-from: ["foreman_datastore"]
      publish: ["443:443"]
      cmd: ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
      hostname: foreman.localdomain
      detach: true

  foreman_datastore:
    image: iverberk/foreman:packer
    run:
      volume: ["/var/lib/pgsql/data"]
      cmd: ["true"]
      detach: true

  puppetmaster:
    image: iverberk/puppetmaster:packer
    run:
      volume: ["./environments:/etc/puppet/environments", "./hiera:/etc/puppet/hiera"]
      volumes-from: ["puppetmaster_datastore"]
      cmd: ["/usr/bin/supervisord", "-c", "/etc/supervisord.conf"]
      detach: true
      hostname: puppetmaster.localdomain
      expose: ["8443"]
      publish: ["8140:8140"]

  puppetmaster_datastore:
    image: iverberk/puppetmaster:packer
    run:
      volume: ["/var/lib/puppet/ssl"]
      cmd: ["true"]
      detach: true

Most of the options are self-explanatory and are documented in the excellent Docker run reference. In this blog post I would like to focus on two subjects: data persistence and running multiple processes inside a Docker container.

Container Data Persistence

Suppose we are running a PostgreSQL database container. When we interact with this container, data is stored and retrieved from disk. Because of the nature of Docker containers this data is written in a copy-on-write fashion to a new layer on top of the base image. Unless the data within this new layer is committed to an image it will not persist when the container is destroyed.
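
A quick illustration with plain Docker (the busybox image and file names are just examples): anything written to the container's writable layer disappears together with the container, whereas a volume lives outside that layer.

docker run --name scratch busybox sh -c 'echo hello > /tmp/hello.txt'
docker rm scratch                  # the writable layer, and the file with it, is gone
docker run --name scratch2 -v /data busybox sh -c 'echo hello > /data/hello.txt'
docker rm scratch2                 # the /data volume still exists on the host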

Usually you do not want to store this run-time data in an image, as it will be different for every container that is created from the image. However, you probably do want to back up this data and make sure it is restored when you destroy and recreate the container. Docker provides a mechanism for this called volumes. Volumes live outside the copy-on-write layer, can be shared between containers, and persist as long as at least one container still references them; they remain accessible even when the containers that define them are not running.

A pattern has emerged within the Docker community that advocates the use of dedicated 'no-op' containers whose only purpose is to store data for other containers. This is exactly what you see in my Crane container file: there are dedicated datastore containers that only hold data for the running PuppetDB, Foreman and Puppetmaster containers. The other containers use the volumes-from setting to specify that they would like to use the volumes associated with the datastore containers. This way the data persists when the application containers are destroyed and recreated. The datastore containers are created from the same base image so that their volumes are pre-filled with the data already present in the image (check this wonderful blogpost for additional information on why it is necessary to create a datastore from the same image as the containers). They are run with the simplest of commands (true) and will exit immediately, never to be restarted again.
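
Roughly speaking, what Crane does for the PuppetDB pair is equivalent to the following plain docker run commands (a sketch based on the definition file above):

docker run --name puppetdb_datastore -v /var/lib/pgsql/data iverberk/puppetdb:packer true
docker run -d --name puppetdb --volumes-from puppetdb_datastore \
  -p 8080:8080 --expose 8081 -h puppetdb.localdomain \
  iverberk/puppetdb:packer /usr/bin/supervisord -c /etc/supervisord.conf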

Multiple Container Processes

When I started with Docker, the most obvious restriction (or opportunity!) was the fact that you could only launch a single process. The container effectively stops running when that process exits. This poses some challenges when your application depends on an additional process running within the same container. One way to deal with this is to split every service out into a separate container, which is basically what a microservices architecture is all about. Sometimes, though, you want to be a little more pragmatic and run two or three processes within a container at the same time. I would say that three running processes is the absolute maximum before you should split them out into separate containers. But how do we achieve this?

The easy way would be to include a little script that starts one or two processes in the background and executes the last process (the main application) as a long-running foreground process. The downside to this approach is that you have no way to manage the processes when something bad happens or when you want to restart a process. This is why it is often advisable to use a process manager like supervisord to start and manage the processes for you, which is exactly what I have done in this project.
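
For comparison, the 'little script' approach would look something like the sketch below (the commands and paths are assumptions, not the actual project files). If the background process dies, nothing notices or restarts it:

#!/bin/sh
# Start the helper process in the background...
/usr/bin/postgres -D /var/lib/pgsql/data &
# ...then hand the foreground over to the main application.
exec /usr/bin/java -jar /usr/share/puppetdb/puppetdb.jar services -c /etc/puppetdb/conf.d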

The supervisord configurations are put in place inside the containers by the Puppet provisioning step. In the Crane definition you can see that all containers are started by running the supervisord process with the Puppet-provisioned configuration file. For the Puppetmaster container this means that httpd and the Foreman smart-proxy are started. For the Foreman container, httpd and the PostgreSQL database are started. And finally, for the PuppetDB container, the PostgreSQL database is started and the PuppetDB Java runtime runs in the foreground.
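
To give an idea of what such a configuration looks like, here is a minimal supervisord sketch for the PuppetDB container; the program commands and paths are assumptions, not the exact files that Puppet provisions:

[supervisord]
nodaemon=true

[program:postgresql]
command=/usr/bin/postgres -D /var/lib/pgsql/data
user=postgres
autorestart=true

[program:puppetdb]
; the jar location and config directory are illustrative
command=/usr/bin/java -jar /usr/share/puppetdb/puppetdb.jar services -c /etc/puppetdb/conf.d
autorestart=true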

Fire it up

So how do we start our fully functional Puppet infrastructure? With a simple command:

crane lift

This assumes that you've placed the Crane executable somewhere in your PATH. The command checks the remote Docker registry for updates to the images and pulls them in. Next, it starts the containers in the right order to satisfy the inter-dependencies (e.g. the datastore containers first).
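
Once the command finishes, a plain docker ps should show the puppetdb, foreman and puppetmaster containers running; the datastore containers exit right away by design and therefore only show up in the full listing:

docker ps        # the three application containers
docker ps -a     # also lists the exited datastore containers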


We now have a running environment. But there is a problem! How do we locate and interact with the containers, and how do the containers know where to find each other? This is the domain of service discovery, and in the next post I will introduce my Docker-spy tool, which solves this problem.