

DockerCon Hackathon

Recently, @schvin and I participated in Docker’s first 24-hour hackathon ahead of the first DockerCon. We were joined by 98 other tech junkies at the Docker offices, all teaming up to build their own awesome projects in the hopes of winning the highly coveted DockerCon tickets as well as a speaking slot at the conference. @schvin and I set off with an idea that was spawned at the Monitorama conference we attended in May.

At Monitorama we saw a presentation on logging by @torkelo around Grafana, which gives you many of the benefits you might get from Logstash and Kibana but lets you use Graphite and StatsD for a more timeseries-centric view of your logs. We decided it would be cool if we could automatically ship logs from a Docker host, and from the containers running on that host, straight into Grafana, without having to run extra services on the containers or enforce global changes on legacy containers.

Here is how we built it:

5:15 pm - We started the project at the kickoff of the hackathon with an initial commit to Dockerana.

Initial commit to Dockerana

For our idea to work, we had to overcome a number of hurdles related to three core aspects of Docker’s unique environment. First, we needed to ensure we could extract logs from both the host and its containers without having to modify the containers themselves. Second, in order to use Grafana, we had to incorporate several dependencies all within a Docker environment. Finally, we needed to come up with a standard format to wrap the Docker logs in so that the logs could be fed to Graphite and displayed with Grafana.

8:45 pm - Three and a half hours later, we had our initial working prototype: a single Docker container running Grafana and all of its dependencies, gleaning data from the Docker host, and shoving it into Graphite so that it was viewable via Grafana.

While it was a good start, there was still a lot of work to be done. The following three tasks required significantly more thought and tinkering:

  • Getting log data from the containers themselves, and not just the host;
  • Breaking out the monolithic Grafana container into a series of integrated components, each in their own containers; and
  • Bringing it all together by automating the process of spinning up Grafana with its dependencies as well as collecting and aggregating the logs.
For our Grafana setup, we required six processes to be running, which in Dockerland translates to six containers that all know how to properly communicate and share data with each other. Below are the six services, which together work as a cohesive Grafana system:

  • Carbon
  • Elasticsearch
  • Grafana
  • Graphite
  • Nginx
  • StatsD

Graphite is composed of two components: an engine and a backend. The backend is Carbon, which gets fed the logs via StatsD. StatsD is a network daemon that listens for statistics (counters, timers, and events) sent over UDP and aggregates them for pluggable backend services such as Carbon. Graphite, the engine, is served up through Nginx, an HTTP and reverse proxy server, which makes it available to Grafana. Elasticsearch allows different Grafana dashboards to be saved and loaded.
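
To make the data flow a bit more concrete, StatsD speaks a simple line protocol over UDP, while Carbon accepts plaintext datapoints over TCP. The snippets below use the stock default ports (8125/udp for StatsD, 2003/tcp for Carbon) and a made-up metric name, so treat them as illustration rather than a literal part of our setup:

    # send a gauge to StatsD; it aggregates and flushes to Carbon on an interval
    echo "docker.host.containers.running:12|g" | nc -u -w1 localhost 8125

    # or write a datapoint directly to Carbon's plaintext listener
    echo "docker.host.containers.running 12 $(date +%s)" | nc -w1 localhost 2003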

Here’s a roughly drawn diagram of how all of these technologies are wired up for Dockerana:

Here in Dockerland, we can see how to spin up some of the necessary containers and how they are interconnected, both through shared data (volumes) and communication (links):

    
    # spin up carbon container with a volume
    docker run -d \
               -p 2004:2004 \
               -p 7002:7002 \
               -v /opt/graphite \
               --name dockerana-carbon dockerana/carbon
    
    # spin up a graphite container and connect the volume from carbon to it
    docker run -d \
               --volumes-from dockerana-carbon \
               --name dockerana-graphite dockerana/graphite
    
    # spin up an nginx container and link the networking exposed in graphite to it
    docker run -d \
               -p 8080:80 \
               --link dockerana-graphite:dockerana-graphite-link \
               --name dockerana-nginx dockerana/nginx
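
The StatsD, Elasticsearch, and Grafana containers are spun up in much the same way. Their exact commands aren’t shown here, but a rough sketch for two of them might look like the following (the dockerana/statsd and dockerana/elasticsearch image names, the carbon link alias, and the default ports are assumptions, not taken from the original setup):

    # spin up statsd, linked to carbon so it can flush aggregated metrics to it
    docker run -d \
               -p 8125:8125/udp \
               --link dockerana-carbon:carbon \
               --name dockerana-statsd dockerana/statsd

    # spin up elasticsearch so Grafana dashboards can be saved and loaded
    docker run -d \
               -p 9200:9200 \
               --name dockerana-elasticsearch dockerana/elasticsearch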
    
    

An astute eye might notice that when the Graphite container is spun up, there does not appear to be any exposed networking specified for the Nginx container to link to. That is because we are not exposing any networking to the outside. Instead, we are using native Docker linking between containers, set up through the Dockerfile. As you can see in the example below, the EXPOSE instruction makes those ports (in this case, 8000) available to linked containers without publishing them to the outside world.

    
    FROM ubuntu:trusty
    MAINTAINER Charlie Lewis 
    
    RUN apt-get -y update
    RUN apt-get -y install git \
                           python-django \
                           python-django-tagging \
                           python-simplejson \
                           python-memcache \
                           python-ldap \
                           python-cairo \
                           python-twisted \
                           python-pysqlite2 \
                           python-support \
                           python-pip
    
    
    # graphite, carbon, and whisper
    WORKDIR /usr/local/src
    RUN git clone https://github.com/graphite-project/graphite-web.git
    RUN git clone https://github.com/graphite-project/carbon.git
    RUN git clone https://github.com/graphite-project/whisper.git
    RUN cd whisper && git checkout master && python setup.py install
    RUN cd carbon && git checkout 0.9.x && python setup.py install
    RUN cd graphite-web && git checkout 0.9.x && python check-dependencies.py; python setup.py install
    
    # make use of cache from dockerana/carbon
    RUN apt-get -y install gunicorn
    
    RUN mkdir -p /opt/graphite/webapp
    WORKDIR /opt/graphite/webapp
    
    ENV GRAPHITE_STORAGE_DIR /opt/graphite/storage
    ENV GRAPHITE_CONF_DIR /opt/graphite/conf
    ENV PYTHONPATH /opt/graphite/webapp
    
    EXPOSE 8000
    
    CMD ["/usr/bin/gunicorn_django", "-b0.0.0.0:8000", "-w2", "graphite/settings.py"]
    
    

Snippet from Graphite Dockerfile
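
As a quick sanity check that the link behaves as described (this check is not part of the original post), you can start a throwaway container with the same --link and confirm that the alias resolves from inside it:

    # the link alias shows up in the container's /etc/hosts, so it resolves by name
    docker run --rm \
               --link dockerana-graphite:dockerana-graphite-link \
               ubuntu:trusty \
               getent hosts dockerana-graphite-link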

If we then look at the Nginx configuration snippet below, we can see how it uses that link to proxy the Graphite content through to Grafana:

    
    . . .
    
    http {
      . . .
    
      server {
        listen 80 default_server;
        server_name _;
    
        open_log_file_cache max=1000 inactive=20s min_uses=2 valid=1m;
    
        location / {
            proxy_pass                 http://dockerana-graphite-link:8000;
            proxy_set_header           X-Real-IP   $remote_addr;
            proxy_set_header           X-Forwarded-For  $proxy_add_x_forwarded_for;
    
    . . .
    
    

Snippet from Nginx configuration file
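
Once the Nginx container is up, the proxied Graphite web app is reachable on the port published in the earlier docker run (8080). For example, you can pull a series as JSON through the proxy via Graphite’s render API; the metric name below is hypothetical:

    # query Graphite's render API through the nginx proxy
    curl "http://localhost:8080/render?target=docker.host.load1&format=json"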

With all of those components working nicely, we just needed a process to collect and aggregate the logs. That sounded like a good candidate for a container, so that’s exactly what we did: we Dockerized that process as well, into a simple container that runs a couple of scripts which poll various parts of the Docker host to glean logs not only about the host but also about the containers running on it.

Here is our simple Dockerfile to build the log-collection container, where the primary driver is runner.sh:

    
    FROM ubuntu:trusty
    MAINTAINER George Lewis 
    
    RUN apt-get update
    RUN apt-get install -y sysstat make
    
    RUN perl -MCPAN -e 'install Net::Statsd'
    
    ADD scripts/ingest.pl /usr/local/bin/
    ADD scripts/loop.pl /usr/local/bin/
    ADD scripts/periodic-ingest.sh /usr/local/bin/
    ADD scripts/runner.sh /usr/local/bin/
    
    CMD /usr/local/bin/runner.sh
    

Snippet from Main Dockerfile
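
The ingest and runner scripts themselves aren’t reproduced in this post, but conceptually they boil down to a polling loop: gather stats about the host and each running container, then push them to StatsD. Here is a minimal, hypothetical sketch of that idea in shell; the real scripts use Net::Statsd and sysstat, and the StatsD host alias, metric names, cgroup path, and interval below are all assumptions (the container would also need access to the Docker CLI and the host’s cgroup filesystem for this to work):

    #!/bin/bash
    # hypothetical polling loop, not the actual runner.sh
    STATSD_HOST=${STATSD_HOST:-statsd}   # assumed link alias for the StatsD container
    STATSD_PORT=${STATSD_PORT:-8125}

    send() {
      # StatsD line protocol over UDP: <metric>:<value>|<type>
      echo "$1:$2|g" | nc -u -w1 "$STATSD_HOST" "$STATSD_PORT"
    }

    while true; do
      # host-level load average
      send "docker.host.load1" "$(cut -d' ' -f1 /proc/loadavg)"

      # per-container memory usage, read from the host's cgroup filesystem
      for id in $(docker ps -q --no-trunc); do
        send "docker.container.${id:0:12}.memory" \
             "$(cat /sys/fs/cgroup/memory/docker/${id}/memory.usage_in_bytes)"
      done

      sleep 10
    done

In our actual container, the equivalent logic lives in the Perl and shell scripts ADDed in the Dockerfile above, with runner.sh tying them together.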

Now for the fun part: displaying the data. We built a dashboard that is centered mostly on the events on the Docker host. In the future we hope to add automatic dashboards specific to each container, but there’s only so much two people can do in 24 hours.

Here are some screenshots of what the dashboard looks like, all dynamically configurable:

Note that here we can see the virtual network interfaces of each container as well as the host:

Finally, we wanted it to be easy to set up and repeatable by anyone running a Docker host. Below is a screencast showing just how simple it is to get Dockerana up and running:

DockerCon was a great event. We learned a lot and got to see plenty of other interesting Docker ideas that the other groups worked on. You can find our presentation, as well as those from the other participating groups, on the Docker Blog. We are definitely looking forward to the next hackathon.