Oftentimes the hardest thing about building open source software is conveying what a project really does. The purpose of a project can easily be lost on any given audience due to a variety of factors, including mismatched technical depth, misinterpreted jargon, or insufficient explanations. As in most places, “seeing is believing” in the software world - actually using something is often the only way to solidify points made about the project. However, crafting a demo has remained difficult and time-consuming.
For too long, software developers have had few options to provide on-demand product demos, leaving many at the mercy of PowerPoint slides and vague discussions. By combining a few easy habits with open source technologies like Docker, individuals and enterprises alike can automate the creation of simple, intuitive, and reproducible software demos.
This week, our team launched try.lab41.org, which provides instances of our open source projects so users can kick the tires before committing to spinning up their own version. We encourage you to check out Try41 and let us know what you think. In this post, I’m going to walk through five steps we use at Lab41 to easily create repeatable, on-demand demonstrations of our open source projects.
Step 1: Document that
The first step toward creating useful demonstrations is to craft multiple layers of documentation at the project level. There are many forms of documentation, from commenting on a single line of code to complete and verbose install instructions and everything in between. Two types we consistently use are markup-generated overviews of code, as well as README-style instructions for first-time installations.
One such example can be taken from Redwood, a framework we’ve been working on at Lab41 to identify anomalous files. Below is a breakdown of languages used in Redwood and how the number of lines of comments compares to the number of lines of code. As you can see, the bulk of the project was written in Python, and there is nearly a 1-to-1 ratio of comments to lines of code. Not bad.
Charts generated by Ohloh
While comments are great, it can often be tedious to hunt through the code and discern what a particular function does and how it is intended to behave. Most modern languages allow for a markup that can be used to generate beautiful, intuitive code documentation.
Looking at Hemlock, another project we’ve spent time on at the Lab, we can see how the markup works in practice.

    """
    This module is the main core of Hemlock and interfaces with and controls the
    majority of other modules in this package.

    Created on 19 August 2013
    @author: Charlie Lewis
    """

    from clients.hemlock_base import Hemlock_Base
    from clients.hemlock_debugger import Hemlock_Debugger
    from clients.hemlock_runner import Hemlock_Runner

    import hemlock_options_parser

    import getpass
    import json
    import MySQLdb as mdb
    import os
    import requests
    import sys
    import texttable as tt
    import time
    import uuid

    class Hemlock():
        """
        This class is responsible for driving the API and the core functionality of
        Hemlock.
        """

        def __init__(self):
            self.log = Hemlock_Debugger()
            self.HELP_COUNTER = 0

        def client_add_schedule(self, args, var_d):
            """
            Adds a specific schedule to a specific client.

            :param args: arguments to pass in from API
            :param var_d: dictionary of key/values made from the arguments
            :return: returns a list of the arguments supplied
            """
            arg_d = [
                '--uuid',
                '--schedule_id'
            ]
            return self.check_args(args, arg_d, var_d)

Snippet from Hemlock
The comments enclosed in triple quotes are Python docstrings, and tools like Sphinx can use this markup to generate HTML documentation, as seen in the screenshot below.
Taken from Hemlock’s Documentation
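To make the docstring idea concrete, here is a minimal sketch in the same style as the Hemlock snippet. The function `missing_flags` is a hypothetical helper invented for this illustration (it is not part of Hemlock), and the sketch assumes Python 3. It shows that the markup lives on the function object itself, which is what lets a documentation generator pick it up.

```python
import inspect


def missing_flags(args, required):
    """
    Report which required flags are missing from a list of arguments.

    :param args: list of command-line style arguments, e.g. ['--uuid', '42']
    :param required: list of flags that must be present
    :return: list of required flags that were not supplied
    """
    return [flag for flag in required if flag not in args]


# The docstring travels with the function object, so tools can read it
# at runtime instead of parsing comments out of the source file.
doc = inspect.getdoc(missing_flags)
print(doc.splitlines()[0])
```

The `:param` and `:return:` fields follow the same convention used in the Hemlock snippet, so the same text serves both the reader of the source and the generated HTML.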
That sort of documentation is great for fellow developers of the project, but what about the rest of us who just want to know how to install the project and get it up and running so that we can actually use the awesome tool? For those less familiar users, we at Lab41 ensure our projects always have a solid README to guide end-to-end installation from an outsider’s perspective.
Having a well-thought-out README goes a long way and should not only explain the project’s intentions, but also include details like installation, dependencies, quick start, known issues, examples, and so on. Here we have the first page of the README for another project we’ve spent a fair amount of time on, called Dendrite, which provides a way to analyze and share graphs.
Taken from Dendrite’s README
There are many ways to document a project, and the more up-to-date and consistent the documentation is, the easier it will be to maintain in the future. More importantly, great documentation will help others get a sense of where the project stands and what it is expected to do.
We’ve often heard the saying, “It’s not a bug; it’s an undocumented feature!” The truth, however, is that if it’s not documented, it’s a bug. It may be hard at times to fit documentation into a schedule, but it can sometimes be just as valuable as the product itself, if not more so, because it shapes the user experience.
Step 2: Covering-all Tests with Coveralls
Teams often refactor code - restructure the program - to make it cleaner, less complex, and more intuitive as the project evolves. However, refactoring a project can create unpredictable and unstable behavior.
To avoid unintended consequences during the process of refactoring, good test coverage of the code base can help give you peace of mind as you rework functions, syntax, formatting, or other general cleanup.
Testing is another one of those things, like documentation, that often gets left behind, forgotten, or deemed unimportant. To avoid this common target of neglect, we at Lab41 turn to a popular (and automated) testing framework. Beyond the obvious benefits of having tests that ensure a particular project’s code behaves as intended, we’ve found that tests are a great way to craft reproducible demonstrations that behave exactly as intended.
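One lightweight way to see how a test can double as a demonstration is Python’s built-in `doctest` module, where the usage example in the docstring is also the test. This is only a sketch, assuming Python 3. The function `comment_ratio` is a hypothetical example made up for this post, not code from any Lab41 project.

```python
import doctest


def comment_ratio(comment_lines, code_lines):
    """
    Return the ratio of comment lines to code lines.

    >>> comment_ratio(50, 100)
    0.5
    >>> comment_ratio(10, 0)
    Traceback (most recent call last):
        ...
    ValueError: code_lines must be positive
    """
    if code_lines <= 0:
        raise ValueError("code_lines must be positive")
    return comment_lines / code_lines


if __name__ == "__main__":
    # Runs every example embedded in the docstrings of this module
    # and reports how many were attempted and how many failed.
    print(doctest.testmod())
```

Anyone reading the docstring sees exactly how the function is meant to be called, and the same examples fail loudly if the behavior ever drifts.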
Below we can see the code coverage for several of our projects using Coveralls, which we have integrated with Travis CI (we will cover this tool in more depth in step 3) so that every time a build happens, we can not only ensure that the project builds, but also that the tests pass, automatically.
Taken from Coveralls
Here we see that specifically for Hemlock-REST, a RESTful server for the Hemlock project, the test coverage adjusts for most of the commits in the history, indicating that tests are being written alongside the code for the project.
As you can see, automated testing makes it, er … automatic to march forward with greater peace of mind and less effort. Another specific reason to write tests is to benefit others who want to contribute to a project. Basically, tests are a nice way to show others the way the project is expected to operate - especially for those who haven’t contributed to the project yet or are not familiar with how everything is designed to work.
Unit tests are a great way to get started writing tests that will provide code coverage. Most languages have several different unit testing frameworks, including JUnit (Java), CUnit (C), and my personal favorite, py.test (Python). Combine these testing frameworks with tools like Cobertura, CodeCover, or Emma to generate reports on how well the unit tests covered the code in the project. Finally, feed those reports to Coveralls, and you’re left with automated code coverage tied to commit history as the project emerges.
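As a rough illustration of what py.test-style unit tests look like, here is a small sketch. The function `parse_version` and the test names are hypothetical and exist only for this example. The tests are plain functions whose names start with `test_` and that use bare `assert` statements, which is the convention py.test discovers.

```python
def parse_version(text):
    """Turn a dotted version string such as '0.1.6' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def test_parses_dotted_version():
    assert parse_version("0.1.6") == (0, 1, 6)


def test_ignores_surrounding_whitespace():
    assert parse_version(" 1.0.0\n") == (1, 0, 0)


def test_orders_versions_numerically():
    # Plain string comparison would sort "0.1.10" before "0.1.6";
    # comparing tuples of ints gives the intended ordering.
    assert parse_version("0.1.10") > parse_version("0.1.6")
```

Saved in a file such as `test_version.py` (an assumed name), these can be run with `pytest test_version.py`. If someone later refactors `parse_version`, the three tests spell out the behavior that has to survive.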
In concert with documentation, retaining and maintaining traceable testing for a project preps it nicely for the next step toward delivering demonstration: building.
Step 3: Travis the Builder
Project builds are important. Being able to build a project consistently, and furthermore, guarantee that it still builds in the expected manner as the project gets updated and evolves, is paramount to ensuring that the community has a positive experience getting the project up and running on their own.
One of the ways we ensure the project builds correctly with every change we make is by using a tool called Travis CI (“CI” refers to Continuous Integration). There are lots of CI solutions out there, but this one integrates nicely with GitHub and supports a large number of languages and services to build and test against.
Here we have a sample config file for Travis CI that tells Travis what it needs to build and test in order to verify that the new changes made don’t break any tests or intended build executions. We can set multiple targets; this one builds against both OpenJDK7 and OracleJDK7. We can specify which branches get built (or which ones don’t) as well as have before and after installation steps for things like dependencies and test reports.
    language: java
    jdk:
      - oraclejdk7
      - openjdk7
    before_install:
      - source ./scripts/ci/dependencies.sh
    install: mvn install
    after_success:
      - mvn cobertura:cobertura coveralls:cobertura
    branches:
      except:
        - gh-pages
    notifications:
      email:
        - email@example.com
.travis.yml config file for Travis CI for Dendrite
That simple config file translates into a nice user interface that shows the progress, logs, and history of all builds for each specific project setup with Travis.
Travis CI status of Dendrite
Travis CI build history of Dendrite
Each PR (Pull Request) is built and tested by Travis before it gets merged, ensuring that no broken builds end up in Master, or whatever specific branch you intend the community to use to download and try your project. If the build breaks on the PR, it gives the contributor the opportunity to remedy the error before it gets pushed upstream, which keeps things clean and consistent for everyone.
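The core idea behind a CI build (run a fixed list of steps in order and stop at the first failure) can be sketched in a few lines. To be clear, this is not how Travis CI is implemented. It is a toy model, assuming Python 3, that borrows phase names from the `.travis.yml` above, and the commands are harmless stand-ins rather than real Maven invocations.

```python
import subprocess
import sys


def run_pipeline(steps):
    """
    Run commands in order, stopping at the first one that fails.

    :param steps: list of (name, argv) pairs
    :return: list of (name, passed) pairs for the steps that actually ran
    """
    results = []
    for name, argv in steps:
        completed = subprocess.run(argv)
        passed = completed.returncode == 0
        results.append((name, passed))
        if not passed:
            break  # later steps are skipped, just like a broken build
    return results


if __name__ == "__main__":
    # Stand-in commands: each one is a tiny Python process, and the
    # "test" step exits non-zero to simulate a failing test suite.
    steps = [
        ("before_install", [sys.executable, "-c", "print('fetching dependencies')"]),
        ("install", [sys.executable, "-c", "print('building')"]),
        ("test", [sys.executable, "-c", "import sys; sys.exit(1)"]),
        ("after_success", [sys.executable, "-c", "print('uploading coverage')"]),
    ]
    for name, passed in run_pipeline(steps):
        print(name, "passed" if passed else "FAILED")
```

Because the simulated test step fails, the `after_success` step never runs, which mirrors why a coverage report is only published for a green build.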
Step 4: (more) Trusted Deployments with Tags and Docker
A lot of groups jump straight to this step, with the popularized war cry: “Ship it!”
However, jumping the gun before doing due diligence on steps 1, 2, and 3 can lead to unstable builds, irreproducible build errors, and next to impossible troubleshooting. For those of us working with open source projects, this can lead to general frustration for all. And there’s no quicker path to unused open source than when something doesn’t work due to lack of documentation, absence of testing, or unchecked build processes.
When you are ready to deploy, there are several great options that vary from generic to specific. Since our projects are all hosted on GitHub we get one deployment path for free: tags.
GitHub tags for Hemlock
GitHub allows us to create tags associated with any particular point in the commit history to create a downloadable version of the project at that particular point in time. This can be great for pre-releases, or even more official releases.
Release notes for pre-release 0.1.6 of Hemlock
Tags are very generic, letting one create downloadable source of anything at any given time, leaving the details of how to get it installed and running up to you.
Another more specific approach that can be used for deployment is PyPI, a Python-specific index for packages that can be automatically downloaded and installed via tools like pip and easy_install. There are many language-specific indices for packages, such as CPAN (Perl), RubyGems (Ruby), and Maven Central (Java).
Hemlock package hosted on PyPI
Sometimes project deployment involves many moving parts and multiple languages, and is more complex than just a single package. Docker, which we’ll go into in more detail in step 5, is a fantastic new technology for these complex cases. It provides developers with a simple way to create an environment, based on a simple configuration file, for running one or more processes inside a container. In addition to providing fine-grained resource utilization, this capability moves us faster towards the “build once, ship everywhere” Holy Grail for deploying across multiple machines. Using Docker, we are able to deploy trusted builds of each project that remain synced with GitHub as each project matures and evolves. All the end user has to do is issue a few easy commands to pull down the image and run it; the installation and setup is already baked into the container and ready to go.
Lab41 projects deployed on the Docker Index
Step 5: Rinse and Repeat the Demo Pipeline
Repeatability is the key to tying together demonstration and deployment. Pretty much every developer has run into the opposite (and unfortunate) situation: for example, having a demo that only runs on a specific laptop because of undocumented dependencies, untested hacks, an outdated operating system, or unspecific and mismatched build parameters. These all-too-common unreproducible factors really mean you have a weak prototype, not a demo. A demo should be something that can be shared and reproduced, not a Rube Goldberg machine.
Thanks to Docker, we can specify the exact environment(s), all of the required dependencies and their versions, and any other setup required for a given project. Through a simple configuration file, we can be assured that the next time someone builds that Docker container, it will do the exact same thing it did before, regardless of the state of the machine - and without any “gotchas” or undocumented hacks! Once that Docker specification - a Dockerfile - is created and deployed as a trusted build, repeatable demonstration of the project (Redwood in this case) becomes as simple as a single command:
    docker run -d -P lab41/redwood
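If you want to launch demos from a script rather than by hand, the command above can be wrapped in a few lines of Python. This is a sketch, assuming Python 3.7 or later with the `docker` CLI installed and on the PATH. The function names `docker_run_command` and `start_demo` are hypothetical and chosen just for this example.

```python
import subprocess


def docker_run_command(image, detach=True, publish_all=True):
    """Build the argument list for `docker run` without executing it."""
    argv = ["docker", "run"]
    if detach:
        argv.append("-d")  # run the container in the background
    if publish_all:
        argv.append("-P")  # publish all exposed ports to random host ports
    argv.append(image)
    return argv


def start_demo(image):
    """Start the container and return the container ID that Docker prints."""
    completed = subprocess.run(
        docker_run_command(image),
        check=True,           # raise if docker exits with an error
        capture_output=True,
        text=True,
    )
    return completed.stdout.strip()


# Only prints the command; call start_demo("lab41/redwood") to actually run it.
print(" ".join(docker_run_command("lab41/redwood")))
```

Keeping the command construction separate from its execution makes it easy to check what would be run before anything touches Docker.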
Below is an example of a Dockerfile for another project Lab41 has been working on called try41.
    FROM ubuntu
    MAINTAINER Charlie Lewis

    ENV REFRESHED_AT 2014-02-14
    RUN sed 's/main$/main universe/' -i /etc/apt/sources.list
    RUN apt-get update

    # Keep upstart from complaining
    RUN dpkg-divert --local --rename --add /sbin/initctl
    RUN ln -s /bin/true /sbin/initctl

    RUN apt-get install -y git
    RUN apt-get install -y python-setuptools
    RUN easy_install pip
    ADD . /try41
    RUN pip install -r /try41/requirements.txt
    ADD patch/auth.py /usr/local/lib/python2.7/dist-packages/docker/auth/auth.py
    ADD patch/client.py /usr/local/lib/python2.7/dist-packages/docker/client.py

    EXPOSE 5000

    WORKDIR /try41
    CMD ["python", "api.py"]
Dockerfile for try41
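The Dockerfile above installs its Python dependencies from a `requirements.txt` file, so the build is only as repeatable as that file is specific. As a rough sketch (assuming Python 3), a small check like the one below can flag requirements that do not pin an exact version. The function name and the sample file contents are hypothetical and are not taken from try41.

```python
def unpinned_requirements(lines):
    """
    Return requirement lines that do not pin an exact version with '=='.

    Blank lines and comments are ignored. This is a simplification: it
    does not handle URLs, editable installs, or other pip syntax.
    """
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical requirements.txt contents, for illustration only.
example = """
# web framework
flask==0.10.1
requests>=2.0
redis
""".splitlines()

print(unpinned_requirements(example))
```

Run as written, this prints `['requests>=2.0', 'redis']`, the two entries that could resolve to different versions on a later build.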
You’re now ready to follow these five steps for demonstration and delivery:
- Document: Use tools like Sphinx and leverage services like GitHub Pages and Ohloh to give the project more clarity and build a valuable foundation towards easier delivery and documented demonstration.
- Test: Get code coverage via Coveralls and discover the benefits of decisive tests, which, by extension, can be used to test both demonstration and delivery.
- Build: Pave the road for deployment by using GitHub’s Pull Request system in concert with the Continuous Integration system, Travis CI.
- Deploy: Employ GitHub tags, services like PyPI, and the Docker Index to prep your projects for delivery and demonstration.
- Repeat: Use Dockerfiles (and allow the projects to be built by Docker) to establish repeatable, consistent, and reliable ways for projects to be demonstrated and delivered without special cases or nasty hacks.
So no more excuses. Get out there and deliver demos for your projects.