As mentioned before, we like working with graphs because the mathematical construct inherently captures relationships that matter. But to move beyond theorems and proofs—to effectively use graphs in the real world—we need ways to store and analyze them within a team environment. So how does our Dendrite open source project address that challenge? In short, it ties together modified versions of leading open source technologies, adds a base capability for graph collaboration, and uses a web interface to drive it all.
To understand how we got here, it helps to have a notion of how we work. At its core, Lab41 is a venue for collaboration among talented people from the private sector, academia, and government. Together, we develop prototypes for shared challenges in the Big Data space. Tackling points of overlap can be difficult, but in instances like this, it can be a uniquely effective way to advance capabilities.
Since our Lab is mission-driven, we started by examining the problem space of analysts using graph technologies. To put it mildly, some analysts have to answer very difficult questions. But looking at the problem space alone would have been insufficient. We also took time to consider workflow and communication needs, such as how colleagues can team up to tackle the ever-important Six Degrees of Kevin Bacon.
What we learned is that graph-focused analysts, like most teams in pretty much every industry, did not have a problem with technology alone. The market was already providing access to powerful graph databases, many elegant algorithms had been published through academia and open source, and the tailwinds behind Big Data had delivered robust analytic engines. But there was a core need to combine graph storage and analytic technologies into something a team could collaboratively use. Given the overlap among a variety of groups in different industries, we figured these goals were worth pursuing:
Stacking Storage + Analytics + Collaboration
It’s easy to see that this combination of goals requires a full-stack approach. So you’re probably wondering, “What open source technologies did you use, already?” Glad you asked:
Graph Storage: The Aurelius team behind the Titan Distributed Graph Database built an impressive suite of capabilities that enables scale-free storage in either Berkeley DB for small datasets or HBase for horizontally scalable needs.
Graph Analytics: GraphLab is a powerful machine learning engine, which my fellow Lab41 contributor Charlie Lewis managed to run on graphs from Titan by creatively leveraging Titan’s sister project Faunus. As a Hadoop-based analytic engine, Faunus was also a natural fit for extra horsepower, which we rounded out with in-memory calculations using the Java-based JUNG framework.
Information Retrieval: Developers of user-facing analytics must figure out a way to combine deep computational power, which takes time, with the interactivity we’ve come to expect from The Internets. We initially limited Elasticsearch to its standard search features, but now use it as the primary store for listing and visualizing both vertices and edges.
User Interface: A RESTful web server (using custom Spring MVC controllers paired with endpoints served through the Titan-compatible Rexster) followed principles of data-driven modularity. This design enabled us to build both AngularJS and command-line interfaces while also allowing others to swap in a different front end if desired.
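To give a flavor of the in-memory side of that stack, here is a minimal sketch of the kind of small calculation that fits comfortably in memory: a breadth-first search that counts degrees of separation, in the spirit of the Kevin Bacon question. This is not Dendrite's code and does not use JUNG's API; it is plain Python, and the function name `degrees_of_separation` and the sample edge list are hypothetical, made up purely for illustration.

```python
from collections import deque


def degrees_of_separation(edges, source, target):
    """Return the hop count between source and target, or None if unconnected."""
    # Build an undirected adjacency map from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    if source not in neighbors or target not in neighbors:
        return None

    # Breadth-first search reaches vertices in order of increasing distance,
    # so the first time we dequeue the target we have the shortest path length.
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for nxt in neighbors[current]:
            if nxt not in distance:
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    return None


# Hypothetical co-appearance edges, invented for this example.
edges = [
    ("actor_a", "actor_b"),
    ("actor_b", "actor_c"),
    ("actor_c", "kevin_bacon"),
    ("actor_d", "actor_e"),
]

print(degrees_of_separation(edges, "actor_a", "kevin_bacon"))
print(degrees_of_separation(edges, "actor_d", "kevin_bacon"))
```

A search like this is fine for a graph that fits on one machine. Once the graph outgrows memory, the same question would need a distributed engine of the kind described above.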
I could go into deep technical details, lessons learned, and future directions of the project, but my colleagues and I will save those topics for later posts, including one coming soon on the technical underpinnings of Dendrite’s collaboration features. For now, I’ll close out this overview with a few (hopefully) lasting impressions:
Initial feedback seems to validate our thoughts that graph technologies could gain wider adoption through co-integration and development of better workflow tools. We welcome contributors to join our project, but would also appreciate pointers to any work in this space.
If you really want to nerd out on graph technologies, consider attending GraphLab’s annual user conference in July. Our team is slated for an in-depth talk on Dendrite.
There is still a lot of room for collaboration among the brainpower in academic research, talented commercial and open source engineers, and government partners with some very challenging problems. In our second year, Lab41 aims to continue cultivating our space as a venue for that type of participation. Contact us if you’re interested in learning more or getting involved.