Git-ting Better Teamwork Around Graph Analytics

Technical underpinnings of Dendrite's collaboration and experimentation features

As my last post described, Dendrite is our open source project to prototype how teams can use graph storage and analytics within a shared environment. From our perspective, collaborative experimentation is a pressing need for graph analytics. Several companies, such as Aurelius and GraphLab, are developing robust and scalable technologies for storage and analytics. These tools are extremely powerful, but in our work, we’ve seen several teams whose workflows revolve around a single graph. And nobody wants to mess with the “Master Graph” when they have to manually transfer data or log in to multiple systems, and when any change could alter a conclusion their colleague is about to convey to important people.

We think graph analysis can add even more bang for the buck if tailored for a team environment, where colleagues could each experiment with techniques that could alter the structure of the graph. That way, workflows wouldn’t need to revolve around a single graph and colleagues could divide responsibilities, simultaneously test different theories, and follow intuition towards unknown outcomes. Basically, everyone would benefit from all the things that innovative teams do well.

But how could we prototype such a capability, especially since we’d need to link together multiple storage and analytics technologies? And how does it actually work under the hood to support multiple users, each of whom could be doing different or conflicting things?

Just as we can build off existing open source projects, it helps to build from a common workflow paradigm when thinking about collaborative graph experimentation. To work together, many analysts want to create different versions, track changes such as modifications or calculations, and selectively accept or reject those changes back into a shared version. Yes, what I’m describing is the same Track Changes feature everyone has come to know and love from Microsoft® Word®. Imagine if several people tried to edit a document without that feature. Basically, everyone trips over each other’s edits, making it painful to review and merge changes. That is exactly what happens with graphs, posing a huge problem for analysts.

To prototype collaboration around graphs, Dendrite borrows features from a technology that we developers use on a daily basis: distributed version control systems (such as Git). These systems enable teams of engineers to independently modify source code, collaboratively review updates, then selectively accept or reject changes. The paradigm fits so well that we even refer to this aspect of the project as Git for Graphs® (not really). The only problem is that such systems are not designed to handle Big Data-esque structures, so we went down the path of implementing custom Git-style features in something that could handle both the scale and the data type.
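To make the "review, then selectively accept or reject" idea concrete for graphs, here is a minimal sketch in plain Python. It is not Dendrite's code: the `Change` type, the `diff_edges` and `apply_changes` helpers, and the sample edge sets are all hypothetical, and a graph is reduced to a set of (source, target) edges for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One tracked change between two versions of a graph (hypothetical type)."""
    op: str      # "add" or "remove"
    edge: tuple  # (source, target)


def diff_edges(base, branch):
    """Return the changes that turn edge set `base` into edge set `branch`."""
    added = [Change("add", e) for e in sorted(branch - base)]
    removed = [Change("remove", e) for e in sorted(base - branch)]
    return added + removed


def apply_changes(base, changes, accept):
    """Apply only the changes for which accept(change) is true."""
    result = set(base)  # copy, so the shared version is never mutated
    for change in changes:
        if not accept(change):
            continue
        if change.op == "add":
            result.add(change.edge)
        else:
            result.discard(change.edge)
    return result


# The shared version and one analyst's experimental version.
shared = {("alice", "bob"), ("bob", "carol")}
experiment = {("alice", "bob"), ("alice", "carol"), ("carol", "dave")}

changes = diff_edges(shared, experiment)

# A reviewer accepts the new edges but rejects the deletion.
merged = apply_changes(shared, changes, accept=lambda c: c.op == "add")
```

In this toy run the reviewer keeps the edge the experiment deleted while picking up the two edges it added, which is the graph equivalent of accepting some tracked changes and rejecting others.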

What scalable data type can inherently store relationships between projects, graphs, and versions? After several design and coding sessions, we developed something that we call a Metagraph. The concept is that Dendrite uses Titan, the scalable database behind its graphs, to store different versions and the associated metadata about each graph (let that sink in: Dendrite uses a graph to store data about graphs). Within the context of each project, users can create, modify, and clone different versions of a graph. They can even carve off a query-defined subset into a new graph for tailored analysis. In practice, these collaboration features support the essential actions of selectively incorporating data and experimenting with different hypotheses.

With this baseline of Metagraph services, Dendrite demonstrates how teams can use different graph versions—optionally configured with a Git-backed change log—for experimentation and a better workflow. Erick Tryzelaar, Lab41’s resident “Git whisperer” who designed and built the core of this component, rightly deserves credit for drawing these proven concepts into the graph space.
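For readers who want a feel for the Metagraph idea, the sketch below models it as a small in-memory graph whose vertices are projects and graph versions. It is an illustration only: it does not use Titan or Dendrite's actual schema, and the `Metagraph` class, its method names, and the edge labels (`owns`, `cloned_from`, `subset_of`) are assumptions made up for this example.

```python
import itertools


class Metagraph:
    """Toy in-memory metagraph: a graph that stores data about graphs."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.vertices = {}  # vertex id -> properties
        self.edges = []     # (source id, label, target id)

    def _add_vertex(self, kind, **props):
        vid = next(self._ids)
        self.vertices[vid] = {"kind": kind, **props}
        return vid

    def create_project(self, name):
        return self._add_vertex("project", name=name)

    def create_graph(self, project_id, name, edges=()):
        # set(edges) copies the data, so each version is independent.
        gid = self._add_vertex("graph", name=name, data=set(edges))
        self.edges.append((project_id, "owns", gid))
        return gid

    def clone_graph(self, project_id, source_id, name):
        gid = self.create_graph(project_id, name, self.vertices[source_id]["data"])
        self.edges.append((gid, "cloned_from", source_id))
        return gid

    def carve_subset(self, project_id, source_id, name, predicate):
        # The predicate stands in for a query that selects part of a graph.
        data = {e for e in self.vertices[source_id]["data"] if predicate(e)}
        gid = self.create_graph(project_id, name, data)
        self.edges.append((gid, "subset_of", source_id))
        return gid

    def lineage(self, graph_id):
        """Follow cloned_from / subset_of edges back to the original graph."""
        parents = {s: t for s, label, t in self.edges
                   if label in ("cloned_from", "subset_of")}
        chain = [graph_id]
        while chain[-1] in parents:
            chain.append(parents[chain[-1]])
        return [self.vertices[v]["name"] for v in chain]


meta = Metagraph()
project = meta.create_project("example-project")
master = meta.create_graph(
    project, "master",
    {("alice", "bob"), ("bob", "carol"), ("carol", "dave")},
)
draft = meta.clone_graph(project, master, "draft")
subset = meta.carve_subset(project, draft, "alice-only", lambda e: "alice" in e)

# Experimenting on the clone leaves the master graph untouched.
meta.vertices[draft]["data"].add(("dave", "erin"))
```

Because every version is just another vertex, questions such as "where did this graph come from?" become ordinary graph traversals, as `lineage` shows. Copying the full edge set on every clone also hints at why the storage footprint of multiple versions is a natural target for optimization.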

Like most prototypes, the collaboration features within Dendrite would benefit from a few performance optimizations, especially to decrease the storage footprint of multiple versions. Nevertheless, it is a good foundation upon which we will continue building capabilities for better collaboration in the space. If that sounds like a worthy pursuit, we welcome talented engineers to join (perhaps even by applying to work at Lab41) or simply drop us a line if you know of interesting work in this area.

Stay tuned in the coming weeks for more posts that describe (perhaps even demo) other facets of Dendrite and other Lab41 projects.