Hemlock

Hemlock is an open-source project exploring ways to create a common data access layer that removes the need to understand the underlying data topologies while still preserving each data source's requirements, such as access control, performance, and formats.


Install instructions

Option A, install using pip:

sudo pip install hemlock

Option B, build from source:

git clone https://github.com/Lab41/Hemlock.git
cd Hemlock
sudo python setup.py install

Required Dependencies

Python modules:

Build a server running MySQL to store user accounts, tenants, and registered systems.

Build a Couchbase 2.0 cluster to store metadata and data of registered systems.

Build an ElasticSearch 0.90.2 cluster to index the data stored in Couchbase.

Set up one-way XDCR replication from Couchbase to ElasticSearch using this plugin (note: use version 1.1.0).

Once the plugin is installed, be sure to update couchbase_template.json under plugins/transport-couchbase/ so that it contains the following:

{
    "template" : "*",
    "order" : 10,
    "mappings" : {
        "couchbaseCheckpoint" : {
            "_source" : {
                "includes" : ["doc.*"]
            },
            "date_detection" : false,
            "dynamic_templates": [
                {
                    "store_no_index": {
                        "match": "*",
                        "mapping": {
                            "store" : "no",
                            "index" : "no",
                            "include_in_all" : false
                        }
                    }
                }
            ]
        },
        "_default_" : {
            "_source" : {
                "includes" : ["meta.*"]
            },
            "date_detection" : false,
            "properties" : {
                "meta" : {
                    "type" : "object",
                    "include_in_all" : false
                }
            }
        }
    }
}

Once that is added, start ElasticSearch with bin/elasticsearch and then run the following command (needed only the first time):

curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json
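The same one-time template upload can also be scripted from Python. This is a minimal sketch using only the standard library, assuming ElasticSearch listens on localhost:9200 and the template file sits at the path used above; building the request is kept separate from sending it so the request can be inspected before anything goes over the network.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local ElasticSearch node
TEMPLATE_PATH = "plugins/transport-couchbase/couchbase_template.json"


def build_template_request(template, es_url=ES_URL, name="couchbase"):
    """Build (but do not send) a PUT request that registers an index template."""
    body = json.dumps(template).encode("utf-8")
    return urllib.request.Request(
        url="%s/_template/%s" % (es_url.rstrip("/"), name),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def upload_template(path=TEMPLATE_PATH, es_url=ES_URL):
    """Read the template file, send it to ElasticSearch and return the JSON reply."""
    with open(path) as f:
        template = json.load(f)
    req = build_template_request(template, es_url)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling upload_template() once ElasticSearch is running should be equivalent to the curl command.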

Creating the required databases

Create database hemlock in MySQL.
Create bucket hemlock in Couchbase.
Create index hemlock in ElasticSearch.

Getting started

Create Hemlock credentials (see 'Credential files')

HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
HEMLOCK_ELASTICSEARCH_ENDPOINT=192.168.1.30

(If you'd like these to persist, prefix each line with export and source the file.)
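If you keep these settings in a file, a small parser can load them in Python. This is a hypothetical helper, not part of Hemlock's API. It assumes simple KEY=VALUE lines, tolerates an optional leading export (so the same file can still be sourced by the shell), strips surrounding quotes, and skips blank lines and # comments.

```python
def parse_credentials(text):
    """Parse KEY=VALUE lines (optionally prefixed with 'export') into a dict."""
    creds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].strip()
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a KEY=VALUE line: %r" % raw)
        value = value.strip()
        # strip matching surrounding quotes, e.g. PW="pass"
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        creds[key.strip()] = value
    return creds


def load_credentials(path):
    """Read a credential file from disk and parse it."""
    with open(path) as f:
        return parse_credentials(f.read())
```

For example, os.environ.update(load_credentials("hemlock_creds")) would expose the values to child processes started from the same Python script.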

Create a tenant, role, user, and data source system

hemlock tenant-create --name Project1

hemlock tenant-list

hemlock role-create --name User

hemlock role-list

hemlock user-create --name User1 \
                    --username Username1 \
                    --email user1@email.com \
                    --role_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
                    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6

hemlock user-list

hemlock register-local-system --name System1 \
                              --data_type csv \
                              --description "description" \
                              --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
                              --hostname system1.fqdn \
                              --endpoint http://hemlock.server/ \
                              --poc_name user1 \
                              --poc_email user1@email.com

hemlock system-list
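To script the setup steps above, a thin wrapper around the CLI can help. This is a hypothetical helper, not part of Hemlock. It only reuses the commands and flags shown above, passes flag names through unchanged (so role_id stays role_id), and returns raw standard output because the output format of the -list commands is not documented here. It assumes Python 3.7+ (for capture_output and ordered keyword arguments).

```python
import subprocess


def hemlock_argv(command, **options):
    """Build an argv list such as ['hemlock', 'tenant-create', '--name', 'Project1']."""
    argv = ["hemlock", command]
    for flag, value in options.items():
        argv += ["--" + flag, str(value)]
    return argv


def run_hemlock(command, **options):
    """Run a hemlock CLI command and return its standard output as text."""
    result = subprocess.run(hemlock_argv(command, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_hemlock("tenant-create", name="Project1") runs the first command above; check=True raises CalledProcessError if the CLI exits with an error.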

Add credentials for the data source system, for example in a file named mysql_creds:

MYSQL_SERVER=192.168.1.30
MYSQL_DB=db1
#MYSQL_TABLE=table1
MYSQL_USERNAME=user
MYSQL_PW=pass

Store a client

hemlock client-store --name mysql_client_1 --type mysql --system_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --credential_file /path/to/mysql_creds 

hemlock client-list

Add credentials for Hemlock:

hemlock hemlock-server-store --credential_file /path/to/hemlock_creds

Create a schedule server (optional)

hemlock schedule-server-create --name schedule_server_1

hemlock schedule-server-list

Add a schedule for the data source system to run (optional)

hemlock client-schedule --name schedule1 \
                      --minute "54" \
                      --hour "12" \
                      --day_of_month "*" \
                      --month "*" \
                      --day_of_week "*" \
                      --client_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
                      --schedule_server_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6

hemlock schedule-list
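The five timing fields look like standard cron fields, so the example above would fire daily at 12:54. Assuming the schedule server interprets them with cron semantics (not confirmed here), the following sketch checks whether a given time matches a schedule. It handles only *, single numbers and comma-separated lists, not ranges or steps, and uses cron's convention of 0 = Sunday for day_of_week.

```python
from datetime import datetime


def _field_matches(spec, value):
    """True if a cron field ('*', '5', or '1,15') matches an integer value."""
    if spec == "*":
        return True
    return value in {int(part) for part in spec.split(",")}


def schedule_matches(when, minute="*", hour="*", day_of_month="*",
                     month="*", day_of_week="*"):
    """Check a datetime against cron-style fields (0 = Sunday for day_of_week)."""
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    if not (_field_matches(minute, when.minute)
            and _field_matches(hour, when.hour)
            and _field_matches(month, when.month)):
        return False
    dom_ok = _field_matches(day_of_month, when.day)
    dow_ok = _field_matches(day_of_week, cron_dow)
    # cron rule: if both day fields are restricted, either one may match
    if day_of_month != "*" and day_of_week != "*":
        return dom_ok or dow_ok
    return dom_ok and dow_ok
```

With the values from schedule1, schedule_matches(datetime(2013, 9, 3, 12, 54), minute="54", hour="12") is true, and one minute later it is false.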

Perform a test run that pulls data from the data source system:

hemlock client-run --uuid 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6

Search for data that has been loaded into Hemlock

hemlock query-data --user 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --query foo

Or query ElasticSearch directly:

http://elasticsearch.fqdn:9200/hemlock/_search?q=foo

This returns something like the following:

{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 20,
        "successful": 20,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 3.6582048,
        "hits": [
            {
                "_index": "hemlock",
                "_type": "couchbaseDocument",
                "_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
                "_score": 3.6582048,
                "_source": {
                    "meta": {
                        "id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
                        "rev": "1-0010f1ac6045ccf40000000000000000",
                        "flags": 0,
                        "expiration": 0
                    }
                }
            }
        ]
    }
}

Now we can feed the 'id' into Couchbase to return the full document:

http://couchbase.fqdn:8092/hemlock/865f458b4421ae5fd758e3c81aca9f8d8b4696b6

Which returns something like the following:

{
    "hemlock-system": "a50b86c2-59f7-42a3-aa67-3367579189fe",
    "hemlock-date": "2013-09-03 16:10:20",
    "stream": "DOYLIE"
}
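The two-step lookup above (search ElasticSearch, then fetch each id from Couchbase) can be scripted. This sketch only builds URLs and extracts ids with the standard library; the host names are the placeholders from the examples, and it assumes meta.id is present in _source (as the template's includes setting arranges), falling back to _id otherwise.

```python
from urllib.parse import quote, urlencode


def search_url(es_base, query, index="hemlock"):
    """URL for an ElasticSearch URI search, e.g. .../hemlock/_search?q=foo."""
    return "%s/%s/_search?%s" % (es_base.rstrip("/"), index, urlencode({"q": query}))


def hit_ids(response):
    """Extract Couchbase document ids from an ElasticSearch search response."""
    ids = []
    for hit in response["hits"]["hits"]:
        meta = hit.get("_source", {}).get("meta", {})
        ids.append(meta.get("id", hit.get("_id")))
    return ids


def document_urls(couchbase_base, ids, bucket="hemlock"):
    """Build Couchbase document URLs in the form shown above, one per id."""
    base = couchbase_base.rstrip("/")
    return ["%s/%s/%s" % (base, bucket, quote(doc_id, safe="")) for doc_id in ids]
```

To actually fetch the data, each URL can be opened with urllib.request.urlopen and its body decoded with json.load.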

Credential files

Create a hemlock_creds file (see hemlock_creds_sample for an example):

HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass

Create credential files for each client you intend to use (examples).

Currently supported data sources

| Technology | Parameter | Python Module Dependencies |
|---|---|---|
| MySQL | mysql | MySQLdb |
| MongoDB | mongo | pymongo |
| Redis | redis | redis |
| Local FileSystem | fs | magic, pdfminer, xmltodict |
| RESTful API | rest | |
| Streams | stream_odd | |
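Before storing a client, it can be worth checking whether its Python modules are installed. This sketch copies the table into a dict (assuming the table is complete; rest and stream_odd list no dependencies) and uses importlib.util.find_spec, which locates a top-level module without importing it.

```python
from importlib.util import find_spec

# Parameter -> Python module dependencies, copied from the table above
CLIENT_DEPENDENCIES = {
    "mysql": ["MySQLdb"],
    "mongo": ["pymongo"],
    "redis": ["redis"],
    "fs": ["magic", "pdfminer", "xmltodict"],
    "rest": [],
    "stream_odd": [],
}


def missing_modules(client_type):
    """Return the modules required by a client type that cannot be found."""
    return [name for name in CLIENT_DEPENDENCIES[client_type]
            if find_spec(name) is None]
```

For example, missing_modules("fs") lists whichever of magic, pdfminer and xmltodict still need to be installed.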

Adding a new data source type

Create a new class in the clients folder for each new data source type. Most classes need to define two methods: connect_client and get_data.

The following template can be used as a starting point:

class HMyclient:
    def connect_client(self, client_dict):
        # open a connection to the data source using the values from the
        # client's credential file (client_dict) and return the handle
        c_server = None  # replace with a real connection handle
        return c_server

    def get_data(self, client_dict, c_server, h_server, client_uuid):
        # data_list is a list of lists (one inner list per record)
        data_list = [[]]
        # desc_list is a list describing the schema (if it exists or is known)
        desc_list = []
        return data_list, desc_list
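As an illustration, here is a hypothetical client for local CSV files written against the template. The class name HCsvclient and the CSV_FILE key are assumptions (the key would come from the client's credential file via client_dict), and treating the header row as desc_list is one reasonable choice, not necessarily what Hemlock's own clients do.

```python
import csv


class HCsvclient:
    def connect_client(self, client_dict):
        # the "connection" is just an open file handle; CSV_FILE is a
        # hypothetical key supplied through client_dict
        return open(client_dict["CSV_FILE"], newline="")

    def get_data(self, client_dict, c_server, h_server, client_uuid):
        # h_server and client_uuid are unused here but kept to match the template
        try:
            rows = list(csv.reader(c_server))
        finally:
            c_server.close()
        if not rows:
            return [[]], []  # same empty shape as the template
        desc_list = rows[0]   # header row describes the schema
        data_list = rows[1:]  # remaining rows are the records
        return data_list, desc_list
```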

Usage examples

Create a tenant
```
hemlock tenant-create --name Project1
```
Create a role
```
hemlock role-create --name User
```

Create a user
```
hemlock user-create --name User1 \
                    --username Username1 \
                    --email user1@email.com \
                    --role_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
                    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
```
Register a local system
```
hemlock register-local-system --name System1 \
                              --data_type csv \
                              --description "description" \
                              --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
                              --hostname system1.fqdn \
                              --endpoint http://hemlock.server/ \
                              --poc_name user1 \
                              --poc_email user1@email.com
```

List registered systems
```
hemlock system-list
```
List created users
```
hemlock user-list
```
List created tenants
```
hemlock tenant-list
```
Connecting to a client
Full CLI API list

Related repositories

Documentation

Docs

Tests

The tests for this project use py.test.
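By default, py.test collects functions named test_* from files named test_*.py (or *_test.py) and reports any failing bare assert. A minimal sketch of such a test for a client that follows the template above; the class is redefined inline so the file runs on its own, whereas a real test would import it from the clients folder (module path not given here).

```python
# test_client_template.py (run with: py.test test_client_template.py)


class HMyclient:
    """Inline copy of the client template, so this file runs on its own."""

    def connect_client(self, client_dict):
        c_server = None  # a real client would return a connection handle
        return c_server

    def get_data(self, client_dict, c_server, h_server, client_uuid):
        data_list = [[]]
        desc_list = []
        return data_list, desc_list


def test_get_data_returns_list_of_lists_and_schema():
    client = HMyclient()
    c_server = client.connect_client({})
    data_list, desc_list = client.get_data({}, c_server, None, "uuid")
    assert isinstance(data_list, list)
    assert all(isinstance(row, list) for row in data_list)
    assert isinstance(desc_list, list)
```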

Contributing to Hemlock

Want to contribute? Awesome! Open a pull request or see more details here.