Hemlock is a way of providing a common data access layer.
Hemlock is an open-source project exploring ways to create a common data access layer that eliminates the need to understand underlying data topologies while still preserving the requirements of each data source, such as access control, performance, and formats.
Option A, install using pip:
sudo pip install hemlock
Option B, build from source:
git clone https://github.com/Lab41/Hemlock.git
cd Hemlock
sudo python setup.py install
Python modules:
Build a server running MySQL to store user accounts, tenants, and registered systems.
Build a Couchbase 2.0 cluster to store metadata and data of registered systems.
Build an ElasticSearch 0.90.2 cluster to store the index of Couchbase.
Add XDCR one-way replication from Couchbase to ElasticSearch using this plugin (note: use version 1.1.0).
Once the plugin is installed, be sure to update couchbase_template.json under plugins/transport-couchbase/ so that it contains the following:
{
  "template" : "*",
  "order" : 10,
  "mappings" : {
    "couchbaseCheckpoint" : {
      "_source" : {
        "includes" : ["doc.*"]
      },
      "date_detection" : false,
      "dynamic_templates": [
        {
          "store_no_index": {
            "match": "*",
            "mapping": {
              "store" : "no",
              "index" : "no",
              "include_in_all" : false
            }
          }
        }
      ]
    },
    "_default_" : {
      "_source" : {
        "includes" : ["meta.*"]
      },
      "date_detection" : false,
      "properties" : {
        "meta" : {
          "type" : "object",
          "include_in_all" : false
        }
      }
    }
  }
}
Once the template is updated, start ElasticSearch with bin/elasticsearch, then run the following once, the first time only:
curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json
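The same upload can be prepared from Python. The sketch below is only an illustration: `build_template_request` is a hypothetical helper, not part of Hemlock, and it assumes Python 3 and the endpoint used in the curl command. It checks that the template is valid JSON and builds the PUT request without sending it.

```python
import json
import urllib.request


def build_template_request(template_text, base_url="http://localhost:9200"):
    """Return a PUT request for the couchbase index template (not yet sent)."""
    json.loads(template_text)  # raises ValueError if the template is malformed
    return urllib.request.Request(
        base_url + "/_template/couchbase",
        data=template_text.encode("utf-8"),
        method="PUT",
    )


# Trimmed-down stand-in for couchbase_template.json.
req = build_template_request('{"template": "*", "order": 10, "mappings": {}}')
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it to a running ElasticSearch node.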
Create a database named hemlock in MySQL.
Create a bucket named hemlock in Couchbase.
Create an index named hemlock in ElasticSearch.
Create Hemlock credentials (see 'Credential files'):
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
HEMLOCK_ELASTICSEARCH_ENDPOINT=192.168.1.30
(If you'd like these to persist, consider adding export before each line and running source on the file.)
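For scripts that need the same values without sourcing the file, a credential file in this KEY=VALUE form can be read directly. This is a minimal sketch, not Hemlock's own loader: `load_credentials` is a hypothetical helper, and skipping `#` comment lines and an optional `export ` prefix are assumptions about the file format.

```python
import os


def load_credentials(text):
    """Parse KEY=VALUE lines into a dict, skipping blanks and # comments."""
    creds = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export " so sourced files also parse.
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        creds[key.strip()] = value.strip()
    return creds


SAMPLE = """\
HEMLOCK_MYSQL_SERVER=192.168.1.10
export HEMLOCK_MYSQL_DB=hemlock
#HEMLOCK_MYSQL_PW=ignored
"""

creds = load_credentials(SAMPLE)
os.environ.update(creds)  # visible to this process and any child processes
print(creds["HEMLOCK_MYSQL_DB"])
```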
Create a tenant, role, user, and data source system
hemlock tenant-create --name Project1
hemlock tenant-list
hemlock role-create --name User
hemlock role-list
hemlock user-create --name User1 \
    --username Username1 \
    --email user1@email.com \
    --role_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock user-list
hemlock register-local-system --name System1 \
    --data_type csv \
    --description "description" \
    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
    --hostname system1.fqdn \
    --endpoint http://hemlock.server/ \
    --poc_name user1 \
    --poc_email user1@email.com
hemlock system-list
Create a credential file for the client, for example for a MySQL client:
MYSQL_SERVER=192.168.1.30
MYSQL_DB=db1
#MYSQL_TABLE=table1
MYSQL_USERNAME=user
MYSQL_PW=pass
Store a client
hemlock client-store --name mysql_client_1 --type mysql --system_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --credential_file /path/to/mysql_creds
hemlock client-list
Store the Hemlock server credentials
hemlock hemlock-server-store --credential_file /path/to/hemlock_creds
Create a schedule server (optional)
hemlock schedule-server-create --name schedule_server_1
hemlock schedule-server-list
Add a schedule for the data source system to run (optional)
hemlock client-schedule --name schedule1 \
    --minute "54" \
    --hour "12" \
    --day_of_month "*" \
    --month "*" \
    --day_of_week "*" \
    --client_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
    --schedule_server_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock schedule-list
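The schedule fields resemble cron fields. Assuming they follow cron semantics (the source does not say so), the sketch below shows when schedule1 would fire. It is a simplified illustration, not Hemlock's scheduler: `field_matches` and `schedule_matches` are hypothetical helpers that handle only `*` and single integers, not ranges, lists, or steps.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one schedule field: "*" matches anything, otherwise compare integers."""
    return field == "*" or int(field) == value


def schedule_matches(schedule, when):
    """Return True if `when` falls on the minute described by `schedule`."""
    # Cron conventionally numbers Sunday as 0; datetime.weekday() uses Monday as 0.
    cron_weekday = (when.weekday() + 1) % 7
    return (
        field_matches(schedule["minute"], when.minute)
        and field_matches(schedule["hour"], when.hour)
        and field_matches(schedule["day_of_month"], when.day)
        and field_matches(schedule["month"], when.month)
        and field_matches(schedule["day_of_week"], cron_weekday)
    )


schedule1 = {
    "minute": "54",
    "hour": "12",
    "day_of_month": "*",
    "month": "*",
    "day_of_week": "*",
}
print(schedule_matches(schedule1, datetime(2013, 9, 3, 12, 54)))  # True
print(schedule_matches(schedule1, datetime(2013, 9, 3, 12, 55)))  # False
```

Under these assumptions, schedule1 runs every day at 12:54.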
Run the client
hemlock client-run --uuid 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
Search for data that has been loaded into Hemlock
hemlock query-data --user 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --query foo
or query ElasticSearch directly:
http://elasticsearch.fqdn:9200/hemlock/_search?q=foo
Which returns something like the following:
{
  "took": 14,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 3.6582048,
    "hits": [
      {
        "_index": "hemlock",
        "_type": "couchbaseDocument",
        "_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
        "_score": 3.6582048,
        "_source": {
          "meta": {
            "id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
            "rev": "1-0010f1ac6045ccf40000000000000000",
            "flags": 0,
            "expiration": 0
          }
        }
      }
    ]
  }
}
Now we can feed the 'id' into Couchbase to return the full document:
http://couchbase.fqdn:8092/hemlock/865f458b4421ae5fd758e3c81aca9f8d8b4696b6
Which returns something like the following:
{
  "hemlock-system": "a50b86c2-59f7-42a3-aa67-3367579189fe",
  "hemlock-date": "2013-09-03 16:10:20",
  "stream": "DOYLIE"
}
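The two-step lookup (search ElasticSearch, then fetch each hit from Couchbase by its id) can be sketched in Python. The helper names `document_ids` and `couchbase_url` are hypothetical; the host, port, and bucket defaults are taken from the example URLs in this section, and the response is an abbreviated copy of the sample search result.

```python
def document_ids(search_response):
    """Pull the Couchbase document IDs out of an ElasticSearch search response."""
    return [hit["_source"]["meta"]["id"] for hit in search_response["hits"]["hits"]]


def couchbase_url(doc_id, host="couchbase.fqdn", bucket="hemlock"):
    """Build the URL for fetching one full document from Couchbase."""
    return "http://{0}:8092/{1}/{2}".format(host, bucket, doc_id)


# Abbreviated copy of the sample search response.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "hemlock",
                "_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
                "_source": {
                    "meta": {"id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6"}
                },
            }
        ],
    }
}

for doc_id in document_ids(response):
    print(couchbase_url(doc_id))
```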
Create a hemlock_creds file (see hemlock_creds_sample for an example):
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
Create credential files for each client you intend to use (examples).
Technology | Parameter | Python Module Dependencies |
---|---|---|
MySQL | mysql | MySQLdb |
MongoDB | mongo | pymongo |
Redis | redis | redis |
Local FileSystem | fs | magic, pdfminer, xmltodict |
RESTful API | rest | |
Streams | stream_odd | |
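Before storing a client, it can help to confirm that the modules listed for its type are installed. The following is a rough sketch: `CLIENT_MODULES` simply transcribes the table (treating the empty cells as "no extra modules", which is an assumption), and `missing_modules` is a hypothetical helper, not a Hemlock command.

```python
import importlib.util

# Module names copied from the table, keyed by client parameter.
CLIENT_MODULES = {
    "mysql": ["MySQLdb"],
    "mongo": ["pymongo"],
    "redis": ["redis"],
    "fs": ["magic", "pdfminer", "xmltodict"],
    "rest": [],
    "stream_odd": [],
}


def missing_modules(client_type, modules=CLIENT_MODULES):
    """Return the modules for `client_type` that cannot be imported here."""
    return [
        name
        for name in modules[client_type]
        if importlib.util.find_spec(name) is None
    ]


for client_type in sorted(CLIENT_MODULES):
    print(client_type, missing_modules(client_type))
```

An empty list means every listed dependency for that client type was found.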
Create a new class under the clients folder for each new data source type. Most classes will need two methods defined: connect_client and get_data.
The following template can be used as a starting point:
class HMyclient:
    def connect_client(self, client_dict):
        # return a handle that can be used to get data from the data source
        c_server = None  # placeholder: replace with a real connection handle
        return c_server

    def get_data(self, client_dict, c_server, h_server, client_uuid):
        # data_list is an array of arrays to contain the data
        data_list = [[]]
        # desc_list is an array that contains the schema (if exists or known)
        desc_list = []
        return data_list, desc_list
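To make the template concrete, here is one way it might be filled in for a local CSV file. This is an illustrative sketch only: `HCsvExample` and the `CSV_PATH` key in `client_dict` are hypothetical names, and treating the first row as the schema is an assumption about the data rather than a Hemlock convention.

```python
import csv
import os
import tempfile


class HCsvExample:
    """Hypothetical client that reads rows from a local CSV file."""

    def connect_client(self, client_dict):
        # "CSV_PATH" is an assumed key, not one defined by Hemlock.
        return open(client_dict["CSV_PATH"], newline="")

    def get_data(self, client_dict, c_server, h_server, client_uuid):
        rows = list(csv.reader(c_server))
        c_server.close()
        desc_list = rows[0] if rows else []  # header row serves as the schema
        data_list = rows[1:]                 # remaining rows are the data
        return data_list, desc_list


# Usage: write a small CSV, then run both methods against it.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as handle:
    handle.write("name,stream\nSystem1,DOYLIE\n")
    path = handle.name

client = HCsvExample()
client_dict = {"CSV_PATH": path}
c_server = client.connect_client(client_dict)
data_list, desc_list = client.get_data(client_dict, c_server, None, "example-uuid")
os.remove(path)
print(desc_list, data_list)
```

The `h_server` and `client_uuid` arguments are unused here; they are kept only so the method signature matches the template.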
Create a tenant
hemlock tenant-create --name Project1
Create a role
hemlock role-create --name User
Create a user
hemlock user-create --name User1 \
    --username Username1 \
    --email user1@email.com \
    --role_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
Register a local system
hemlock register-local-system --name System1 \
    --data_type csv \
    --description "description" \
    --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
    --hostname system1.fqdn \
    --endpoint http://hemlock.server/ \
    --poc_name user1 \
    --poc_email user1@email.com
List registered systems
hemlock system-list
List created users
hemlock user-list
List created tenants
hemlock tenant-list
The tests for this project use py.test.
Want to contribute? Awesome! Issue a pull request or see more details here.