Apache DistributedLog can be easily deployed in Kubernetes clusters. The managed clusters on Google Container Engine is the most convenient way.

The deployment method shown in this guide relies on YAML definitions for Kubernetes resources. The kubernetes subdirectory holds resource definitions for:

  • A three-node ZooKeeper cluster
  • A BookKeeper cluster with a bookie runs on each node.
  • A three-node Proxy cluster.

If you already have setup a BookKeeper cluster following the instructions of Deploying Apache BookKeeper on Kubernetes in Apache BookKeeper website, you can skip deploying bookkeeper and start from Create a DistributedLog Namespace.

Setup on Google Container Engine

To get started, get source code of kubernetes yaml definitions from github by git clone.

If you’d like to change the number of bookies, ZooKeeper nodes, or proxy nodes in your deployment, modify the replicas parameter in the spec section of the appropriate Deployment or StatefulSet resource.

Google Container Engine (GKE) automates the creation and management of Kubernetes clusters in Google Compute Engine (GCE).

Prerequisites

To get started, you’ll need:

Create a new Kubernetes cluster

You can create a new GKE cluster using the container clusters create command for gcloud. This command enables you to specify the number of nodes in the cluster, the machine types of those nodes, and more.

As an example, we’ll create a new GKE cluster for Kubernetes version 1.7.5 in the us-central1-a zone. The cluster will be named bookkeeper-gke-cluster and will consist of three VMs, each using two locally attached SSDs and running on n1-standard-8 machines. These SSDs will be used by Bookie instances, one for the BookKeeper journal and the other for storing the actual data.

$ gcloud config set compute/zone us-central1-a
$ gcloud config set project your-project-name
$ gcloud container clusters create bookkeeper-gke-cluster \
  --machine-type=n1-standard-8 \
  --num-nodes=3 \
  --local-ssd-count=2 \
  --cluster-version=1.7.5

By default, bookies will run on all the machines that have locally attached SSD disks. In this example, all of those machines will have two SSDs, but you can add different types of machines to the cluster later. You can control which machines host bookie servers using labels.

Dashboard

You can observe your cluster in the Kubernetes Dashboard by downloading the credentials for your Kubernetes cluster and opening up a proxy to the cluster:

$ gcloud container clusters get-credentials bookkeeper-gke-cluster \
  --zone=us-central1-a \
  --project=your-project-name
$ kubectl proxy

By default, the proxy will be opened on port 8001. Now you can navigate to localhost:8001/ui in your browser to access the dashboard. At first your GKE cluster will be empty, but that will change as you begin deploying.

When you create a cluster, your kubectl config in ~/.kube/config (on MacOS and Linux) will be updated for you, so you probably won’t need to change your configuration. Nonetheless, you can ensure that kubectl can interact with your cluster by listing the nodes in the cluster:

$ kubectl get nodes

If kubectl is working with your cluster, you can proceed to deploy ZooKeeper and Bookies.

ZooKeeper

You must deploy ZooKeeper as the first component, as it is a dependency for the others.

$ kubectl apply -f zookeeper.yaml

Wait until all three ZooKeeper server pods are up and have the status Running. You can check on the status of the ZooKeeper pods at any time:

$ kubectl get pods -l component=zookeeper
NAME      READY     STATUS             RESTARTS   AGE
zk-0      1/1       Running            0          18m
zk-1      1/1       Running            0          17m
zk-2      0/1       Running            6          15m

This step may take several minutes, as Kubernetes needs to download the Docker image on the VMs.

If you want to connect to one of the remote zookeeper server, you can usezk-shell, you need to forward a local port to the remote zookeeper server:

$ kubectl port-forward zk-0 2181:2181
$ zk-shell localhost 2181

Deploy Bookies

Once ZooKeeper cluster is Running, you can then deploy the bookies.

$ kubectl apply -f bookkeeper.yaml

You can check on the status of the Bookie pods for these components either in the Kubernetes Dashboard or using kubectl:

$ kubectl get pods

While all BookKeeper pods is Running, by zk-shell you could find all available bookies under /ledgers/

You can also verify the deployment by ssh to a bookie pod.

$ kubectl exec -it <pod_name> -- bash

On the bookie pod, you can run simpletest to verify the installation. The simpletest will create a ledger and append a few entries into the ledger.

$ BOOKIE_CONF=/opt/bookkeeper/conf/bk_server.conf /opt/distributedlog/bin/dlog bkshell simpletest

Monitoring

Apache BookKeeper provides stats provider for being able to integrate with different monitoring systems. The default monitoring stack for Apache BookKeeper on Kubernetes has consists of Prometheus and Grafana.

You can deploy one instance of Prometheus and one instance of Grafana by running following command:

$ kubectl apply -f monitoring.yaml

Prometheus

All BookKeeper/DistributedLog metrics in Kubernetes are collected by a Prometheus instance running inside the cluster. Typically, there is no need to access Prometheus directly. Instead, you can use the Grafana interface that displays the data stored in Prometheus.

Grafana

In your Kubernetes cluster, you can use Grafana to view dashbaords for JVM stats, ZooKeeper, and BookKeeper. You can get access to the pod serving Grafana using kubectl’s port-forward command:

$ kubectl port-forward $(kubectl get pods | grep grafana | awk '{print $1}') 3000

You can then access the dashboard in your web browser at localhost:3000.

Create DistributedLog Namespace

At this moment, you have a bookkeeper cluster up running on kubernetes. Now, You can create a distributedlog namespace and start playing with it. If you setup the bookkeeper cluster following the above instructions, it uses apachedistributedlog/distributedlog:0.5.0 image for running bookies. You can skip creating distributedlog namespace here and move to next section. Because it already created a default namespace distributedlog://zookeeper/distributedlog for you when starting the bookies.

You can create a distributedlog namespace using the dlog tool.

$ kubectl run dlog --rm=true --attach --image=apachedistributedlog/distributedlog:0.5.0 --restart=OnFailure -- /opt/distributedlog/bin/dlog admin bind -l /bookkeeper/ledgers -s zookeeper -c distributedlog://zookeeper/distributedlog

After you have a distributedlog namespace, you can play around the namespace by using dlog tool to create, delete, list and show the streams.

Create Streams

Create 10 streams prefixed with mystream-.

$ kubectl run dlog --rm=true --attach --image=apachedistributedlog/distributedlog:0.5.0 --restart=OnFailure -- /opt/distributedlog/bin/dlog tool create -u distributedlog://zookeeper/distributedlog -r mystream- -e 0-9 -f

List Streams

List the streams under the namespace.

$ kubectl run dlog --rm=true --attach --image=apachedistributedlog/distributedlog:0.5.0 --restart=OnFailure -- /opt/distributedlog/bin/dlog tool list -u distributedlog://zookeeper/distributedlog

An example of the output of this command is:

Streams under distributedlog://zookeeper/distributedlog :
--------------------------------
mystream-0
mystream-9
mystream-6
mystream-5
mystream-8
mystream-7
mystream-2
mystream-1
mystream-4
mystream-3
--------------------------------

Write and Read Records

You can run a simple benchmark on testing writing and read from distributedlog streams.

Start one instance of benchmark-writer to write to 100 streams. (The streams are automatically created by the benchmark writer)

$ kubectl apply -f benchmark-writer.yaml

Start one instance of benchmark-reader to read from those 100 streams.

$ kubectl apply -f benchmark-reader.yaml

You can monitor the Grafana dashboard for the traffic comes from benchmark writer and reader.

Un-Deploy

Delete BookKeeper

$ kubectl delete -f bookkeeper.yaml    

Delete ZooKeeper

$ kubectl delete -f zookeeper.yaml    

Delete cluster

$ gcloud container clusters delete bookkeeper-gke-cluster