Apache Zookeeper in practice

1. Introduction

1.1. Apache Zookeeper

First of all, let’s talk about what exactly Apache Zookeeper is. Most of us have heard about this top-level Apache project, but it’s hard to understand what it does just from its Wikipedia page.

Many of us first heard about Apache Zookeeper when we tried to install Apache Kafka, which uses Zookeeper for coordination among consumers.

It’s hard to define Apache Zookeeper for newcomers, because it can be used for a lot of things in the world of distributed computing.

But let’s use the definition from StackOverflow: ZooKeeper helps you build distributed applications. It provides a rich set of primitives that you can use for building distributed applications, for example distributed Barriers, Queues, and Locks. Mostly, it is used as a replicated synchronization service. I also highly encourage you to read about ZooKeeper’s consistency guarantees.

1.2. Apache Curator

While the design of Apache ZooKeeper is intentionally kept simple and expressive, going beyond the basics can be hard.

That’s where Apache Curator comes in. It has a rich set of recipes that bring simplicity to building distributed applications with Apache Zookeeper.
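
To give a feel for what a recipe looks like, here is a minimal sketch of Curator’s distributed lock recipe (InterProcessMutex). The connection string and the znode path below are placeholders for illustration, not values from this article’s setup.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

import java.util.concurrent.TimeUnit;

public class LockExample {
    public static void main(String[] args) throws Exception {
        // "zookeeper:2181" is a placeholder connection string; point it at your ensemble.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Distributed lock recipe backed by znodes under the (hypothetical) path /locks/example.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/example");
        if (lock.acquire(10, TimeUnit.SECONDS)) {
            try {
                // Critical section: only one client across the cluster holds the lock at a time.
                System.out.println("Lock acquired");
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}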

1.3. Kubernetes

As with all examples we are going to explore on this website, we want to go a little beyond just running things locally. For each technology we will pick an environment that is as close to production as possible.

For this example, we will use Kubernetes 1.10, whose StatefulSets make running Zookeeper in Kubernetes very easy. Our configuration is heavily based on the article Running ZooKeeper, A CP Distributed System.

2. Installation

In this example we are going to use the cluster we installed before. As you may remember, we installed our cluster with Network Policy capabilities (kubespray uses Calico by default). We also already have helm installed in the cluster.

2.1. Creating pods

We won’t be able to use the existing Zookeeper charts as-is because of a few design differences, but we are going to base our Zookeeper configuration on the Zookeeper Helm Chart.

We will create a 3-pod Zookeeper setup (with a quorum size of 2, because of the (n/2 + 1) rule), and explore how it forms a quorum and how it elects a Leader.
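
To make the (n/2 + 1) arithmetic concrete, here is a tiny illustrative sketch (not part of Zookeeper itself):

public class QuorumMath {
    // Majority rule: an ensemble of n servers needs floor(n/2) + 1 votes to form a quorum.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    static boolean hasQuorum(int aliveServers, int ensembleSize) {
        return aliveServers >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        System.out.println(quorumSize(3));   // 2 -> our 3-pod setup needs 2 votes
        System.out.println(hasQuorum(2, 3)); // true  -> losing one pod is fine
        System.out.println(hasQuorum(1, 3)); // false -> losing two pods halts the cluster
    }
}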

Let’s create the Zookeeper pods:

$ helm install --name zookeeper incubator/zookeeper --set zookeeper.config.logLevel=debug,env.ZK_LOG_LEVEL=TRACE,persistence.enabled=false,env.ZOO_LOG4J_PROP="TRACE\,CONSOLE"

Let’s wait a few minutes and check if our pods are ready.

$ kubectl --kubeconfig=admin.conf get pods
NAMESPACE     NAME                                            READY     STATUS      RESTARTS   AGE
project       zookeeper-0                                     1/1       Running     0          3d
project       zookeeper-1                                     1/1       Running     0          3d
project       zookeeper-2                                     1/1       Running     0          3d

Let’s check if our cluster is working correctly:

$ kubectl --kubeconfig=admin.conf exec zookeeper-0 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:59538[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 31
Sent: 30
Connections: 1
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4

As you can see, our cluster is working correctly. Let’s take a look at the metrics of one of the nodes:

$ kubectl --kubeconfig=admin.conf exec zookeeper-0 zkMetrics.sh

zk_version      3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received     52649
zk_packets_sent 52648
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count     0
zk_approximate_data_size        27
zk_open_file_descriptor_count   41
zk_max_file_descriptor_count    1048576

As you can see, everything is working fine! Now let’s see how exactly the Zab protocol works.

3. Practice

3.1. Leader elections

Before we proceed, let’s try to understand what a zxid is. Apache Zookeeper is all about atomic broadcast and total ordering, and the transaction id (zxid) is how Zookeeper achieves it.

ZooKeeper exposes the total ordering using a ZooKeeper transaction id (zxid).

The zxid has two parts: the epoch and a counter. In our implementation the zxid is a 64-bit number. We use the high order 32-bits for the epoch and the low order 32-bits for the counter.
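
Building on that description, here is a small illustrative sketch that splits a zxid into its two parts; the sample value is made up for illustration:

public class ZxidParts {
    public static void main(String[] args) {
        // The high 32 bits of a zxid hold the epoch, the low 32 bits hold the counter.
        long zxid = 0x200000005L; // hypothetical value: epoch 2, counter 5

        long epoch   = zxid >>> 32;
        long counter = zxid & 0xFFFFFFFFL;

        System.out.printf("zxid=0x%x epoch=%d counter=%d%n", zxid, epoch, counter);
        // prints: zxid=0x200000005 epoch=2 counter=5
    }
}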

By default, Zookeeper uses the FastLeaderElection algorithm. Before the cluster can be used, the Zookeeper pods run the leader election algorithm.

If you are interested in the internals of the Zab protocol, please refer to ZooKeeper’s atomic broadcast protocol: Theory and practice or ZooKeeper Internals.

The algorithm itself is fairly simple, so let’s see how our 3-pod cluster gets formed.

Let’s take a look at the timeline of Leader election.

  • Pod 1 (up), Pod 2 (provisioning), Pod 3 (provisioning).

[Image: Apache Zookeeper internals]

// Pod 1
2018-03-11 19:33:59,756 [myid:1] - INFO  [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id =  1, proposed zxid=0x0
2018-03-11 19:33:59,757 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 1 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:33:59,757 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 2 (recipient), 1 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:33:59,757 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 1 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 1 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:33:59,757 [myid:1] - DEBUG [WorkerSender[myid=1]:QuorumCnxManager@559] - Opening channel to server 2
2018-03-11 19:33:59,757 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 1

Explanation: Pod 1 enters the LOOKING (electing) state. There are no other pods up yet, so it tries, unsuccessfully, to connect to them. Pod 1, logically, proposes itself as the leader.

  • Pod 1 (up), Pod 2 (up), Pod 3 (provisioning).

[Image: Apache Zookeeper internals]

// Pod 2
2018-03-11 19:34:32,310 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:QuorumPeer@796] - Initializing leader election protocol...
2018-03-11 19:34:32,310 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@719] - Updating proposal: 2 (newleader), 0x0 (newzxid), -1 (oldleader), 0xffffffffffffffff (oldzxid)
2018-03-11 19:34:32,310 [myid:2] - INFO  [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id =  2, proposed zxid=0x0
2018-03-11 19:34:32,311 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 2 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:34:32,311 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 2 (recipient), 2 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:34:32,311 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 2 (myid), 0x0 (n.peerEpoch)
 
 
// Pod 1
2018-03-11 19:34:32,321 [myid:1] - DEBUG [WorkerReceiver[myid=1]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 1
2018-03-11 19:34:32,321 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:34:32,322 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@612] - id: 2, proposed id: 1, zxid: 0x0, proposed zxid: 0x0
2018-03-11 19:34:32,322 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@719] - Updating proposal: 2 (newleader), 0x0 (newzxid), 1 (oldleader), 0x0 (oldzxid)
2018-03-11 19:34:32,322 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 1 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:34:32,322 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 2 (recipient), 1 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:34:32,322 [myid:1] - DEBUG [QuorumPeer[myid=1]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 1 (myid), 0x0 (n.peerEpoch)

What happens here? Pod 2 enters the electing state and proposes itself as leader. Then Pod 1 agrees with Pod 2’s proposal and starts proposing server 2 as the leader. Why did that happen? Both servers have the same proposed zxid (0x0), so the outcome is decided by this code:

protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
    LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
            Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
    if(self.getQuorumVerifier().getWeight(newId) == 0){
        return false;
    }

    /*
     * We return true if one of the following three cases hold:
     * 1- New epoch is higher
     * 2- New epoch is the same as current epoch, but new zxid is higher
     * 3- New epoch is the same as current epoch, new zxid is the same
     *  as current zxid, but server id is higher.
     */

    return ((newEpoch > curEpoch) ||
            ((newEpoch == curEpoch) &&
            ((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
}

So Pod 2 wins because of case 3: the new epoch is the same as the current epoch, the new zxid is the same as the current zxid, but the server id is higher.
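
To make the tie-break concrete, here is a standalone sketch that re-implements the comparison in simplified form (the quorum-weight check is omitted) and feeds it the values seen in the logs above: both votes carry epoch 0x0 and zxid 0x0, and the server ids are 1 and 2.

public class TieBreak {
    // Simplified version of totalOrderPredicate: the quorum-weight check is left out.
    static boolean wins(long newId, long newZxid, long newEpoch,
                        long curId, long curZxid, long curEpoch) {
        return (newEpoch > curEpoch)
                || (newEpoch == curEpoch
                    && (newZxid > curZxid || (newZxid == curZxid && newId > curId)));
    }

    public static void main(String[] args) {
        // Pod 1's current vote: leader=1, zxid=0x0, epoch=0x0.
        // Incoming notification from Pod 2: leader=2, zxid=0x0, epoch=0x0.
        boolean pod2Wins = wins(2, 0x0, 0x0, 1, 0x0, 0x0);
        System.out.println(pod2Wins); // true -> Pod 1 switches its vote to server 2
    }
}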

That gives Pod 2 two votes, which is enough to form a quorum. Pod 2 now enters the Leading state.

// Pod 2
2018-03-11 19:34:32,316 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@888] - Adding vote: from=1, proposed leader=1, proposed zxid=0x0, proposed election epoch=0x1
2018-03-11 19:34:32,316 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:34:32,316 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@612] - id: 2, proposed id: 2, zxid: 0x0, proposed zxid: 0x0
2018-03-11 19:34:32,316 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@888] - Adding vote: from=1, proposed leader=2, proposed zxid=0x0, proposed election epoch=0x1
2018-03-11 19:34:32,516 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@551] - About to leave FLE instance: leader=2, zxid=0x0, my id=2, my state=LEADING
2018-03-11 19:34:32,517 [myid:2] - DEBUG [QuorumPeer[myid=2]/0.0.0.0:2181:FastLeaderElection@1000] - Number of connection processing threads: 0
2018-03-11 19:34:32,517 [myid:2] - INFO  [QuorumPeer[myid=2]/0.0.0.0:2181:QuorumPeer@947] - LEADING

  • Pod 1 (up), Pod 2 (up), Pod 3 (up).

[Image: Apache Zookeeper internals]

// Pod 3
2018-03-11 19:35:13,118 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:QuorumPeer@796] - Initializing leader election protocol...
2018-03-11 19:35:13,118 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@719] - Updating proposal: 3 (newleader), 0x0 (newzxid), -1 (oldleader), 0xffffffffffffffff (oldzxid)
2018-03-11 19:35:13,118 [myid:3] - INFO  [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@818] - New election. My id =  3, proposed zxid=0x0
2018-03-11 19:35:13,119 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 3 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:35:13,119 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 2 (recipient), 3 (myid), 0x0 (n.peerEpoch)
2018-03-11 19:35:13,119 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@589] - Sending Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 3 (recipient), 3 (myid), 0x0 (n.peerEpoch)

...

2018-03-11 19:35:13,123 [myid:3] - DEBUG [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 3
2018-03-11 19:35:13,123 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x3 (n.round), LOOKING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:35:13,123 [myid:3] - DEBUG [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 3
2018-03-11 19:35:13,123 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:35:13,124 [myid:3] - DEBUG [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 3
2018-03-11 19:35:13,124 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x3 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)

...

2018-03-11 19:35:13,124 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@876] - Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x1, logicalclock=0x3
2018-03-11 19:35:13,124 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 3 (n.leader), 0x0 (n.zxid), 0x3 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@612] - id: 2, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@888] - Adding vote: from=2, proposed leader=2, proposed zxid=0x0, proposed election epoch=0x3
2018-03-11 19:35:13,125 [myid:3] - DEBUG [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 3
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@612] - id: 3, proposed id: 3, zxid: 0x0, proposed zxid: 0x0
2018-03-11 19:35:13,125 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x3 (n.round), LEADING (n.state), 2 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@888] - Adding vote: from=3, proposed leader=3, proposed zxid=0x0, proposed election epoch=0x3
2018-03-11 19:35:13,125 [myid:3] - DEBUG [WorkerReceiver[myid=3]:FastLeaderElection$Messenger$WorkerReceiver@272] - Receive new notification message. My id = 3
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@741] - I'm a participant: 3
2018-03-11 19:35:13,125 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@600] - Notification: 1 (message format version), 2 (n.leader), 0x0 (n.zxid), 0x3 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x0 (n.peerEpoch) LOOKING (my state)
2018-03-11 19:35:13,125 [myid:3] - DEBUG [QuorumPeer[myid=3]/0.0.0.0:2181:FastLeaderElection@551] - About to leave FLE instance: leader=2, zxid=0x0, my id=3, my state=FOLLOWING

Pod 3 tries to propose itself as leader before giving up and acknowledging Pod 2 as the leader. Our cluster is now ready!

3.2. Stress testing

Now, you will need this file (isolate.yaml) to isolate an Apache Zookeeper instance using a Network Policy.

Today we are going to see how our Apache Zookeeper cluster reacts to node failure. A Kubernetes Network Policy will help us isolate a node and observe how the other nodes react.

First, let’s make sure our cluster is healthy and all 3 pods are running.

$ kubectl --kubeconfig=admin.conf get pods --all-namespaces | grep zookeeper
project       zookeeper-0                                     1/1       Running     0          3d
project       zookeeper-1                                     1/1       Running     0          3d
project       zookeeper-2                                     1/1       Running     0          3d

Now, let’s check the health of one of the pods.

$ kubectl --kubeconfig=admin2.conf exec zookeeper-0 zkMetrics.sh
zk_version      3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received     53764
zk_packets_sent 53763
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count     0
zk_approximate_data_size        27
zk_open_file_descriptor_count   41
zk_max_file_descriptor_count    1048576

Let’s find the leader in our cluster. zookeeper-0 is a follower, as we can see from the previous command. Maybe zookeeper-1 is the leader? Let’s check.

$ kubectl --kubeconfig=admin2.conf exec zookeeper-1 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:58888[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: leader
Node count: 4

Yes, zookeeper-1 is the leader. We will isolate this pod and see how things change in our cluster.

But before we do that, let’s check the zxid of zookeeper-2.

$ kubectl --kubeconfig=admin.conf exec zookeeper-2 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:55180[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: follower
Node count: 4

The illustration below shows the current cluster health:

[Image: Apache Zookeeper internals]

Let’s block all ingress and egress traffic on this pod. We will install a network policy that blocks traffic based on pod labels.

$ kubectl --kubeconfig=admin.conf apply -f https://scalablesystem.design/assets/files/isolate.yaml
networkpolicy.networking.k8s.io "deny-all-traffic-labeled" created

Let’s label the pod we want to isolate:

$ kubectl --kubeconfig=admin.conf label pods zookeeper-1 blocked=true
pod "zookeeper-1" labeled

Now, let’s check pod health again.

$ kubectl --kubeconfig=admin.conf exec zookeeper-1 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:47482[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: leader
Node count: 4

Looks like everything is fine. But why? We isolated our pod, yet things are still working. Let’s dig a little deeper: first, let’s find out which server is hosting our pod, and then ssh into it.

$ kubectl --kubeconfig=admin.conf describe pod zookeeper-1 | grep node
Node:           node1/10.0.1.1

Let’s ssh into node1.

Let’s check which docker container is assigned to pod zookeeper-1.

$ kubectl --kubeconfig=admin.conf describe pod zookeeper-1 --namespace=project | grep -E -o 'docker\:\/\/([a-z0-9])+$'
docker://dea55fc50861f305dfe89026792f077b86b781afa702991efa4a226422d6b873

Now we can see which docker container is running our zookeeper-1 pod. Let’s exec into this container as the root user:

$ sudo docker exec -u root -t -i dea55fc50861f305dfe89026792f077b86b781afa702991efa4a226422d6b873 /bin/bash

Let’s try to run apt-get update

root@zookeeper-1:/# apt-get update
Err:1 http://security.ubuntu.com/ubuntu xenial-security InRelease
  Temporary failure resolving 'security.ubuntu.com'
Err:2 http://archive.ubuntu.com/ubuntu xenial InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:3 http://archive.ubuntu.com/ubuntu xenial-updates InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Err:4 http://archive.ubuntu.com/ubuntu xenial-backports InRelease
  Temporary failure resolving 'archive.ubuntu.com'
Reading package lists... Done
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-updates/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://archive.ubuntu.com/ubuntu/dists/xenial-backports/InRelease  Temporary failure resolving 'archive.ubuntu.com'
W: Failed to fetch http://security.ubuntu.com/ubuntu/dists/xenial-security/InRelease  Temporary failure resolving 'security.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.

So, why is that? It looks like we cannot connect to remote servers anymore, yet the cluster is still working fine. The connections to the other Zookeeper instances are still open: Network Policies do not close connections that are already established.

Let’s unlabel our pod and try to install a few networking tools.

$ kubectl --kubeconfig=admin.conf label pods zookeeper-1 blocked-
pod "zookeeper-1" labeled

Now, let’s try to run apt-get update inside our container again.

root@zookeeper-1:/# apt-get update
Hit:1 http://archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [109 kB]
Get:3 http://security.ubuntu.com/ubuntu xenial-security InRelease [107 kB]
Get:4 http://archive.ubuntu.com/ubuntu xenial-backports InRelease [107 kB]
Fetched 323 kB in 0s (513 kB/s)
Reading package lists... Done
root@zookeeper-1:/#

It works great! Let’s install a few tools (tcpkill and netstat):

root@zookeeper-1:/# apt-get install net-tools dsniff

Now, let’s label our pod again to block all traffic to and from it.

$ kubectl --kubeconfig=admin.conf label pods zookeeper-1 blocked=true
pod "zookeeper-1" labeled

Now, let’s check our theory about open connections.

root@zookeeper-1:/# netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 localhost:57750         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57822         localhost:2181          TIME_WAIT
tcp        0      0 localhost:58070         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57892         localhost:2181          TIME_WAIT
tcp        0      0 localhost:58108         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57924         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57856         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57784         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57994         localhost:2181          TIME_WAIT
tcp        0      0 localhost:58142         localhost:2181          TIME_WAIT
tcp        0      0 localhost:57962         localhost:2181          TIME_WAIT
tcp        0      0 localhost:58032         localhost:2181          TIME_WAIT
getnameinfo failed
tcp6       0      0 zookeeper-1.zooke:47864 [UNKNOWN]:2888          ESTABLISHED
tcp6       0      0 localhost:50764         localhost:2181          TIME_WAIT
getnameinfo failed
tcp6       0      0 zookeeper-1.zookee:3888 [UNKNOWN]:53748         ESTABLISHED
tcp6       0      0 localhost:50788         localhost:2181          TIME_WAIT
getnameinfo failed
tcp6       0      0 zookeeper-1.zookee:3888 [UNKNOWN]:48566         ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags       Type       State         I-Node   Path
unix  2      [ ]         STREAM     CONNECTED     1116438
unix  2      [ ]         STREAM     CONNECTED     1109687

We were right! Now, let’s kill those connections. Run inside the pod:

root@zookeeper-1:/# tcpkill -i eth0 port 2888 or 3888

Let’s check the status of our cluster again:

root@zookeeper-1:/# zkMetrics.sh
This ZooKeeper instance is not currently serving requests

That is expected. But how are the other pods doing?

$ kubectl --kubeconfig=admin.conf exec zookeeper-0 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:55800[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 10
Sent: 9
Connections: 1
Outstanding: 0
Zxid: 0x200000000
Mode: follower
Node count: 4
$ kubectl --kubeconfig=admin.conf exec zookeeper-2 -- /bin/sh -c "echo stat | nc 127.0.0.1 2181"
Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT
Clients:
 /127.0.0.1:55866[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x300000000
Mode: leader
Node count: 4

Pods zookeeper-0 and zookeeper-2 formed a cluster, with zookeeper-2 becoming the leader (because of its higher zxid). Two pods are enough to form a cluster, because 2 out of the original ensemble of 3 still count as a majority. Apache Zookeeper keeps working as long as ((n/2) + 1) servers are up.

[Image: Apache Zookeeper internals]

3.3. Apache Curator

In the next examples, we are going to use Ammonite and Apache Curator. Both will help us interact with our Zookeeper cluster and test some of its concepts.
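
As a small preview of that interaction, here is a hedged sketch of creating and reading a znode with Curator; the connection string and the path are placeholders, not values from this article’s setup.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZnodeExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connection string; point it at your Zookeeper service.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zookeeper:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Create a znode with some payload, then read it back.
        client.create().creatingParentsIfNeeded().forPath("/demo/hello", "world".getBytes());
        byte[] data = client.getData().forPath("/demo/hello");
        System.out.println(new String(data)); // world

        client.close();
    }
}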
