Monitoring a Kubernetes cluster using Prometheus and the Grafana Kubernetes App

Intro

No cluster is complete without monitoring. We have to track the health of our cluster and make sure everything is running properly. One of the best tool combinations for the job is Prometheus paired with Grafana.

Prometheus

First of all, we need to install Prometheus itself. It is going to collect the metrics from our Kubernetes cluster.

We are going to use the stable/prometheus Helm chart with a custom values.yaml, because it contains a small fix for bug #31.

First, let’s install Prometheus using this Helm command.

$ helm install stable/prometheus --name=metrics \
    --set server.persistentVolume.storageClass=local-storage,server.persistentVolume.size=10Gi,alertmanager.persistentVolume.storageClass=local-storage,alertmanager.persistentVolume.size=4Gi,nodeExporter.image.tag=v0.15.2 \
    -f https://gist.githubusercontent.com/artyomboyko/cce155f9c3d5f1dea07a17beed632f5c/raw/cf1d5a20a884907f424629f777b4db57b6deb294/values.yaml

NAME:   metrics
LAST DEPLOYED: Mon Jul 30 17:14:50 2018
NAMESPACE: project
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                                   TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)   AGE
metrics-prometheus-alertmanager        ClusterIP  10.233.32.190  <none>       80/TCP    1s
metrics-prometheus-kube-state-metrics  ClusterIP  None           <none>       80/TCP    1s
metrics-prometheus-node-exporter       ClusterIP  None           <none>       9100/TCP  1s
metrics-prometheus-pushgateway         ClusterIP  10.233.13.93   <none>       9091/TCP  1s
metrics-prometheus-server              ClusterIP  10.233.48.4    <none>       80/TCP    1s

==> v1/Pod(related)
NAME                                                    READY  STATUS             RESTARTS  AGE
metrics-prometheus-node-exporter-l5zs8                  0/1    ContainerCreating  0         3s
metrics-prometheus-node-exporter-q4v67                  0/1    ContainerCreating  0         3s
metrics-prometheus-node-exporter-v9wzr                  0/1    ContainerCreating  0         3s
metrics-prometheus-alertmanager-6749888d56-bsbqj        0/2    Pending            0         3s
metrics-prometheus-kube-state-metrics-57788c8b86-7l8vd  0/1    ContainerCreating  0         3s
metrics-prometheus-pushgateway-94dcfb5bf-zlqz8          0/1    ContainerCreating  0         3s
metrics-prometheus-server-59d9fd4665-h8xwj              0/2    Pending            0         3s

==> v1/ConfigMap
NAME                             DATA  AGE
metrics-prometheus-alertmanager  1     1s
metrics-prometheus-server        3     1s

==> v1/PersistentVolumeClaim
NAME                             STATUS  VOLUME             CAPACITY  ACCESS MODES  STORAGECLASS   AGE
metrics-prometheus-alertmanager  Bound   local-pv-ebe7b747  75Gi      RWO           local-storage  1s
metrics-prometheus-server        Bound   local-pv-a2b998c2  75Gi      RWO           local-storage  1s

==> v1/ServiceAccount
NAME                                   SECRETS  AGE
metrics-prometheus-alertmanager        1        1s
metrics-prometheus-kube-state-metrics  1        1s
metrics-prometheus-node-exporter       1        1s
metrics-prometheus-pushgateway         1        1s
metrics-prometheus-server              1        1s

==> v1beta1/ClusterRoleBinding
NAME                                   AGE
metrics-prometheus-kube-state-metrics  1s
metrics-prometheus-server              1s

==> v1beta1/ClusterRole
NAME                                   AGE
metrics-prometheus-kube-state-metrics  1s
metrics-prometheus-server              1s

==> v1beta1/DaemonSet
NAME                              DESIRED  CURRENT  READY  UP-TO-DATE  AVAILABLE  NODE SELECTOR  AGE
metrics-prometheus-node-exporter  3        3        0      3           0          <none>         1s

==> v1beta1/Deployment
NAME                                   DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
metrics-prometheus-alertmanager        1        1        1           0          1s
metrics-prometheus-kube-state-metrics  1        1        1           0          1s
metrics-prometheus-pushgateway         1        1        1           0          1s
metrics-prometheus-server              1        1        1           0          1s


NOTES:
The Prometheus server can be accessed via port 80 on the following DNS name from within your cluster:
metrics-prometheus-server.project.svc.cluster.local


Get the Prometheus server URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace project -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace project port-forward $POD_NAME 9090


The Prometheus alertmanager can be accessed via port 80 on the following DNS name from within your cluster:
metrics-prometheus-alertmanager.project.svc.cluster.local


Get the Alertmanager URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace project -l "app=prometheus,component=alertmanager" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace project port-forward $POD_NAME 9093


The Prometheus PushGateway can be accessed via port 9091 on the following DNS name from within your cluster:
metrics-prometheus-pushgateway.project.svc.cluster.local


Get the PushGateway URL by running these commands in the same shell:
  export POD_NAME=$(kubectl get pods --namespace project -l "app=prometheus,component=pushgateway" -o jsonpath="{.items[0].metadata.name}")
  kubectl --namespace project port-forward $POD_NAME 9091

For more information on running Prometheus, visit:
https://prometheus.io/

You will get a response similar to the one above. This provisions our Prometheus stack, consisting of the alertmanager, kube-state-metrics, node-exporter, pushgateway and the Prometheus server.
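Before moving on, it is worth checking that all the components actually come up. A quick sanity check, assuming the release name metrics and the namespace project used above:

```shell
# List the Prometheus pods and make sure they reach Running/Ready.
kubectl --namespace project get pods -l app=prometheus

# Confirm the PersistentVolumeClaims were bound to local volumes.
kubectl --namespace project get pvc
```

The node-exporter pods are a DaemonSet, so you should see one per node.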

Grafana

Now, let’s install Grafana and connect it to our Prometheus instance.

Installing Grafana

Before we can install Grafana, we have to be aware of an existing bug that causes permission issues. Please take a look at pull request #5576 for details. The workaround below fixes the ‘/var/lib/grafana/plugins’: Permission denied and GF_PATHS_DATA=’/var/lib/grafana’ is not writable errors.

So first, let’s create a Kubernetes Job that fixes the permissions on the Grafana data volume.

kubectl --kubeconfig=admin.conf create --filename=- <<'EOF'
apiVersion: batch/v1
kind: Job
metadata: {name: grafana-chown}
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: grafana-chown
        # 472 is the UID/GID the official Grafana container runs as
        command: [chown, -R, "472:472", /var/lib/grafana]
        image: busybox:latest
        volumeMounts:
        - {name: storage, mountPath: /var/lib/grafana}
      volumes:
      - name: storage
        persistentVolumeClaim:
          claimName: visual-grafana
EOF
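Note that the Job mounts the visual-grafana claim, which is only created by the Grafana release installed next, so its pod will stay pending until then. Once the claim exists, you can confirm the Job finished (a sketch using kubectl wait, available since Kubernetes 1.11):

```shell
# Block until the permission-fix Job completes (up to 60s), then show it.
kubectl --kubeconfig=admin.conf wait --for=condition=complete job/grafana-chown --timeout=60s
kubectl --kubeconfig=admin.conf get job grafana-chown
```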
$ helm install stable/grafana --name=visual \
    --set 'env.GF_SERVER_ROOT_URL=/api/v1/namespaces/project/services/visual-grafana:service/proxy/,plugins=grafana-kubernetes-app,persistence.enabled=true,persistence.storageClassName=local-storage,persistence.accessModes={ReadWriteOnce},persistence.size=10Gi'

NAME:   visual
LAST DEPLOYED: Thu Aug  2 11:41:36 2018
NAMESPACE: project
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME            TYPE       CLUSTER-IP     EXTERNAL-IP  PORT(S)  AGE
visual-grafana  ClusterIP  10.233.30.221  <none>       80/TCP   1s

==> v1beta1/Role
NAME            AGE
visual-grafana  1s

==> v1beta1/RoleBinding
NAME            AGE
visual-grafana  1s

==> v1/PersistentVolumeClaim
NAME            STATUS  VOLUME             CAPACITY  ACCESS MODES  STORAGECLASS   AGE
visual-grafana  Bound   local-pv-d44e40d8  75Gi      RWO           local-storage  1s

==> v1/ServiceAccount
NAME            SECRETS  AGE
visual-grafana  1        1s

==> v1/ClusterRole
NAME                        AGE
visual-grafana-clusterrole  1s

==> v1/ClusterRoleBinding
NAME                               AGE
visual-grafana-clusterrolebinding  1s

==> v1beta2/Deployment
NAME            DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
visual-grafana  1        1        1           0          1s

==> v1beta1/PodSecurityPolicy
NAME            DATA   CAPS      SELINUX   RUNASUSER  FSGROUP   SUPGROUP  READONLYROOTFS  VOLUMES
visual-grafana  false  RunAsAny  RunAsAny  RunAsAny   RunAsAny  false     configMap,emptyDir,projected,secret,downwardAPI,persistentVolumeClaim

==> v1/Secret
NAME            TYPE    DATA  AGE
visual-grafana  Opaque  3     1s

==> v1/ConfigMap
NAME            DATA  AGE
visual-grafana  2     1s

==> v1/Pod(related)
NAME                            READY  STATUS   RESTARTS  AGE
visual-grafana-c69db78f4-l2t6l  0/1    Pending  0         1s


NOTES:
1. Get your 'admin' user password by running:

   kubectl get secret --namespace project visual-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

2. The Grafana server can be accessed via port 80 on the following DNS name from within your cluster:

   visual-grafana.project.svc.cluster.local

   Get the Grafana URL to visit by running these commands in the same shell:

     export POD_NAME=$(kubectl get pods --namespace project -l "app=visual-grafana,component=" -o jsonpath="{.items[0].metadata.name}")
     kubectl --namespace project port-forward $POD_NAME 3000

3. Login with the password from step 1 and the username: admin

To visualize our metrics we will use Grafana with the Kubernetes App plugin, which comes with pre-defined dashboards made especially for monitoring Kubernetes.

Enable Grafana Kubernetes App plugin

Now, with kubectl proxy running, go to http://localhost:8001/api/v1/namespaces/project/services/visual-grafana:service/proxy/login. You will see a login screen like this: Grafana Login Screen

Use this command to get the password for the admin user:

$ kubectl --kubeconfig=admin.conf get secret --namespace project visual-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
Dx6e2okLldTAmEbONXJGoQf3KWwSvhI9L0lhNU2k

Use these credentials to log in.

Now you need to enable the Kubernetes App plugin: Grafana Home
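If you prefer the command line, the plugin can also be enabled through Grafana’s HTTP API. This is only a sketch: it assumes a port-forward to the Grafana pod on localhost:3000 (as in the chart’s NOTES above) and that GRAFANA_PASSWORD holds the admin password retrieved earlier.

```shell
# Enable and pin the Kubernetes App plugin via the plugin settings API.
curl -s -X POST "http://admin:${GRAFANA_PASSWORD}@localhost:3000/api/plugins/grafana-kubernetes-app/settings" \
  -H 'Content-Type: application/json' \
  -d '{"enabled": true, "pinned": true}'
```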

Configure the Grafana Kubernetes App

Next, you need to add a Prometheus data source in the Data Sources section, using the URL http://metrics-prometheus-server.project.svc.cluster.local.
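This step can be scripted as well: Grafana exposes a data source API. The snippet below is a sketch with the same assumptions as before, a port-forward to the Grafana pod on localhost:3000 and the admin password in GRAFANA_PASSWORD.

```shell
# Create the Prometheus data source via Grafana's HTTP API.
# The URL points at the in-cluster Prometheus service installed earlier;
# "access": "proxy" makes Grafana query it server-side.
curl -s -X POST "http://admin:${GRAFANA_PASSWORD}@localhost:3000/api/datasources" \
  -H 'Content-Type: application/json' \
  -d '{
        "name": "prometheus",
        "type": "prometheus",
        "url": "http://metrics-prometheus-server.project.svc.cluster.local",
        "access": "proxy"
      }'
```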

Now we are ready to add our Kubernetes cluster. Click Kubernetes -> Clusters -> New Cluster.

You will need to fill in cluster details similar to the ones on the screenshot:

Grafana Home

Use the public IP of the first node to connect to the API server. For the certificates, you can use these files:

/etc/kubernetes/ssl/ca.pem, /etc/kubernetes/ssl/admin-node1.pem, /etc/kubernetes/ssl/admin-node1-key.pem.
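These paths are an assumption based on where this installer placed the cluster certificates; adjust them to your own setup. You can inspect a certificate before pasting its contents into the Grafana form, for example:

```shell
# Print the subject and validity window of the client certificate.
openssl x509 -in /etc/kubernetes/ssl/admin-node1.pem -noout -subject -dates
```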

After that, you will be able to see beautiful metrics like these: Grafana Metrics
