A step-by-step guide to implementing metrics for your custom Go Operator with Prometheus on Kubernetes

This post describes how to set up metrics gathering through Prometheus for a Kubernetes/OpenShift “operator”. It assumes either that you are the author of the operator (written in Go) or that you have access to its source code to modify it :D


TL;DR

As part of the IBM Validated Build Labs solution team, we help our business partners build multi-cloud deployable, auto-pilotable solutions on Kubernetes-based environments (AKS, EKS, GKE, IKS… and of course a strong focus on OpenShift). For that, we obviously use operators!

When building operators, at some point you need to gather metrics about the behavior of the operator itself. The most common way is to implement Prometheus and then either use its built-in dashboard, or add a Grafana dashboard, which brings an even more user-friendly interface.

The “kubebuilder” site gives a how-to of this implementation here: https://book.kubebuilder.io/reference/metrics.html?highlight=prometheus#exporting-metrics-for-prometheus

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    goobers = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goobers_total",
            Help: "Number of goobers processed",
        },
    )
    gooberFailures = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goober_failures_total",
            Help: "Number of failed goobers",
        },
    )
)

func init() {
    // Register custom metrics with the global prometheus registry
    metrics.Registry.MustRegister(goobers, gooberFailures)
}

Those metrics will be available for Prometheus or other OpenMetrics-compatible systems to scrape.

Code snippet courtesy of the “kubebuilder” site
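Concretely, once registered, the operator’s /metrics endpoint exposes these counters in the Prometheus text exposition format, roughly like this (the values shown are illustrative):

```text
# HELP goobers_total Number of goobers processed
# TYPE goobers_total counter
goobers_total 42
# HELP goober_failures_total Number of failed goobers
# TYPE goober_failures_total counter
goober_failures_total 3
```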

Well, it seems obvious, but in practice it is not that simple and you need to follow quite a few steps!

Let us walk through the steps you need so you can have your metrics ;p)

Step 1-Prepare a cluster

You need to deploy your operator on a Kubernetes cluster. We work heavily on IBM Public Cloud, but any Kubernetes platform (even Minikube) is suitable for your tests. For our development and test environments, we use either an IBM Kubernetes Service cluster (a.k.a. IKS) or OpenShift clusters (managed Kubernetes offerings hosted on IBM Cloud).

Step 2-Write and Deploy your Operator on your Cluster

If you want to follow along, either use your own operator, or the one our team is working on, which is available at: https://github.com/IBM/operator-sample-go

To implement the operator, follow the instructions provided in the “Setup” section:

https://github.com/IBM/operator-sample-go

Step 3-Deploy Prometheus on your cluster

There are several ways to deploy a Prometheus instance on a cluster. The easiest is to use either Helm charts, or the Prometheus operator available on “operatorhub.io”. We use the latter, as we believe it is the best way to provide whole-lifecycle management for any application on Kubernetes-based clusters.

The installation through OperatorHub is quite straightforward:

  1. Go to https://operatorhub.io/
  2. Search for Prometheus
  3. Click on “Prometheus Operator”
  4. Then click on “Install” button and follow the instructions.
  5. In brief, what you need to do is:

curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/install.sh | bash -s v0.20.0
kubectl create -f https://operatorhub.io/install/prometheus.yaml
Operatorhub.io page
Installation instructions

Step 4-Make the Prometheus UI accessible on the cluster

You also need to enable the Prometheus UI to be accessible, either through a “LoadBalancer” or a “NodePort” service on your Kubernetes cluster.

In my case, on IKS, I created the following configuration:

1-First create a “namespace” (or a project under OpenShift) with any name you want to use:

kubectl create namespace monitoring

2-Create a “ServiceAccount” YAML file with the following content:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus

kubectl apply -f myserviceaccount.yaml -n monitoring

3-Create a “ClusterRole” YAML file as shown below;

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

kubectl apply -f myclusterrole.yaml -n monitoring

4-Create a “ClusterRoleBinding” YAML file as shown below:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring

kubectl apply -f myclusterrolebinding.yaml -n monitoring

5-Build an instance of your Prometheus service:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector: {}
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: true

kubectl apply -f myprometheusinstance.yaml -n monitoring

Note: those {} in the YAML file are very important! These empty selectors act as wildcards: with them, your Prometheus instance will pick up “any” ServiceMonitor, in any namespace.
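If you later want to narrow the scope instead of using the wildcard, the empty selectors can be replaced by label selectors. A sketch of the idea, using a hypothetical team: frontend label (only ServiceMonitors carrying that label would then be scraped):

```yaml
spec:
  serviceMonitorSelector:
    matchLabels:
      team: frontend   # hypothetical label; only matching ServiceMonitors are scraped
```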

6-Expose your Prometheus service instance to the internet (IBM Cloud specific):

apiVersion: v1
kind: Service
metadata:
  name: prometheus
  annotations:
    service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
spec:
  type: LoadBalancer
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    prometheus: prometheus

Above is a “LoadBalancer” exposure, and below a “NodePort” version;

apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30901
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus

Either way, apply the configuration against your cluster (or you can use the ‘kubectl port-forward’ command instead):

kubectl apply -f myprometheusserviceexpose.yaml -n monitoring

Also, for the specific usage with an operator, there is a modification to be done in the “config/default/kustomization.yaml” file: search for “prometheus” and un-comment the line.

From the original version:

# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus

To

# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
- ../prometheus

Execute the following;

make generate
make manifests

The result is the creation of a folder containing the ServiceMonitor specification, which will be used by the Prometheus operator.

# Prometheus Monitor Service (Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-monitor
  namespace: system
spec:
  endpoints:
  - path: /metrics
    port: https
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager

So far we have installed and enabled a Prometheus instance on the cluster, now we will implement a sample function to gather and scrape our metrics and display them on the Prometheus dashboard UI.

Screen capture courtesy of Thomas Südbröcker’s blog

Step 5-Implement a function to gather the Operator metrics

Now we will modify the operator so that its metrics are exposed for Prometheus to scrape and show up on the Prometheus dashboard.

1-Under the “controllers” directory of your operator’s code, create a new file implementing the metrics, as follows (we use the same sample code provided on the kubebuilder page; Red Hat provides almost the same: https://docs.openshift.com/container-platform/4.9/operators/operator_sdk/osdk-monitoring-prometheus.html). My file is called “gathermetrics.go”. As it lives in the “controllers” directory, it is part of the same package as your controller.

package controllers

import (
    "github.com/prometheus/client_golang/prometheus"
    "sigs.k8s.io/controller-runtime/pkg/metrics"
)

var (
    goobers = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goobers_total",
            Help: "Number of goobers processed",
        },
    )
    gooberFailures = prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "goober_failures_total",
            Help: "Number of failed goobers",
        },
    )
)

func init() {
    // Register custom metrics with the global prometheus registry
    metrics.Registry.MustRegister(goobers, gooberFailures)
}

2-In “controller.go”, inside the “Reconcile” function, just increment the counter (the metrics are registered automatically by the package’s init() function, so there is nothing else to call):

...
// Add metrics information
goobers.Inc()
...

3-Redeploy your operator, open the Prometheus dashboard, and search for “goobers_total”!
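In the dashboard, raw counter values are usually less interesting than their rate of change. For example, with the goobers_total and goober_failures_total counters defined above, queries like these show the per-second processing rate and the failure ratio over the last five minutes:

```promql
rate(goobers_total[5m])
rate(goober_failures_total[5m]) / rate(goobers_total[5m])
```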

Finally, the overall architecture of the metrics enablement is the following;

Diagram courtesy of Thomas Südbröcker, who is a big fan of diagrams :D and is the diagram guru of the team

Conclusion

This is a very rudimentary example of enabling metrics for an operator written in Go. But once this mechanism is implemented, you can monitor any resource of your operator.

Hope it helps your operator metrics gathering!
