A step-by-step guide to implement metrics for your ad-hoc Operator in Golang by Prometheus on Kubernetes
This post describes how to set up metrics gathering through Prometheus for a Kubernetes/OpenShift “operator”. It assumes that either you are the coder of the operator (written in Go) or you have access to the source code to modify it :D
The team members of the whole project are;
TLDR;
As part of the IBM Validated Build Labs solution team, we help our business partners to build multi-cloud deployable, auto-pilotable solutions on Kubernetes-based environments (AKS, EKS, GKE, IKS… and for sure a strong focus on OpenShift). For that, we obviously use operators!
When building operators, at some point there is a need to gather metrics about the behavior of the operator itself. The most common way to gather metrics is to use Prometheus and then either use its built-in dashboard or add a Grafana dashboard, which brings an even more user-friendly interface.
The “kubebuilder” site gives a how-to of this implementation here: https://book.kubebuilder.io/reference/metrics.html?highlight=prometheus#exporting-metrics-for-prometheus
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)
var (
	goobers = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "goobers_total",
			Help: "Number of goobers processed",
		},
	)
	gooberFailures = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "goober_failures_total",
			Help: "Number of failed goobers",
		},
	)
)
func init() {
	// Register custom metrics with the global prometheus registry
	metrics.Registry.MustRegister(goobers, gooberFailures)
}
Those metrics will be available for prometheus or other openmetrics systems to scrape.
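To make “available to scrape” a bit more concrete: a scraper issues an HTTP GET against the operator’s “/metrics” endpoint and gets plain text back. The sketch below only illustrates what that text looks like for a counter. It is a hand-rolled, standard-library-only stand-in, not the way client_golang works internally, and “renderCounter” is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// renderCounter is a hypothetical helper (NOT part of client_golang) that
// formats a single counter sample in the Prometheus text exposition format:
// a HELP line, a TYPE line, then "<name> <value>".
func renderCounter(name, help string, value uint64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", name)
	fmt.Fprintf(&b, "%s %d\n", name, value)
	return b.String()
}

func main() {
	// Prints the three lines a scraper would read for this counter.
	fmt.Print(renderCounter("goobers_total", "Number of goobers processed", 3))
}
```

In a real operator you never write this text yourself; the registered collectors are rendered for you when the endpoint is scraped.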
Well, it seems obvious, but in practice it is not that simple and you need to follow lots of steps!
Let us walk through the steps you need to take so you can have your metrics ;p)
Step 1-Prepare a cluster
You need to deploy your operator on a Kubernetes cluster. We work heavily on IBM Public Cloud, but any Kubernetes platform (even Minikube) is suitable for your tests. For our development and test environments, we use either IBM Cloud Kubernetes Service clusters (a.k.a. IKS) or OpenShift clusters (both are managed offerings hosted on IBM Cloud).
Step 2-Write and Deploy your Operator on your Cluster
If you want to follow the steps, either you have an operator, or you can use the one our team is working on which is available on: https://github.com/IBM/operator-sample-go
To implement the operator follow the instructions provided in the “Setup” section;
Step 3-Deploy Prometheus on your cluster
There are several ways to deploy a Prometheus instance on a cluster. The easiest are Helm charts or the Prometheus operator, which is available on “operatorhub.io”. We use the latter, as we believe this is the best way to provide whole life-cycle management for any application on Kubernetes-based clusters.
The installation through the operator hub is quite straightforward;
- Go to https://operatorhub.io/
- Search for Prometheus
- Click on “Prometheus Operator”
- Then click on “Install” button and follow the instructions.
- In brief what you need to do is;
curl -sL https://github.com/operator-framework/operator-lifecycle-manager/releases/download/v0.20.0/install.sh | bash -s v0.20.0
kubectl create -f https://operatorhub.io/install/prometheus.yaml
Step 4-Make the Prometheus UI accessible on the cluster
You also need to make the Prometheus UI accessible, either through a “LoadBalancer” or a “NodePort” service on your Kubernetes cluster.
In my case, on my IKS cluster, I created the following configuration;
1-First create a “namespace” (or project under “OpenShift”) with any name you want to use;
kubectl create namespace monitoring
2-Create a “ServiceAccount” YAML file and insert following code
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
kubectl apply -f myserviceaccount.yaml -n monitoring
3-Create a “ClusterRole” YAML file as shown below;
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
kubectl apply -f myclusterrole.yaml -n monitoring
3-Create a “ClusterRoleBinding” YAML file as shown below;
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
kubectl apply -f myclusterrolebinding.yaml -n monitoring
4-Build an instance of your Prometheus service;
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector: {}
  serviceMonitorNamespaceSelector: {}
  resources:
    requests:
      memory: 400Mi
  enableAdminAPI: true
kubectl apply -f myprometheusinstance.yaml -n monitoring
Note: Those {} in the YAML file are very important! An empty selector acts as a “wildcard”: with it, your Prometheus picks up “any” ServiceMonitor, in any namespace, and therefore any service those ServiceMonitors point to.
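Why does {} behave like a wildcard? A label selector is a list of requirements, and an empty list has nothing that can fail. The sketch below is a simplified model of that rule in plain Go. It only covers the “matchLabels” style of selector, and “matches” is a hypothetical helper, not a Kubernetes or Prometheus operator function.

```go
package main

import "fmt"

// matches reports whether every key/value pair required by selector is
// present in labels. An empty selector has no requirements, so it matches
// any label set. This is a simplified model (matchLabels only).
func matches(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	labels := map[string]string{"control-plane": "controller-manager"}

	// Empty selector: no requirements, so everything is selected.
	fmt.Println(matches(map[string]string{}, labels))
	// Selector whose single requirement is satisfied.
	fmt.Println(matches(map[string]string{"control-plane": "controller-manager"}, labels))
	// Selector asking for a label the object does not carry.
	fmt.Println(matches(map[string]string{"team": "frontend"}, labels))
}
```

On a shared cluster you would probably replace {} with a narrower selector so that Prometheus only picks up the monitors you intend.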
5-Expose your Prometheus service instance to the internet (IBM Cloud specific);
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  annotations:
    service.kubernetes.io/ibm-load-balancer-cloud-provider-ip-type: "public"
spec:
  type: LoadBalancer
  ports:
  - name: web
    port: 9090
    protocol: TCP
    targetPort: 9090
  selector:
    prometheus: prometheus
Above is a “LoadBalancer” exposure, and below a “NodePort” version;
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30901
    port: 9090
    protocol: TCP
    targetPort: web
  selector:
    prometheus: prometheus
Either way, apply the configuration against your cluster (or use the ‘kubectl port-forward’ command instead);
kubectl apply -f myprometheusserviceexpose.yaml -n monitoring
For use with an operator, one more change is needed in the “config/default/kustomization.yaml” file; search for “prometheus” and un-comment the line;
From the original version
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
#- ../prometheus
To
# [PROMETHEUS] To enable prometheus monitor, uncomment all sections with 'PROMETHEUS'.
- ../prometheus
Execute the following;
make generate
make manifests
The result is a folder containing the ServiceMonitor specification, which will be used by the Prometheus operator.
# Prometheus Monitor Service (Metrics)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    control-plane: controller-manager
  name: controller-manager-metrics-monitor
  namespace: system
spec:
  endpoints:
    - path: /metrics
      port: https
      scheme: https
      bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
      tlsConfig:
        insecureSkipVerify: true
  selector:
    matchLabels:
      control-plane: controller-manager
So far we have installed and enabled a Prometheus instance on the cluster. Now we will implement a sample function to gather our metrics so that Prometheus can scrape them and display them on its dashboard UI.
Step 5-Implement a function to gather the Operator metrics
Now we will modify the operator so that it exposes metrics which Prometheus scrapes and shows on its dashboard.
1-Under the “controller” directory of your operator’s code, add a new file like the following (we use the same sample code that is provided on the kubebuilder page; Red Hat provides almost the same: https://docs.openshift.com/container-platform/4.9/operators/operator_sdk/osdk-monitoring-prometheus.html). My file is called “gathermetrics.go”. Because it lives in the “controller” directory, it belongs to the same package as the controller, so its variables are visible there.
package controllers
import (
	"github.com/prometheus/client_golang/prometheus"
	"sigs.k8s.io/controller-runtime/pkg/metrics"
)
var (
	goobers = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "goobers_total",
			Help: "Number of goobers processed",
		},
	)
	gooberFailures = prometheus.NewCounter(
		prometheus.CounterOpts{
			Name: "goober_failures_total",
			Help: "Number of failed goobers",
		},
	)
)
func init() {
	// Register custom metrics with the global prometheus registry
	metrics.Registry.MustRegister(goobers, gooberFailures)
}
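2-In “controller.go”, inside the “Reconcile” function, just add the following. There is no need to trigger the registration yourself: Go runs “init()” automatically at start-up, and registering the same collector a second time with “MustRegister” would panic;
...
// Add metrics information
goobers.Inc()
...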
3-Redeploy your operator and open the Prometheus dashboard and search for “goobers_total”!
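Searching in the dashboard is the quickest check, but the same lookup can be scripted through the Prometheus HTTP API, which serves instant queries at “/api/v1/query”. The sketch below builds such a query URL and decodes the value from a response body. The base address assumes a local port-forward on 9090, the JSON in “main” is a hand-written sample for illustration and not real output from a cluster, and “queryURL” and “firstValue” are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, expr string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/api/v1/query"
	u.RawQuery = url.Values{"query": {expr}}.Encode()
	return u.String(), nil
}

// queryResponse models only the response fields this sketch needs.
type queryResponse struct {
	Status string `json:"status"`
	Data   struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp, value as string]
		} `json:"result"`
	} `json:"data"`
}

// firstValue returns the sample value of the first series in the response.
func firstValue(body []byte) (string, error) {
	var r queryResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Status != "success" || len(r.Data.Result) == 0 {
		return "", fmt.Errorf("no result (status %q)", r.Status)
	}
	v, ok := r.Data.Result[0].Value[1].(string)
	if !ok {
		return "", fmt.Errorf("unexpected value type %T", r.Data.Result[0].Value[1])
	}
	return v, nil
}

func main() {
	u, err := queryURL("http://localhost:9090", "goobers_total")
	fmt.Println(u, err)

	// Hand-written sample body, for illustration only.
	sample := []byte(`{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"goobers_total"},"value":[1700000000,"3"]}]}}`)
	v, err := firstValue(sample)
	fmt.Println(v, err)
}
```

Fetching the URL with any HTTP client and passing the body to “firstValue” gives you the current counter value without opening the UI.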
Finally, the overall architecture of the metrics enablement is the following;
Conclusion
This is a very rudimentary example of enabling metrics for an operator written in Go. But once this mechanism is implemented, monitoring any resource of your operator becomes possible.
Hope it helps your operator metrics gathering!