Creating custom Kubernetes metrics with kube-state-metrics

Observability is an important and challenging part of any distributed system. In Kubernetes clusters it is especially tricky because the dozens of controllers (plus third-party operators) are loosely coupled, i.e. they all work independently.

The kube-state-metrics component is a vital part of the Prometheus monitoring stack on Kubernetes: out of the box, it provides metrics about Deployments, Services and all the other core Kubernetes resources. This lets us answer questions such as:

  • How many replicas of the deployment are available?
  • Which pods are in an unhealthy state?
  • How many LoadBalancer Services do we have and what are their IPs?

For Custom Resource Definitions (CRDs), i.e. Kubernetes add-ons provided by third parties, we usually get metrics from the associated operator. For example, cert-manager provides metrics about the status of Certificate resources.

But this is not always the case, especially when you have written your own operator. You can either collect these metrics and expose them in a Prometheus-compatible format yourself in Go code (which is quite tedious!), or you can leverage a little-known feature of kube-state-metrics called Custom Resource State Metrics. I’ll walk you through an example.

Let’s assume (not entirely hypothetically) that we have a custom resource that looks like this:

apiVersion: drupal.webservices.cern.ch/v1alpha1
kind: DrupalSite
metadata:
  name: jacks-test-site
  namespace: drupal
  creationTimestamp: "2024-01-30T14:59:42Z"
spec:
  hostName: drupal-tests.webtest.cern.ch
  serverDetails:
    assignedRouterShard: apps-shard-1
    serverVersion: el9-serverless
  sitePath: /eos/user/jack/drupal-test-site
status:
  conditions:
  - lastTransitionTime: "2024-02-02T13:20:14Z"
    message: The site is available
    reason: Available
    status: "True"
    type: Ready

We want to expose some fields of this custom resource as metrics:

  • metadata.creationTimestamp: shows the date when the site was created
  • spec.serverDetails.serverVersion: tracking this field allows us to monitor how many users are still on an older release
  • status.conditions: shows the current status of the resource (whether it is healthy, whether there were errors, etc.)

To install kube-state-metrics, we can use the provided Helm chart (for more options refer to “Usage”):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

We’ll use the following minimal configuration for the Helm chart (custom-state-metrics.values.yaml):

# custom-state-metrics.values.yaml

extraArgs:
  # collect only our metrics, not the default ones (deployments etc.)
  - --custom-resource-state-only=true
customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources: <METRICSCONFIG>
rbac:
  # auto-generate the list of required RBAC rules for the CRDs we want to watch
  extraRules: <RBAC>

# collect metrics from ALL namespaces
namespaces: ""

# deploy a ServiceMonitor so the metrics are collected by Prometheus
prometheus:
  monitor:
    enabled: true

In the snippet above you can see that we need to fill in two placeholders: METRICSCONFIG and RBAC. METRICSCONFIG describes which metrics are generated and which custom resource fields they are based on.

To expose the fields mentioned above, we would use the following config:

- groupVersionKind:
    group: drupal.webservices.cern.ch
    version: v1alpha1
    kind: DrupalSite
  labelsFromPath:
    name: ["metadata", "name"]
    namespace: ["metadata", "namespace"]
  metrics:
    - name: drupalsite_info
      help: "Exposes details about the configuration of a DrupalSite: serverVersion"
      each:
        type: Info
        info:
          labelsFromPath:
            serverVersion: ["spec", "serverDetails", "serverVersion"]
            # foo: ["spec", "bar", "baz"]
    - name: drupalsite_creationtimestamp
      help: "Exposes the creation date of this site"
      each:
        type: Gauge
        gauge:
          path: ["metadata", "creationTimestamp"]
    - name: drupalsite_status_conditions
      help: "Exposes the status conditions of this DrupalSite"
      each:
        type: Gauge
        gauge:
          path: [status, conditions]
          labelsFromPath:
            type: ["type"]
          valueFrom: ["status"]

Lines 1-4 (groupVersionKind) specify which custom resource kube-state-metrics should watch. Make sure you double- and triple-check the group, version and kind of your CRDs: I’ve spent lots of time debugging tiny errors here, and unfortunately the logs of kube-state-metrics are not very helpful either.

Lines 5-7 (labelsFromPath) specify common labels that are applied to all time series associated with this resource. In this case all metrics will have the name and namespace labels (which is generally very useful).

Lines 9-16 declare the first metric: drupalsite_info. This is an Info metric that exposes some details about the spec of our website. kube-state-metrics reads the value of the serverVersion field (line 15) and places it in the label named serverVersion. The numeric value of the time series is always 1 for Info metrics. We can add arbitrary labels from any field in the resource here, as indicated by the commented-out foo example (line 16).
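
To make the label lookup more concrete, here is a minimal Python sketch of how a labelsFromPath mapping could be resolved against a resource. This is not the actual kube-state-metrics implementation (which is written in Go): the helper names resolve_path and info_series are made up for illustration, and silently skipping unresolved paths is an assumption. The point to take away is that each path has to follow the nesting of the resource exactly.

```python
# Trimmed copy of the DrupalSite resource from the example above.
drupal_site = {
    "metadata": {"name": "jacks-test-site", "namespace": "drupal"},
    "spec": {"serverDetails": {"serverVersion": "el9-serverless"}},
}


def resolve_path(obj, path):
    """Follow a list of keys into nested dicts; return None if a key is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def info_series(resource, labels_from_path):
    """Build the label set of an Info-style series; the sample value is always 1."""
    labels = {}
    for label, path in labels_from_path.items():
        value = resolve_path(resource, path)
        if value is not None:  # assumption: unresolved paths are simply skipped
            labels[label] = str(value)
    return labels, 1.0


labels, value = info_series(
    drupal_site,
    {
        "name": ["metadata", "name"],
        "namespace": ["metadata", "namespace"],
        "serverVersion": ["spec", "serverDetails", "serverVersion"],
    },
)
print(labels, value)
```

A path that stops one level short or skips a level (for example ["spec", "serverVersion"] for the resource above) resolves to nothing in this sketch, which is the kind of tiny mistake that is easy to miss.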

Lines 17-22 declare the drupalsite_creationtimestamp metric, which is a Gauge metric (a value that can arbitrarily go up and down). kube-state-metrics automatically converts the timestamp string (creationTimestamp: "2024-01-30T14:59:42Z") into a float64 Unix timestamp (seconds since the epoch), since all Prometheus sample values must be floating-point numbers.
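
The conversion itself is nothing magical. As a rough illustration (again in Python, not the Go code kube-state-metrics actually runs), turning such a timestamp string into seconds since the Unix epoch could look like this. The function name rfc3339_to_unix is made up, and the sketch only handles the UTC "Z" form that Kubernetes writes into metadata.creationTimestamp.

```python
from datetime import datetime, timezone


def rfc3339_to_unix(timestamp):
    """Convert a UTC timestamp such as "2024-01-30T14:59:42Z" to Unix seconds."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # strptime returns a naive datetime, so mark it explicitly as UTC
    return parsed.replace(tzinfo=timezone.utc).timestamp()


print(rfc3339_to_unix("2024-01-30T14:59:42Z"))
```

Having the creation date as a number is what makes it useful in queries: subtracting the metric from the current time gives the age of each site in seconds.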

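Lines 23-31 specify another Gauge metric, but in this case the labels and values are taken from the .status.conditions array. This means there will be one time series per entry in the array: the type label comes from the condition's type field and the value from its status field, with the string "True" exported as 1.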
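
The "one time series per array entry" behaviour can be sketched in a few lines of Python. This is an illustration only: the function name condition_series is made up, the second condition (Initialized) is a hypothetical entry added to show the fan-out, and mapping "False" to 0 is an assumption (the example resource only demonstrates "True" becoming 1).

```python
def condition_series(resource):
    """Return one (labels, value) pair per entry in status.conditions."""
    # assumption: "True" maps to 1 and "False" to 0; anything else is skipped here
    status_values = {"true": 1.0, "false": 0.0}
    series = []
    for condition in resource.get("status", {}).get("conditions", []):
        labels = {"type": condition["type"]}  # labelsFromPath: type: ["type"]
        value = status_values.get(str(condition["status"]).lower())  # valueFrom: ["status"]
        if value is not None:
            series.append((labels, value))
    return series


drupal_site = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True"},
            {"type": "Initialized", "status": "False"},  # hypothetical second condition
        ]
    }
}

for labels, value in condition_series(drupal_site):
    print(labels, value)
```

With two conditions on the resource we get two series that differ only in their type label, which is what lets us later filter on type="Ready".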

Finally, we must take care of setting up appropriate RBAC permissions before we can deploy this. More specifically, we need to allow the kube-state-metrics pod to perform queries to the Kubernetes API server for the CRDs we want it to monitor.

# RBAC
- apiGroups: [ "drupal.webservices.cern.ch" ]
  resources: [ "drupalsites" ]
  verbs: ["list","watch"]
  # note that "list" responses contain the full objects, so no separate "get" verb is needed

Take care to use the lowercase plural of the resource kind (DrupalSite -> drupalsites) in the RBAC rules to avoid hard-to-troubleshoot errors (I’m speaking from experience). As a best practice we should grant kube-state-metrics only the minimal set of permissions, i.e. do not simply give it cluster-admin. The Helm chart puts these RBAC rules into a ClusterRole and creates a ClusterRoleBinding that points to the ServiceAccount used by the Deployment.
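
One way to avoid guessing the plural is to start from the CRD name as printed by kubectl get crd, which always has the form <plural>.<group>. The following Python sketch derives the RBAC rule from that name; the function rbac_rule_for_crd is a made-up helper for illustration, not part of any tool.

```python
def rbac_rule_for_crd(crd_name):
    """Split "<plural>.<group>" at the first dot and build a read-only RBAC rule."""
    plural, _, group = crd_name.partition(".")
    if not plural or not group:
        raise ValueError(f"expected '<plural>.<group>', got {crd_name!r}")
    return {
        "apiGroups": [group],
        "resources": [plural.lower()],
        "verbs": ["list", "watch"],  # minimal permissions, nothing that modifies objects
    }


print(rbac_rule_for_crd("drupalsites.drupal.webservices.cern.ch"))
```

The resulting dictionary has the same shape as one entry of the extraRules list above.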

Now we can use the snippets above to populate the placeholders in the custom-state-metrics.values.yaml file.

Complete custom-state-metrics.values.yaml
# custom-state-metrics.values.yaml

extraArgs:
  # collect only our metrics, not the default ones (deployments etc.)
  - --custom-resource-state-only=true
customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources:
         - groupVersionKind:
             group: drupal.webservices.cern.ch
             version: v1alpha1
             kind: DrupalSite
           labelsFromPath:
             name: ["metadata", "name"]
             namespace: ["metadata", "namespace"]
           metrics:
             - name: drupalsite_info
               help: "Exposes details about the configuration of a DrupalSite: serverVersion"
               each:
                 type: Info
                 info:
                   labelsFromPath:
                     serverVersion: ["spec", "serverDetails", "serverVersion"]
                     # foo: ["spec", "bar", "baz"]
             - name: drupalsite_creationtimestamp
               help: "Exposes the creation date of this site"
               each:
                 type: Gauge
                 gauge:
                   path: ["metadata", "creationTimestamp"]
             - name: drupalsite_status_conditions
               help: "Exposes the status conditions of this DrupalSite"
               each:
                 type: Gauge
                 gauge:
                   path: [status, conditions]
                   labelsFromPath:
                     type: ["type"]
                   valueFrom: ["status"]
rbac:
  extraRules:
    - apiGroups: [ "drupal.webservices.cern.ch" ]
      resources: [ "drupalsites" ]
      verbs: ["list","watch"]

# collect metrics from ALL namespaces
namespaces: ""

# deploy a ServiceMonitor so the metrics are collected by Prometheus
prometheus:
  monitor:
    enabled: true

Then we can deploy the Helm chart:

helm install custom-resource -f custom-state-metrics.values.yaml \
     prometheus-community/kube-state-metrics --version 5.21.0

After checking that the pod is running (use kubectl events for debugging), we can access the metrics endpoint as follows:

kubectl get pod # make sure the kube-state-metrics pod is running
kubectl get svc # get the name of the service
kubectl port-forward svc/custom-resource-kube-state-metrics 8080:8080 &

curl http://localhost:8080/metrics
# HELP kube_customresource_drupalsite_info Exposes details about the configuration of a DrupalSite: serverVersion
# TYPE kube_customresource_drupalsite_info info
kube_customresource_drupalsite_info{customresource_group="drupal.webservices.cern.ch",customresource_kind="DrupalSite",customresource_version="v1alpha1",name="jacks-test-site",namespace="drupal",serverVersion="el9-serverless"} 1
# HELP kube_customresource_drupalsite_creationtimestamp Exposes the creation date of this site
# TYPE kube_customresource_drupalsite_creationtimestamp gauge
kube_customresource_drupalsite_creationtimestamp{customresource_group="drupal.webservices.cern.ch",customresource_kind="DrupalSite",customresource_version="v1alpha1",name="jacks-test-site",namespace="drupal"} 1.701338991e+09
# HELP kube_customresource_drupalsite_status_conditions Exposes the status conditions of this DrupalSite
# TYPE kube_customresource_drupalsite_status_conditions gauge
kube_customresource_drupalsite_status_conditions{customresource_group="drupal.webservices.cern.ch",customresource_kind="DrupalSite",customresource_version="v1alpha1",name="jacks-test-site",namespace="drupal", type="Ready"} 1

Et voilà, there we have our metrics!

Since we have the Prometheus Operator running in our cluster and enabled the ServiceMonitor in the Helm chart values, these metrics are automatically collected by Prometheus. From there we can start building visualizations for our custom resources and define alerts (PrometheusRules) to notify us when the state of a custom resource is not as desired.
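
Before building dashboards and alerts, it can be handy to sanity-check the raw endpoint output. The following Python sketch parses text in the format shown above and lists the sites whose Ready condition is not 1. It is a deliberately simple parser, not a full implementation of the Prometheus exposition format; the function names are made up, and the second site (other-site) is a hypothetical entry added to have something to report.

```python
import re

# <metric name>{<labels>} <value>
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
# key="value" pairs, allowing backslash-escaped characters inside the value
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

SAMPLE = """\
# HELP kube_customresource_drupalsite_status_conditions Exposes the status conditions of this DrupalSite
# TYPE kube_customresource_drupalsite_status_conditions gauge
kube_customresource_drupalsite_status_conditions{name="jacks-test-site",namespace="drupal",type="Ready"} 1
kube_customresource_drupalsite_status_conditions{name="other-site",namespace="drupal",type="Ready"} 0
"""


def parse_samples(text):
    """Return (metric name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # samples without labels are ignored in this sketch
        labels = dict(LABEL_RE.findall(match["labels"]))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def not_ready(samples):
    """List (namespace, name) of all sites whose Ready condition is not 1."""
    return [
        (labels["namespace"], labels["name"])
        for name, labels, value in samples
        if name == "kube_customresource_drupalsite_status_conditions"
        and labels.get("type") == "Ready"
        and value != 1
    ]


print(not_ready(parse_samples(SAMPLE)))
```

In a real setup the same check would live in an alert rule rather than in a script; the sketch is only meant for poking at the endpoint while developing the configuration.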

Happy monitoring!


#  Bonus: Automatically generate RBAC rules

For those who use a templating engine to provide input for the kube-state-metrics Helm chart (e.g. Helmfile or Argo CD Applications), the following snippet can be used to automatically generate the relevant RBAC rules.

# INPUT
enableMetricsForCRDs:
- drupalsites.drupal.webservices.cern.ch

metrics:
- groupVersionKind:
    group: drupal.webservices.cern.ch
    version: v1alpha1
    kind: DrupalSite
    plural: drupalsites # NOTE THIS ADDITIONAL FIELD!
  labelsFromPath:
    name: ["metadata", "name"]
    namespace: ["metadata", "namespace"]
  metrics:
    - name: drupalsite_info
      help: "Exposes details about the configuration of a DrupalSite: serverVersion"
      each:
        type: Info
        info:
          labelsFromPath:
            serverVersion: ["spec", "serverDetails", "serverVersion"]
    - name: drupalsite_creationtimestamp
      help: "Exposes the creation date of this site"
      each:
        type: Gauge
        gauge:
          path: ["metadata", "creationTimestamp"]
    - name: drupalsite_status_conditions
      help: "Exposes the status conditions of this DrupalSite"
      each:
        type: Gauge
        gauge:
          path: [status, conditions]
          labelsFromPath:
            type: ["type"]
          valueFrom: ["status"]

# HELM TEMPLATE

{{/* build the full list of metrics that should be generated depending on which metrics are enabled */}}
{{ $enabledMetrics := list }}
{{ range $_, $item := .Values.metrics }}
{{ if has (printf "%s.%s" $item.groupVersionKind.plural $item.groupVersionKind.group) $.Values.enableMetricsForCRDs }}
{{ $enabledMetrics = concat $enabledMetrics (list $item) }}
{{ end }}
{{ end }}

customResourceState:
  enabled: true
  config:
    kind: CustomResourceStateMetrics
    spec:
      resources: {{- toYaml $enabledMetrics | nindent 8 }}
rbac:
  extraRules:
  {{- range $_, $value := $enabledMetrics }}
  - apiGroups: [ {{ $value.groupVersionKind.group | quote }} ]
    resources: [ {{ $value.groupVersionKind.plural | lower | quote }} ]
    verbs: ["list","watch"]
  {{- end }}
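
If the template syntax is hard to read, the same logic can be written down in plain Python. The sketch below mirrors what the template does (filter the metric configs by the enabled CRD names, then derive one RBAC rule per remaining entry); it is only meant as an explanation, not as a replacement for the template. The function name render_values and the second resource (Widget in example.com) are made up for illustration.

```python
def render_values(metrics, enable_metrics_for_crds):
    """Keep only the enabled metric configs and derive one RBAC rule for each."""
    enabled = []
    for item in metrics:
        gvk = item["groupVersionKind"]
        # same key as in the template: "<plural>.<group>"
        if f'{gvk["plural"]}.{gvk["group"]}' in enable_metrics_for_crds:
            enabled.append(item)

    extra_rules = [
        {
            "apiGroups": [item["groupVersionKind"]["group"]],
            "resources": [item["groupVersionKind"]["plural"].lower()],
            "verbs": ["list", "watch"],
        }
        for item in enabled
    ]
    return {
        "customResourceState": {
            "enabled": True,
            "config": {"kind": "CustomResourceStateMetrics", "spec": {"resources": enabled}},
        },
        "rbac": {"extraRules": extra_rules},
    }


metrics = [
    {"groupVersionKind": {"group": "drupal.webservices.cern.ch", "version": "v1alpha1",
                          "kind": "DrupalSite", "plural": "drupalsites"}, "metrics": []},
    # hypothetical second resource that is configured but not enabled
    {"groupVersionKind": {"group": "example.com", "version": "v1",
                          "kind": "Widget", "plural": "widgets"}, "metrics": []},
]
values = render_values(metrics, ["drupalsites.drupal.webservices.cern.ch"])
print(values["rbac"]["extraRules"])
```

Because the RBAC rules are derived from the same list as the metric configs, the two cannot drift apart: enabling a CRD in enableMetricsForCRDs adds both its metrics and the permissions needed to read it.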