
Ingest Prometheus Metrics with OpenTelemetry

OpenTelemetry has become the gold standard of cloud-native observability. It is the platform that handles the three observability signals: traces, metrics, and logs. Metrics are an especially interesting signal, though. Metrics, and their consumption, are certainly not new to the world of observability. Prometheus has been around for quite some time now, and its adoption is nothing short of widespread.

OpenTelemetry knows this and embraces Prometheus metrics. While you are free to implement metrics the OpenTelemetry way, that definitely doesn’t mean you have to give up on the Prometheus way. In fact, it’s unlikely that you have the bandwidth to rewrite all of the metric instrumentation in your software (who has time for that?).

Setting aside OpenTelemetry for a moment and focusing on a common Prometheus implementation, a really big helper in the Kubernetes metrics world is the Prometheus operator, which allows you to define Prometheus Custom Resources and dramatically simplify Prometheus scrape config. Here’s a small example of what this could look like with a ServiceMonitor:

(Diagram: Prometheus ServiceMonitor)
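
If you haven’t seen one before, a minimal ServiceMonitor for a Service labeled app: my-app (the same shape as the one used later in this post) looks roughly like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: prom
      path: /metrics

The Prometheus operator watches for these resources and translates them into scrape configuration for Prometheus, so you never have to hand-write kubernetes_sd_configs blocks yourself.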

But what if you want to start using OpenTelemetry to ingest your metrics instead? Thankfully, OpenTelemetry doesn’t ask you to throw away this approach.

You can continue to expose Prometheus metrics from your applications and exporters! 🎉 OpenTelemetry handles ingestion through a very important component called the OpenTelemetry Collector, and the collector’s prometheus receiver accepts a scrape config that will look very familiar.
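
For example, a minimal sketch of that receiver config might look like this (the static target below is just a placeholder; the full config we’ll use appears later in this post):

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'my-app'
        scrape_interval: 30s
        static_configs:
        - targets: [ 'my-app.default.svc:8080' ]  # placeholder target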

But what if you wanted to keep using the Prometheus operator’s custom resources, like ServiceMonitor and PodMonitor, to configure scraping? After all, they make managing Prometheus scraping in a Kubernetes cluster much easier. OpenTelemetry provides a component alongside the collector called the Target Allocator, which can take ServiceMonitor and PodMonitor CRs and supplement the collector’s Prometheus scrape configuration. Here’s how this looks now:

(Diagram: OTel ServiceMonitor)

Instead of using Prometheus, we’re now using the OpenTelemetry collector to scrape our metrics. The target allocator knows how to read and handle Prometheus operator CRs, and supplements the Prometheus receiver’s configuration to get the metrics.

TLDR solution

Let’s say you have a Python application that exposes a Prometheus counter metric, some_counter:

import random
import time
from prometheus_client import start_http_server, Counter

some_counter = Counter(name="some_counter", documentation="Sample counter")

def main() -> None:
    while True:
        random_sleep_time = random.randint(1, 20)
        print(f"Sleeping for {random_sleep_time} seconds", flush=True)
        time.sleep(random_sleep_time)
        some_counter.inc()

if __name__ == "__main__":
    start_http_server(port=8080)
    main()

Remember, this application knows nothing about OpenTelemetry. To consume this metric with OpenTelemetry, we’ll define our collector using the very helpful OpenTelemetry operator:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  targetAllocator:
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
    enabled: true
    serviceAccount: otelcol
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        app: my-app
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        target_allocator:
          endpoint: http://otelcol-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"

    exporters:
      logging:
        verbosity: detailed

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [logging]

There is a lot going on here, so let’s step through it. We’re creating an instance of the collector that has the target allocator enabled:

  targetAllocator:
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
    enabled: true
    serviceAccount: otelcol
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        app: my-app

This sets the following for the target allocator, which is a separate deployment from the collector itself:

  • The image, which I needed to pin to a build off of main due to a bug that isn’t in the latest release yet (you can likely ignore this and use the default image, but it’s helpful to know you can override it if needed)
  • The serviceAccount that this target allocator will run under, because it will require additional permissions (see below for the detailed RBAC settings for this service account)
  • prometheusCR configuration, which enables and sets a serviceMonitorSelector so that the target allocator knows which ServiceMonitor resources to watch

Then the collector itself has the prometheus receiver configured:

      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        target_allocator:
          endpoint: http://otelcol-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"

This tells the collector to scrape itself (it exposes its own metrics as well) and that the target allocator should be used to supplement the scrape config. Then we set a single exporter:

    exporters:
      logging:
        verbosity: detailed

This is not what you would do in production, but it’s an easy way to see the ingested metrics by just dumping them to the collector’s stdout (a more production-like exporter setup is sketched after the pipeline config below). Finally, we have to tell the collector to use the prometheus receiver and the logging exporter for metrics:

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [logging]

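In a real deployment you would more likely forward the metrics to a backend instead of logging them. Depending on your collector distribution, that could be the otlp or prometheusremotewrite exporter; a minimal sketch, with placeholder endpoints, might look like this:

    exporters:
      otlp:
        endpoint: my-backend.example.com:4317  # placeholder endpoint
      prometheusremotewrite:
        endpoint: https://prometheus.example.com/api/v1/write  # placeholder endpoint

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [otlp, prometheusremotewrite]
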
With this collector applied, we can now see both the collector and the target allocator running in our cluster:

$ kubectl get po | grep otel
otelcol-collector-0                                      1/1     Running   2 (37m ago)   37m
otelcol-targetallocator-8446ff7f55-9tgxv                 1/1     Running   0             37m

Then we create the ServiceMonitor pointing at our application’s Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    app: my-app
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: prom
      path: /metrics

With this in place, our collector should now be scraping our application metrics! Let’s look in the collector’s logs:

$ kubectl logs otelcol-collector-0
...
Metric #7
Descriptor:
     -> Name: some_counter_total
     -> Description: Sample counter
     -> Unit: 
     -> DataType: Sum
     -> IsMonotonic: true
     -> AggregationTemporality: Cumulative
NumberDataPoints #0
Data point attributes:
     -> container: Str(my-app)
     -> endpoint: Str(prom)
     -> namespace: Str(default)
     -> pod: Str(my-app-6b557d48f6-xm4dk)
     -> service: Str(my-app)
StartTimestamp: 2023-07-02 14:20:47.005 +0000 UTC
Timestamp: 2023-07-02 14:58:47.005 +0000 UTC
Value: 226.000000
...

Detailed solution

Here’s a fully detailed walkthrough of this setup using a local kind cluster.

Create the kind cluster

Create the cluster with a local registry; the kind documentation provides a helper script for exactly this.
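
Assuming you’ve saved that script locally as kind-with-registry.sh (see https://kind.sigs.k8s.io/docs/user/local-registry/), creating the cluster and its registry at localhost:5001 looks like this:

$ sh ./kind-with-registry.sh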

Install dependencies

Here we need to install the Prometheus operator (so we can get the ServiceMonitor and other CRs):

$ helm install prometheus prometheus-community/kube-prometheus-stack

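Note that this assumes the prometheus-community chart repository is already configured locally; if it isn’t, add it first:

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update
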
We have to install cert-manager, which the OpenTelemetry operator requires:

$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.12.0/cert-manager.yaml

And finally we install the OpenTelemetry operator:

$ kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml

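Before moving on, it’s worth confirming the operator came up. At the time of writing, its release manifest installs everything into the opentelemetry-operator-system namespace, so something like this should show it running:

$ kubectl get pods -n opentelemetry-operator-system
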
Sample application

Let’s now create our demo application to expose a Prometheus metric. First we’ll create and activate the virtual environment:

$ python3 -m venv venv
$ . venv/bin/activate

And then install the Prometheus client:

$ pip install prometheus-client

And now we need to save these dependencies in requirements.txt:

$ pip freeze > requirements.txt

Then create the application in app.py:

import random
import time
from prometheus_client import start_http_server, Counter

some_counter = Counter(name="some_counter", documentation="Sample counter")

def main() -> None:
    while True:
        random_sleep_time = random.randint(1, 20)
        print(f"Sleeping for {random_sleep_time} seconds", flush=True)
        time.sleep(random_sleep_time)
        some_counter.inc()

if __name__ == "__main__":
    start_http_server(port=8080)
    main()

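If you want to sanity-check the app before containerizing it, you can run it locally and scrape it by hand; the Prometheus client serves the metrics over HTTP:

$ python app.py &
$ curl -s localhost:8080/metrics | grep some_counter
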
Now we need to create the Dockerfile so we can containerize this:

FROM python:3.10-slim

EXPOSE 8080

WORKDIR /app
COPY ./requirements.txt ./app.py ./
RUN pip3 install -r ./requirements.txt

CMD ["python", "app.py"]

Finally let’s build and push this image to the local registry:

$ docker build -t localhost:5001/otel-prom:latest .
$ docker push localhost:5001/otel-prom:latest

Now we need to deploy the application to the cluster:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: localhost:5001/otel-prom:latest
          imagePullPolicy: Always
          ports:
            - name: prom
              containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: prom
      port: 8080

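Assuming the manifest above is saved as app.yaml (the filename is arbitrary), deploy it with:

$ kubectl apply -f app.yaml
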
OTel collector

To start scraping, we first need to deploy our collector and the target allocator. But before that, we need to set up a service account and the necessary RBAC so that the target allocator can access the resources it needs:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: otelcol
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentelemetry-targetallocator-cr-role
rules:
- apiGroups:
  - monitoring.coreos.com
  resources:
  - servicemonitors
  - podmonitors
  verbs:
  - '*'
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol-prom
subjects:
  - kind: ServiceAccount
    name: otelcol
    namespace: default
roleRef:
  kind: ClusterRole
  name: opentelemetry-targetallocator-cr-role
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: opentelemetry-targetallocator-role
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs: ["get", "list", "watch"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcol-discovery
subjects:
  - kind: ServiceAccount
    name: otelcol
    namespace: default
roleRef:
  kind: ClusterRole
  name: opentelemetry-targetallocator-role
  apiGroup: rbac.authorization.k8s.io

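Once this is applied, a quick way to double-check the permissions is to impersonate the service account with kubectl auth can-i, which should print yes:

$ kubectl auth can-i list servicemonitors.monitoring.coreos.com --as=system:serviceaccount:default:otelcol
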
And now we can deploy our collector and target allocator:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  targetAllocator:
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
    enabled: true
    serviceAccount: otelcol
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        app: my-app
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        target_allocator:
          endpoint: http://otelcol-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"

    exporters:
      logging:
        verbosity: detailed

    service:
      pipelines:
        metrics:
          receivers: [prometheus]
          processors: []
          exporters: [logging]

ServiceMonitor

Finally, we need to create the ServiceMonitor so that the target allocator picks up our application’s Service:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  labels:
    app: my-app
    release: prometheus
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: prom
      path: /metrics

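The target allocator serves the scrape configuration it has computed over HTTP, which is a handy way to confirm it actually picked up the ServiceMonitor. A quick check, assuming the default Service name and port that the operator creates:

$ kubectl port-forward svc/otelcol-targetallocator 8080:80 &
$ curl -s localhost:8080/scrape_configs
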
Viewing metrics

With this all set up, you should now be able to view metrics in the logs of your collector! Here’s a good one-liner to see all the metrics that have been collected:

$ kubectl logs otelcol-collector-0 | grep "Name:" | sort | uniq
     -> Name: otelcol_exporter_enqueue_failed_log_records
     -> Name: otelcol_exporter_enqueue_failed_metric_points
     -> Name: otelcol_exporter_enqueue_failed_spans
     -> Name: otelcol_exporter_sent_metric_points
     -> Name: otelcol_process_cpu_seconds
     -> Name: otelcol_process_memory_rss
     -> Name: otelcol_process_runtime_heap_alloc_bytes
     -> Name: otelcol_process_runtime_total_alloc_bytes
     -> Name: otelcol_process_runtime_total_sys_memory_bytes
     -> Name: otelcol_process_uptime
     -> Name: otelcol_receiver_accepted_metric_points
     -> Name: otelcol_receiver_refused_metric_points
     -> Name: process_cpu_seconds_total
     -> Name: process_max_fds
     -> Name: process_open_fds
     -> Name: process_resident_memory_bytes
     -> Name: process_start_time_seconds
     -> Name: process_virtual_memory_bytes
     -> Name: python_gc_collections_total
     -> Name: python_gc_objects_collected_total
     -> Name: python_gc_objects_uncollectable_total
     -> Name: python_info
     -> Name: scrape_duration_seconds
     -> Name: scrape_samples_post_metric_relabeling
     -> Name: scrape_samples_scraped
     -> Name: scrape_series_added
     -> Name: some_counter_created
     -> Name: some_counter_total
     -> Name: up

As you can see, our some_counter is now being collected by OpenTelemetry! (The _total suffix and the companion some_counter_created series come from how the Prometheus client exposes counters.)

This post is licensed under CC BY 4.0 by the author.