
Troubleshooting the OpenTelemetry Target Allocator

In a recent blog post I talked about how OpenTelemetry uses the target allocator to generate the Prometheus configuration that tells collectors which targets to scrape. But… what happens when it doesn’t seem to be working? In this blog post I’m going to walk through some troubleshooting steps I have used to figure out what’s going wrong.

List jobs

The first step is to see what jobs your target allocator is looking at. We can do this by curling the /jobs endpoint.

First I will port-forward from my local machine so that I can curl the target allocator endpoint:

$ kubectl port-forward svc/otelcol-targetallocator 8080:80
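
If you’re not sure what the Service is called, you can list the Services and look for one ending in -targetallocator (the operator derives the name from the Collector resource, so mine is otelcol-targetallocator):

$ kubectl get svc | grep targetallocator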

Now we can get the jobs:

$ curl localhost:8080/jobs | jq

My output is the following:

{
  "serviceMonitor/default/my-app/0": {
    "_link": "/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets"
  },
  "otel-collector": {
    "_link": "/jobs/otel-collector/targets"
  }
}
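
The job name serviceMonitor/default/my-app/0 follows the pattern serviceMonitor/<namespace>/<name>/<endpoint index>, so it maps back to a ServiceMonitor roughly like this (a minimal sketch reconstructed from the labels and port that show up in the scrape config later; your actual resource will differ):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: default
  labels:
    app: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: prom
    interval: 30s

If the job you expected is missing entirely, double-check that your ServiceMonitor’s labels match the serviceMonitorSelector on the targetAllocator (shown in the Custom image section below).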

List job targets

Now that we have validated our jobs, let’s see what they are targeting. A helpful trick is to append the _link value to the root target allocator URL. So to get the targets for my job, I would curl this:

$ curl localhost:8080/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets | jq

Here you’ll want to search for the target you expected. In my case, app: my-app from my ServiceMonitor is in this list:

{
  "otelcol-collector-0": {
    "_link": "/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets?collector_id=otelcol-collector-0",
    "targets": [
      ...
      {
        "targets": [
          "10.244.0.6:10250"
        ],
        "labels": {
          "__meta_kubernetes_pod_labelpresent_chart": "true",
          "__meta_kubernetes_pod_container_image": "quay.io/prometheus-operator/prometheus-operator:v0.65.2",
          "__meta_kubernetes_endpointslice_port_protocol": "TCP",
          "__meta_kubernetes_service_label_release": "prometheus",
          ...
          "__meta_kubernetes_pod_labelpresent_app_kubernetes_io_instance": "true"
        }
      },
      ...
    ]
  }
}

This tells us that the pod we care about is in the list of targets for this job.
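
When the target list is long, a quick way to check whether your pod made it in is to filter the labels with jq (a sketch; I’m keying on the pod name label here, but any label you recognize works):

$ curl -s localhost:8080/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets | jq '.[].targets[].labels.__meta_kubernetes_pod_name'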

Show the scrape config

The thing we really care about is the scrape config the target allocator is delivering to the collectors. We can see it by curling the /scrape_configs endpoint:

$ curl localhost:8080/scrape_configs
{
  ...
  "serviceMonitor/default/my-app/0": {
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "serviceMonitor/default/my-app/0",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "namespaces": {
          "names": [
            "default"
          ],
          "own_namespace": false
        },
        "role": "endpointslice"
      }
    ],
    "metrics_path": "/metrics",
    "relabel_configs": [
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "job"
        ],
        "target_label": "__tmp_prometheus_job_name"
      },
      {
        "action": "keep",
        "regex": "(my-app);true",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_label_app",
          "__meta_kubernetes_service_labelpresent_app"
        ]
      },
      {
        "action": "keep",
        "regex": "prom",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_port_name"
        ]
      },
      {
        "action": "replace",
        "regex": "Node;(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_address_target_kind",
          "__meta_kubernetes_endpointslice_address_target_name"
        ],
        "target_label": "node"
      },
      {
        "action": "replace",
        "regex": "Pod;(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_address_target_kind",
          "__meta_kubernetes_endpointslice_address_target_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_namespace"
        ],
        "target_label": "namespace"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_name"
        ],
        "target_label": "service"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_container_name"
        ],
        "target_label": "container"
      },
      {
        "action": "drop",
        "regex": "(Failed|Succeeded)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_phase"
        ]
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_name"
        ],
        "target_label": "job"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "prom",
        "separator": ";",
        "target_label": "endpoint"
      },
      {
        "action": "hashmod",
        "modulus": 1,
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__address__"
        ],
        "target_label": "__tmp_hash"
      },
      {
        "action": "keep",
        "regex": "$(SHARD)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__tmp_hash"
        ]
      }
    ],
    "scheme": "http",
    "scrape_interval": "30s",
    "scrape_timeout": "10s"
  }
}

We can see the kubernetes_sd_configs, which will be a familiar sight if you have worked with Prometheus configuration before.
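
The full dump gets noisy once you have more than a couple of jobs, so it can help to pull out just the job you care about by indexing into the JSON by job name (note the quoting, since the name contains slashes):

$ curl -s localhost:8080/scrape_configs | jq '."serviceMonitor/default/my-app/0"'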

Validate receiver config

One of the things that might be causing metrics to not show up in OpenTelemetry is a misconfigured target allocator. Here is what this might look like in your Collector:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  ...
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        target_allocator:
          endpoint: http://otelcol-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"
        ...

Here we can see the receiver has the target_allocator configured. Make sure this endpoint is correct. One way to do this is by running an ephemeral debug container on the collector pod:

$ kubectl debug -it --image ubuntu otelcol-collector-0

Inside the container I’ll install curl:

# apt update && apt install -y curl

Then try to curl your target_allocator.endpoint:

# curl http://otelcol-targetallocator/scrape_configs

You should receive back the JSON dump of the scrape config. If that doesn’t work, troubleshoot the connectivity and the endpoint.
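
At that point I usually check that the target allocator Service and Deployment actually exist, and see what the target allocator is logging (the names below assume the operator’s default <collector-name>-targetallocator naming):

$ kubectl get svc,deploy | grep targetallocator
$ kubectl logs deploy/otelcol-targetallocator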

Custom image

If you need to use a custom image, you can specify it in the targetAllocator configuration for the Collector resource:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  targetAllocator:
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
    enabled: true
    serviceAccount: otelcol
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        app: my-app
  config: |
    ...

Some custom images you might configure here are a newer (or older) version of the target allocator, or your own custom build. If you need to patch the target allocator, this is the way to have the operator deploy your custom image.
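
Once that change is applied, you can verify which image the target allocator is actually running (again assuming the default <collector-name>-targetallocator naming):

$ kubectl get deploy otelcol-targetallocator -o jsonpath='{.spec.template.spec.containers[0].image}'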

Summary

Hopefully this blog post has shown a few ways that you can troubleshoot the target allocator if you don’t have Prometheus metrics flowing into OpenTelemetry!
