In a recent blog post I talked about how OpenTelemetry uses the target allocator to generate the Prometheus configuration that tells collectors which targets to scrape. But… what happens when it doesn’t seem to be working? In this post I’ll walk through some troubleshooting steps I’ve used to figure out what’s going wrong.
List jobs
The first step is to see what jobs your target allocator is looking at. We can do this by curling the /jobs endpoint.
First, I’ll port-forward from my local machine so I can curl the target allocator endpoint:
$ kubectl port-forward svc/otelcol-targetallocator 8080:80
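If the port-forward fails, first make sure the target allocator Service exists; the operator creates it from the collector name with a -targetallocator suffix:
$ kubectl get svc otelcol-targetallocator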
Now we can get the jobs:
$ curl localhost:8080/jobs | jq
My output is the following:
{
  "serviceMonitor/default/my-app/0": {
    "_link": "/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets"
  },
  "otel-collector": {
    "_link": "/jobs/otel-collector/targets"
  }
}
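If you have a lot of jobs, it can help to list just the job names; a small jq filter over the same output does the trick:
$ curl -s localhost:8080/jobs | jq 'keys'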
List job targets
Now that we’ve validated our jobs, let’s see what they are targeting. A helpful trick is to append the _link value to the root target allocator URL. So to get the targets for my job, I would curl this:
$ curl localhost:8080/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets | jq
Here you’ll want to search for the target you expected. In my case, the pod selected by app: my-app from my ServiceMonitor is in this list:
{
  "otelcol-collector-0": {
    "_link": "/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets?collector_id=otelcol-collector-0",
    "targets": [
      ...
      {
        "targets": [
          "10.244.0.6:10250"
        ],
        "labels": {
          "__meta_kubernetes_pod_labelpresent_chart": "true",
          "__meta_kubernetes_pod_container_image": "quay.io/prometheus-operator/prometheus-operator:v0.65.2",
          "__meta_kubernetes_endpointslice_port_protocol": "TCP",
          "__meta_kubernetes_service_label_release": "prometheus",
          ...
          "__meta_kubernetes_pod_labelpresent_app_kubernetes_io_instance": "true"
        }
      },
      ...
    ]
  }
}
This tells us that the pod we care about is in the list of targets for this job.
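If the target list is long, you can narrow it down with jq. Here is a sketch that assumes the pod carries the app: my-app label, so Kubernetes service discovery exposes it as __meta_kubernetes_pod_label_app; adjust the meta label to whichever one you expect to match:
$ curl -s localhost:8080/jobs/serviceMonitor%2Fdefault%2Fmy-app%2F0/targets | jq '.[].targets[] | select(.labels."__meta_kubernetes_pod_label_app" == "my-app")'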
Show the scrape config
What we really care about is the scrape config the target allocator is delivering to the collectors. We can see it by curling the /scrape_configs endpoint:
$ curl localhost:8080/scrape_configs
{
  ...
  "serviceMonitor/default/my-app/0": {
    "enable_http2": true,
    "follow_redirects": true,
    "honor_timestamps": true,
    "job_name": "serviceMonitor/default/my-app/0",
    "kubernetes_sd_configs": [
      {
        "enable_http2": true,
        "follow_redirects": true,
        "kubeconfig_file": "",
        "namespaces": {
          "names": [
            "default"
          ],
          "own_namespace": false
        },
        "role": "endpointslice"
      }
    ],
    "metrics_path": "/metrics",
    "relabel_configs": [
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "job"
        ],
        "target_label": "__tmp_prometheus_job_name"
      },
      {
        "action": "keep",
        "regex": "(my-app);true",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_label_app",
          "__meta_kubernetes_service_labelpresent_app"
        ]
      },
      {
        "action": "keep",
        "regex": "prom",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_port_name"
        ]
      },
      {
        "action": "replace",
        "regex": "Node;(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_address_target_kind",
          "__meta_kubernetes_endpointslice_address_target_name"
        ],
        "target_label": "node"
      },
      {
        "action": "replace",
        "regex": "Pod;(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_endpointslice_address_target_kind",
          "__meta_kubernetes_endpointslice_address_target_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_namespace"
        ],
        "target_label": "namespace"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_name"
        ],
        "target_label": "service"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_name"
        ],
        "target_label": "pod"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_container_name"
        ],
        "target_label": "container"
      },
      {
        "action": "drop",
        "regex": "(Failed|Succeeded)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_pod_phase"
        ]
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "${1}",
        "separator": ";",
        "source_labels": [
          "__meta_kubernetes_service_name"
        ],
        "target_label": "job"
      },
      {
        "action": "replace",
        "regex": "(.*)",
        "replacement": "prom",
        "separator": ";",
        "target_label": "endpoint"
      },
      {
        "action": "hashmod",
        "modulus": 1,
        "regex": "(.*)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__address__"
        ],
        "target_label": "__tmp_hash"
      },
      {
        "action": "keep",
        "regex": "$(SHARD)",
        "replacement": "$1",
        "separator": ";",
        "source_labels": [
          "__tmp_hash"
        ]
      }
    ],
    "scheme": "http",
    "scrape_interval": "30s",
    "scrape_timeout": "10s"
  }
}
In this output we can see the kubernetes_sd_configs, which will be familiar if you’ve dealt with Prometheus configuration before.
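The full dump can get long, so it helps to pull out just the job you care about with jq, using the job name we saw from /jobs earlier:
$ curl -s localhost:8080/scrape_configs | jq '."serviceMonitor/default/my-app/0"'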
Validate receiver config
One of the things that can cause metrics not to show up in OpenTelemetry is a misconfigured target allocator. Here is what the relevant configuration might look like in your Collector resource:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  ...
  config: |
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: 'otel-collector'
            scrape_interval: 30s
            static_configs:
            - targets: [ '0.0.0.0:8888' ]
        target_allocator:
          endpoint: http://otelcol-targetallocator
          interval: 30s
          collector_id: "${POD_NAME}"
  ...
Here we can see the receiver has the target_allocator section configured. Make sure this endpoint is correct. One way to do this is to run an ephemeral debug container on the collector pod:
$ kubectl debug -it --image ubuntu otelcol-collector-0
Inside the container I’ll install curl:
# apt update && apt install -y curl
Then try to curl your target_allocator.endpoint:
# curl http://otelcol-targetallocator/scrape_configs
You should receive back the same JSON dump of scrape configs. If that doesn’t work, troubleshoot the connectivity and the endpoint value.
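Another thing worth validating is collector_id, which references ${POD_NAME}: that environment variable has to exist on the collector container. Depending on your operator version it may be injected for you; if it isn’t, here is a minimal sketch (an assumption, not taken from my setup) that sets it through the CRD’s env field and the downward API:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  env:
    - name: POD_NAME  # referenced by collector_id in the prometheus receiver's target_allocator block
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
  ...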
Custom image
In the event you need to use a custom image, you can specify this in the targetAllocator configuration of the Collector resource:
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otelcol
spec:
  mode: statefulset
  targetAllocator:
    image: ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:main
    enabled: true
    serviceAccount: otelcol
    prometheusCR:
      enabled: true
      serviceMonitorSelector:
        app: my-app
  config: |
    ...
Custom images you might configure here include a newer (or older) version of the target allocator, or your own custom build. If you need to patch the target allocator, this is the way to have the operator deploy your patched image.
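If you do swap in a custom image, it’s worth confirming what actually got deployed. A quick check, assuming the operator’s usual <collector-name>-targetallocator naming for the Deployment:
$ kubectl get deployment otelcol-targetallocator -o jsonpath='{.spec.template.spec.containers[0].image}'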
Summary
Hopefully this blog post has shown a few ways you can troubleshoot the target allocator when you don’t have Prometheus metrics flowing into OpenTelemetry!