target allocator missing default value for scrape settings for prometheusCR #1106

Closed
seankhliao opened this issue Sep 16, 2022 · 2 comments · Fixed by #1124
Labels
area:target-allocator Issues for target-allocator

Comments

@seankhliao
Contributor

Repo with reproducer: https://github.com/seankhliao/testrepo0032

Target Allocator: 0.1.0 and main
Operator: 0.60.0
Collector: 0.60.0

When creating jobs/scrape targets from ServiceMonitors, the target allocator doesn't produce values for:

  • __scrape_interval__
  • __scrape_timeout__
  • __scheme__
  • __metrics_path__
  • maybe also __param_*
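
To illustrate what "missing" means for the labels listed above, here is a minimal sketch of filling them in before a target is handed to the scraper. The helper name withDefaults is hypothetical and is not part of the target allocator or the receiver. The fallback values are the documented Prometheus configuration defaults (scheme http, path /metrics, global interval 1m, global timeout 10s); a real fix would presumably take them from the job's scrape config instead.

```go
package main

import "fmt"

// Label names that Prometheus reads when it builds a scrape target.
const (
	schemeLabel         = "__scheme__"
	metricsPathLabel    = "__metrics_path__"
	scrapeIntervalLabel = "__scrape_interval__"
	scrapeTimeoutLabel  = "__scrape_timeout__"
)

// withDefaults is a hypothetical helper: it returns a copy of labels in which
// any of the four labels above that is absent or empty is set to a fallback.
// The fallbacks are assumptions taken from Prometheus' documented defaults.
func withDefaults(labels map[string]string) map[string]string {
	defaults := map[string]string{
		schemeLabel:         "http",
		metricsPathLabel:    "/metrics",
		scrapeIntervalLabel: "1m",
		scrapeTimeoutLabel:  "10s",
	}
	out := make(map[string]string, len(labels)+len(defaults))
	for k, v := range labels {
		out[k] = v
	}
	for k, v := range defaults {
		if out[k] == "" {
			out[k] = v
		}
	}
	return out
}

func main() {
	// A target as the HTTP SD endpoint might return it: address only.
	target := withDefaults(map[string]string{"__address__": "a.b.c.d:port"})
	fmt.Println(target[schemeLabel] + "://" + target["__address__"] + target[metricsPathLabel])
	fmt.Println(target[scrapeIntervalLabel], target[scrapeTimeoutLabel])
}
```

Labels that are already set, for example an explicit __scheme__ of https, are left untouched; only absent or empty ones receive a fallback.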

When the prometheusreceiver reads the HTTP SD configs, a missing scheme produces "target": "//a.b.c.d:port", "error": "Get \"//a.b.c.d:port\": unsupported protocol scheme \"\"", a missing path produces 404s (or other invalid input), and the missing __scrape_* values produce errors like the ones below:

2022-09-16T16:59:39.911Z	error	scrape/scrape.go:488	Creating target failed	{"kind": "receiver", "name": "prometheus", "pipeline": "metrics", "scrape_pool": "/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets", "error": "instance 0 in group http://ta-test-targetallocator/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets?collector_id=ta-test-collector-0:1: scrape interval cannot be 0", "errorVerbose": "scrape interval cannot be 0\ngithub.com/prometheus/prometheus/scrape.PopulateLabels\n\tgithub.com/prometheus/[email protected]/scrape/target.go:446\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:504\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571\ninstance 0 in group http://ta-test-targetallocator/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets?collector_id=ta-test-collector-0:1\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:506\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571"}
github.com/prometheus/prometheus/scrape.(*scrapePool).Sync
	github.com/prometheus/[email protected]/scrape/scrape.go:488
github.com/prometheus/prometheus/scrape.(*Manager).reload.func1
	github.com/prometheus/[email protected]/scrape/manager.go:222
2022-09-16T16:59:39.911Z	error	scrape/scrape.go:488	Creating target failed	{"kind": "receiver", "name": "prometheus", "pipeline": "metrics", "scrape_pool": "/jobs/dummy/targets", "error": "instance 0 in group http://ta-test-targetallocator/jobs/dummy/targets?collector_id=ta-test-collector-0:0: scrape interval cannot be 0", "errorVerbose": "scrape interval cannot be 0\ngithub.com/prometheus/prometheus/scrape.PopulateLabels\n\tgithub.com/prometheus/[email protected]/scrape/target.go:446\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:504\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571\ninstance 0 in group http://ta-test-targetallocator/jobs/dummy/targets?collector_id=ta-test-collector-0:0\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:506\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571"}
github.com/prometheus/prometheus/scrape.(*scrapePool).Sync
	github.com/prometheus/[email protected]/scrape/scrape.go:488
github.com/prometheus/prometheus/scrape.(*Manager).reload.func1
	github.com/prometheus/[email protected]/scrape/manager.go:222
2022-09-16T16:59:39.911Z	error	scrape/scrape.go:488	Creating target failed	{"kind": "receiver", "name": "prometheus", "pipeline": "metrics", "scrape_pool": "/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets", "error": "instance 0 in group http://ta-test-targetallocator/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets?collector_id=ta-test-collector-0:0: scrape interval cannot be 0", "errorVerbose": "scrape interval cannot be 0\ngithub.com/prometheus/prometheus/scrape.PopulateLabels\n\tgithub.com/prometheus/[email protected]/scrape/target.go:446\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:504\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571\ninstance 0 in group http://ta-test-targetallocator/jobs/serviceMonitor%2Fdefault%2Fsm-test%2F0/targets?collector_id=ta-test-collector-0:0\ngithub.com/prometheus/prometheus/scrape.TargetsFromGroup\n\tgithub.com/prometheus/[email protected]/scrape/target.go:506\ngithub.com/prometheus/prometheus/scrape.(*scrapePool).Sync\n\tgithub.com/prometheus/[email protected]/scrape/scrape.go:486\ngithub.com/prometheus/prometheus/scrape.(*Manager).reload.func1\n\tgithub.com/prometheus/[email protected]/scrape/manager.go:222\nruntime.goexit\n\truntime/asm_amd64.s:1571"}
github.com/prometheus/prometheus/scrape.(*scrapePool).Sync
	github.com/prometheus/[email protected]/scrape/scrape.go:488
github.com/prometheus/prometheus/scrape.(*Manager).reload.func1
	github.com/prometheus/[email protected]/scrape/manager.go:222

Initial report in slack: https://cloud-native.slack.com/archives/C033BJ8BASU/p1663327567933649

@jaronoff97
Contributor

Thanks for reporting this! I can look into the fix here.

@jaronoff97
Contributor

Alright, I began looking into this and was able to reproduce it in my development cluster. The issue appears to be that the target allocator returns a linkJSON, which contains only a link for the http_sd job. When this is used, no scrape interval (or other configuration) is set. Another bug I discovered is that the scrape name is set to the linkJSON rather than the key of the map (the job name).

I think the fix will require two changes:

  • One in the target allocator that returns the job config as part of the jobs endpoint
    • We will need to include the relabel config, scrape interval, and scrape timeout; I'm not sure whether anything else is needed
  • One in the collector to read this config to construct the expected job from that scrape configuration
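
The two changes above could fit together roughly as follows. This is only a sketch under stated assumptions: the jobEntry and scrapeJob types, the scrape_interval and scrape_timeout JSON fields, and the jobsFromResponse function are all hypothetical, and the "_link" field name is assumed from the linkJSON described above. It shows the jobs endpoint carrying scrape settings next to the link, and the collector side using the map key, not the link, as the job name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// jobEntry is a hypothetical shape for one value in the jobs endpoint
// response: the existing link plus the scrape settings that are missing today.
type jobEntry struct {
	Link           string `json:"_link"`
	ScrapeInterval string `json:"scrape_interval,omitempty"`
	ScrapeTimeout  string `json:"scrape_timeout,omitempty"`
}

// scrapeJob is a hypothetical collector-side view of one job.
type scrapeJob struct {
	Name           string
	SDURL          string
	ScrapeInterval string
	ScrapeTimeout  string
}

// jobsFromResponse decodes a jobs response keyed by job name and builds one
// scrapeJob per entry, taking the name from the map key.
func jobsFromResponse(body []byte, baseURL string) ([]scrapeJob, error) {
	var entries map[string]jobEntry
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, fmt.Errorf("decoding jobs response: %w", err)
	}
	jobs := make([]scrapeJob, 0, len(entries))
	for name, e := range entries {
		jobs = append(jobs, scrapeJob{
			Name:           name,
			SDURL:          baseURL + e.Link,
			ScrapeInterval: e.ScrapeInterval,
			ScrapeTimeout:  e.ScrapeTimeout,
		})
	}
	// Sort by name so the result does not depend on map iteration order.
	sort.Slice(jobs, func(i, j int) bool { return jobs[i].Name < jobs[j].Name })
	return jobs, nil
}

func main() {
	body := []byte(`{"dummy":{"_link":"/jobs/dummy/targets","scrape_interval":"30s","scrape_timeout":"10s"}}`)
	jobs, err := jobsFromResponse(body, "http://ta-test-targetallocator")
	if err != nil {
		panic(err)
	}
	for _, j := range jobs {
		fmt.Printf("%s %s %s %s\n", j.Name, j.SDURL, j.ScrapeInterval, j.ScrapeTimeout)
	}
}
```

Relabel configs are left out to keep the sketch short; they would need the same treatment.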

I can take both fixes. For the target allocator PR, we are going to need to include some of this information (relabel_config) in the job we set. I believe this is a good opportunity for us to change some of the internal models of the TA, which currently make it more difficult to bubble up this information over HTTP. Doing so will set us up for success for #1064.
