[exporter/prometheusremotewriteexporter] Prometheus remote write exporter does not retry on 5xx #20304
Comments
I would like to propose that we add a retry at the point where a batch of metrics is exported: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/prometheusremotewriteexporter/exporter.go#L250

This would increase the risk of out-of-order samples, because the component can operate with multiple workers that export the batches in parallel: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/exporter/prometheusremotewriteexporter/exporter.go#L202

To eliminate the risk of out-of-order samples, I also propose that we change how we submit data in parallel. Instead of just submitting the batches in parallel, we should make sure that a single time series is always submitted to Prometheus sequentially. Only samples for different time series may be submitted in parallel; samples for the same time series should always be written sequentially. To achieve this I propose that:
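One way to picture the per-series ordering idea described above is to route every sample to a worker chosen by a hash of its label set, so that a given series is only ever handled by one worker. This is a hypothetical sketch, not the steps of this proposal and not the exporter's actual code: `Sample`, `seriesKey`, `shardFor`, and `exportSharded` are illustrative names that do not exist in the exporter.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
	"sync"
)

// Sample is a hypothetical, simplified stand-in for one remote-write sample.
type Sample struct {
	Labels    map[string]string
	Timestamp int64
	Value     float64
}

// seriesKey builds a stable identity for a series from its sorted labels,
// so that map iteration order does not affect the result.
func seriesKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		b.WriteString(n)
		b.WriteByte(0xff) // separator that cannot occur in valid UTF-8 label text
		b.WriteString(labels[n])
		b.WriteByte(0xff)
	}
	return b.String()
}

// shardFor maps a series to one of n workers; the same label set always
// yields the same worker index.
func shardFor(labels map[string]string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(seriesKey(labels)))
	return int(h.Sum32() % uint32(n))
}

// exportSharded groups samples by worker, then lets the workers run in
// parallel. Each worker sends its own samples one after another, so samples
// of the same series are never in flight concurrently.
func exportSharded(samples []Sample, workers int, send func(worker int, s Sample)) {
	shards := make([][]Sample, workers)
	for _, s := range samples {
		i := shardFor(s.Labels, workers)
		shards[i] = append(shards[i], s)
	}
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard []Sample) {
			defer wg.Done()
			for _, s := range shard {
				send(i, s)
			}
		}(i, shard)
	}
	wg.Wait()
}

func main() {
	up := map[string]string{"__name__": "up", "job": "demo"}
	mem := map[string]string{"__name__": "mem_bytes", "job": "demo"}
	samples := []Sample{{up, 1, 1}, {mem, 1, 42}, {up, 2, 1}, {mem, 2, 43}}

	var mu sync.Mutex // protects stdout ordering across workers
	exportSharded(samples, 4, func(worker int, s Sample) {
		mu.Lock()
		defer mu.Unlock()
		fmt.Printf("worker %d: %s t=%d\n", worker, s.Labels["__name__"], s.Timestamp)
	})
}
```

With this layout, a retry inside one worker delays only the series assigned to that worker, and it cannot be overtaken by a later sample of the same series sent from another worker.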
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself. |
There is still work being done for this issue in #23842 |
/label -stale |
…ing `cenkalti/backoff` client (#23842)

**Description:** Fix retry on 5xx status codes using the `cenkalti/backoff` package and add unit tests. Thanks to @rapphil for proposing the [solution](#20304 (comment)).

**Link to tracking Issue:** #20304

**Testing:** Added unit tests for retryOn5xx and noRetryOn4xx.

Prometheus compliance test results:

```
--- PASS: TestRemoteWrite (20.10s)
    --- PASS: TestRemoteWrite/otelcollector (0.01s)
        --- PASS: TestRemoteWrite/otelcollector/NameLabel (10.05s)
        --- PASS: TestRemoteWrite/otelcollector/Invalid (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/RepeatedLabels (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/Retries400 (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/SortedLabels (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/Up (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/JobLabel (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/HonorLabels (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/EmptyLabels (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/Counter (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/InstanceLabel (10.06s)
        --- PASS: TestRemoteWrite/otelcollector/Ordering (17.10s)
        --- PASS: TestRemoteWrite/otelcollector/Histogram (10.02s)
        --- PASS: TestRemoteWrite/otelcollector/Gauge (10.02s)
        --- PASS: TestRemoteWrite/otelcollector/Headers (10.02s)
        --- PASS: TestRemoteWrite/otelcollector/Retries500 (10.03s)
        --- PASS: TestRemoteWrite/otelcollector/Staleness (10.03s)
        --- PASS: TestRemoteWrite/otelcollector/Summary (10.03s)
        --- PASS: TestRemoteWrite/otelcollector/Timestamp (10.03s)
PASS
ok  	github.com/prometheus/compliance/remote_write	20.368s
```

---------

Co-authored-by: Anthony Mirabella <[email protected]>
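For readers who want to see the behavior being tested (retry on 5xx, no retry on 4xx), here is a minimal standard-library sketch. It is not the code from #23842: the PR uses `cenkalti/backoff`, while this sketch swaps in a hand-rolled doubling backoff. The names `classify`, `retryOn5xx`, and `errPermanent` are hypothetical, and treating transport errors as retryable is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errPermanent marks failures that must not be retried (hypothetical name).
var errPermanent = errors.New("permanent failure")

// classify turns an HTTP status code into nil (2xx), a retryable error (5xx),
// or an error wrapping errPermanent (everything else, including 4xx).
func classify(status int) error {
	switch {
	case status >= 200 && status < 300:
		return nil
	case status >= 500 && status < 600:
		return fmt.Errorf("retryable: remote write returned HTTP %d", status)
	default:
		return fmt.Errorf("%w: remote write returned HTTP %d", errPermanent, status)
	}
}

// retryOn5xx calls send up to maxAttempts times. It stops immediately on
// success or on a permanent error, and doubles the wait after every
// retryable failure. Errors returned by send itself (for example transport
// errors) are treated as retryable; that is an assumption of this sketch.
func retryOn5xx(send func() (int, error), maxAttempts int, initial time.Duration) error {
	wait := initial
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		var status int
		status, err = send()
		if err == nil {
			err = classify(status)
		}
		if err == nil || errors.Is(err, errPermanent) {
			return err
		}
		if attempt < maxAttempts {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return err
}

func main() {
	calls := 0
	// Simulated endpoint: two 503 responses, then success.
	err := retryOn5xx(func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusOK, nil
	}, 5, 10*time.Millisecond)
	fmt.Println("attempts:", calls, "error:", err)
}
```

A production version would also bound the total elapsed time and add jitter, which is the kind of policy a backoff library typically provides.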
PR #23842 has been merged, so this issue can be closed.
Closing as the PR was merged.
Component(s)
exporter/prometheusremotewrite
What happened?
Description
The prometheusremotewriteexporter is failing the Prometheus remote write compliance tests.
Steps to Reproduce
Expected Result
The exporter should retry on 5xx errors.
Actual Result
The exporter does not retry on 5xx responses. It treats them the same way as 4xx responses.
Collector version
This regression was introduced in v0.45.0.
Environment information
RHEL based.
OpenTelemetry Collector configuration
NA
Log output
Additional context
NA