[exporter/datadog] Log events are lost in case of network issues without retry #24550
Comments
…make sure log events are delivered reliably (open-telemetry#24550).
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments.
Hello @anmalysh-yb, it looks like this behavior was introduced on purpose by the change in #16390. I'll defer to the code owners to confirm whether it was intended. Note: another user appears to be running into the same issue, so this may be an unintended consequence.
@dineshg13 can this be closed or is there work left to do?
@mx-psi I think this is a real issue actually and should be fixed.
Hey @mx-psi @anmalysh-yb, as users of the Datadog exporter, we are also losing log records because of network blips. I believe this is a regression and needs to be fixed. How can I help move this issue forward?
👋 #27450 is expected to fix the issue.
Hey @songy23, thanks for the update. The referenced PR is quite a heavy refactoring. How feasible would it be to fix this issue within the current log-pushing approach, without waiting for the refactoring to be released? I assume it might be a fairly quick, light fix (I am happy to create a PR). Or perhaps you could suggest a workaround to prevent data loss while waiting for the refactoring to land in main?
@siarhei-kharchanka-cko It would be awesome if you could send a quick fix! Context: this issue only affects the HTTP log client that the Datadog log exporter currently uses. #27450 migrates the Datadog log exporter from the HTTP log client to the logs agent, which has built-in retries.
Thank you @songy23! I've linked a PR.
…esponse (#28672)
**Description:** The Datadog exporter treats network/connectivity errors (the HTTP client doesn't receive a response) as permanent errors, which can lead to lost log records. This change makes these errors retryable.
**Link to tracking Issue:** #24550
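For readers following along, the distinction the PR description draws can be sketched roughly as below. This is an illustration only, not the exporter's actual code: `permanentError`, `isPermanent`, `classifySendError` and `demo` are hypothetical names, and the sketch uses a local marker type in place of the collector's own permanent-error wrapper. It also assumes that any error that arrives together with an HTTP response stays permanent, which the description does not spell out.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// permanentError is a hypothetical marker type for this sketch. It stands in
// for whatever wrapper the real pipeline uses to say "do not retry".
type permanentError struct{ err error }

func (p permanentError) Error() string { return "permanent: " + p.err.Error() }
func (p permanentError) Unwrap() error { return p.err }

// isPermanent reports whether err (or anything it wraps) is marked permanent.
func isPermanent(err error) bool {
	var p permanentError
	return errors.As(err, &p)
}

// classifySendError decides how a failed send should be reported upstream.
// Assumption: an error with no response at all means the request never
// completed (connection refused, timeout, DNS failure), so it is returned
// unwrapped and the caller may retry. Any other error is marked permanent.
func classifySendError(resp *http.Response, err error) error {
	if err == nil {
		return nil
	}
	if resp == nil {
		return err // retryable: the client never received a response
	}
	return permanentError{err: err}
}

// demo classifies one connectivity error and one error that came with a response.
func demo() (networkPermanent, apiPermanent bool) {
	netErr := errors.New("dial tcp: connection refused")
	apiErr := errors.New("400 Bad Request")
	networkPermanent = isPermanent(classifySendError(nil, netErr))
	apiPermanent = isPermanent(classifySendError(&http.Response{StatusCode: 400}, apiErr))
	return networkPermanent, apiPermanent
}

func main() {
	n, a := demo()
	fmt.Println("network error permanent:", n)
	fmt.Println("API error permanent:", a)
}
```

With this split, a retry queue placed in front of the sender only sees connectivity failures as retryable, which is the behavior the issue asks for.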
Component(s)
exporter/datadog
What happened?
Currently the Datadog exporter treats network errors as permanent errors and, because of that, does not queue log events for retry.
Probably introduced with this diff: a21144b
Steps to Reproduce
Configure the collector as shown in the ticket.
Start writing logs.
Turn off the network connection.
Turn the network connection back on after some time.
Expected Result
All the logs are exported to Datadog.
Actual Result
Logs written during the network outage are lost.
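The gap between the expected and actual results can be reproduced in miniature without a collector. The sketch below is a simplified simulation, not the exporter or its queue: `flakySink`, `export` and `simulate` are hypothetical names, and the "network" is just a boolean flag. It walks through the reproduction steps (write logs, drop the network, restore it) once with failed records dropped and once with them requeued.

```go
package main

import (
	"errors"
	"fmt"
)

var errNetworkDown = errors.New("network unreachable")

// flakySink is a hypothetical stand-in for the logs intake endpoint.
type flakySink struct {
	up        bool
	delivered []string
}

// send fails with a connectivity error while the simulated network is down.
func (s *flakySink) send(msg string) error {
	if !s.up {
		return errNetworkDown
	}
	s.delivered = append(s.delivered, msg)
	return nil
}

// export tries to send every pending record and returns the records that
// should be retried later. When retryOnNetworkError is false, a failed
// record is simply dropped, which mirrors the reported behavior.
func export(s *flakySink, pending []string, retryOnNetworkError bool) []string {
	var retry []string
	for _, msg := range pending {
		if err := s.send(msg); err != nil && retryOnNetworkError {
			retry = append(retry, msg)
		}
	}
	return retry
}

// simulate replays the reproduction steps and returns how many of the four
// log records reached the sink.
func simulate(retryOnNetworkError bool) int {
	sink := &flakySink{up: true}
	queue := export(sink, []string{"log-1"}, retryOnNetworkError)

	sink.up = false // turn off the network connection
	queue = export(sink, append(queue, "log-2", "log-3"), retryOnNetworkError)

	sink.up = true // turn the network connection back on
	export(sink, append(queue, "log-4"), retryOnNetworkError)

	return len(sink.delivered)
}

func main() {
	fmt.Println("delivered without retry:", simulate(false)) // 2 of 4
	fmt.Println("delivered with retry:", simulate(true))     // 4 of 4
}
```

In the run without retry, the two records written during the outage never arrive, matching the actual result above. With retry they are held and delivered once the network returns.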
Collector version
0.81.0
Environment information
Environment
OS: macOS
Compiler (if manually compiled): N/A
OpenTelemetry Collector configuration
Log output
Additional context
No response