Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLP/GRPC exporter implementation - Missing retries on Exporter errors #328

Closed
bobstrecansky opened this issue May 5, 2021 · 7 comments · Fixed by #838
Closed

OTLP/GRPC exporter implementation - Missing retries on Exporter errors #328

bobstrecansky opened this issue May 5, 2021 · 7 comments · Fixed by #838
Assignees
Labels
bite sized This is a small chunk of work (good for new or time limited contributors!) enhancement New feature or request good first issue Good for newcomers help wanted This issue is looking for someone to work on it pinned Will not be removed by stalebot

Comments

@bobstrecansky
Copy link
Collaborator

We're missing retries on Exporter errors: https://github.com/open-telemetry/opentelemetry-specification/blob/318745aaf6fb173eb5cbc887ab76796bd817b3d6/specification/protocol/exporter.md#retry

@bobstrecansky bobstrecansky added enhancement New feature or request help wanted This issue is looking for someone to work on it release:required-for-ga labels May 5, 2021
@bobstrecansky bobstrecansky added bite sized This is a small chunk of work (good for new or time limited contributors!) good first issue Good for newcomers labels Jun 10, 2021
@Grunet
Copy link
Contributor

Grunet commented Sep 16, 2021

Here are some notes from my quick look at the current state of other client SDK implementations around this:

  • Go is the only one I see evidence of having tried implementing this, but possibly only partially (i.e. maybe just for gRPC/http and traces?)
  • Java is waiting on further clarification around the spec before starting, in particular around how the backoff parameters should be made adjustable by consumers
  • .NET's issue for this only seems to discuss gRPC support

And for PHP, I'm not sure how backoff strategies typically get implemented (in particular for a library that needs to be very generically reusable in various different frameworks, e.g. the reactive ones I've heard about) so if anyone happens to know of a really good example of this from somewhere else on the internet, feel free to link to it here

@tidal
Copy link
Member

tidal commented Dec 8, 2021

This is not feasible as a generic feature of the exporter, as it pretty much depends on the runtime, if this will create problems or not. So if anything this feature has to be turned on explicitly by the user/instrumentation.
The problems can occure mainly (but not only) in PHP-FPM and traditional web servers, as there PHP is normally executed in a separate single-threaded, share-nothing process, and this process is expected be quick & dirty clean. Or in other words request termination (RSHUTDOWN) should not be defered (especially not because of cross-cutting concerns) , as this will keep the connection to the client open. In the best case this results in the loading spinner in the browser keep spinning while the page is already "loaded", but in the worst case the process may exeed the configured max-execution-time and create an error with possible side-effects.

For non-traditional runtimes this is less of a problems, in particular not for the event-based ones. Especially Swoole and Swoft can handle back-offs with ease, as they can be executed in coroutines, while pure event-based runtimes (ReactPHP, AMP, etc.) can only defer execution like JavaScript.

To cut a long story short. To have a safe implementation of this feature for all runtimes, the retry must not happen in the same place, where the initial try happened. So retries have to go into a queue upon which a daemon or agent will act. Since classic queues can also possible fail because of networks errors, etc, ZeroMQ could be used as a broker-less buffered transport mechanism to guarantee delivery.

btw: I think the exporter is blocking on the grpc request. Is there any way to change this to be asynchronous?

@tidal
Copy link
Member

tidal commented Dec 14, 2021

@bobstrecansky
Could we settle on making retries a configurable option (default=off)? As outlined above, I don't see how we can implement this otherwise without creating complaints/bug-tickets of SDK users in the future.
On the other hand we could use this setting for other exporters as well.

@bobstrecansky
Copy link
Collaborator Author

After discussing this in the SIG meeting: Yes, this sounds reasonable to me.

@stale
Copy link

stale bot commented Feb 18, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix This will not be worked on label Feb 18, 2022
@stale
Copy link

stale bot commented Feb 27, 2022

This issue has been automatically closed because it has not had recent activity, but it can be reopened. Thank you for your contributions.

@stale stale bot closed this as completed Feb 27, 2022
@tidal tidal reopened this Feb 27, 2022
@stale stale bot removed the wontfix This will not be worked on label Feb 27, 2022
@tidal tidal added the pinned Will not be removed by stalebot label Feb 27, 2022
@sanketmehta28
Copy link
Member

Hi @tidal and @bobstrecansky : Can you assign this to me?

sanketmehta28 added a commit to sanketmehta28/opentelemetry-php that referenced this issue May 19, 2022
…entation - Missing retries on Exporter errors
@brettmc brettmc linked a pull request Oct 7, 2022 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bite sized This is a small chunk of work (good for new or time limited contributors!) enhancement New feature or request good first issue Good for newcomers help wanted This issue is looking for someone to work on it pinned Will not be removed by stalebot
Projects
None yet
4 participants