[Core feature] Integrate OpenTelemetry into Flyte components #3304
Motivation: Why do you think this is important?
OpenTelemetry is a distributed tracing framework designed to ease performance analysis in distributed systems. In line with our performance observability push, this would give users a more conclusive understanding of Flyte performance. It would also help debug performance issues and serve as a benchmarking utility for new features.
Goal: What should the final outcome look like, ideally?
OpenTelemetry offers many opportunities for instrumentation. We hope to add support for:
Describe alternatives you've considered
We have considered two main options:
(1) Leaving things as they are: The current state may leave users (or developers) frustrated about system performance with no real explanation.
(2) Enhancing Prometheus metrics: Flyte currently exposes many metrics through Prometheus; however, these metrics are often aggregations, so fine-grained analysis at the workflow, node, or task level is unavailable.
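To illustrate the limitation in option (2), here is a toy, stdlib-only sketch of why an aggregate hides what a per-execution trace keeps. It does not use the OpenTelemetry or Prometheus APIs. The `Span` class, the helper functions, and all timings are hypothetical and made up for illustration; they are not Flyte measurements.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Span:
    """Hypothetical, minimal stand-in for a trace span (not the OpenTelemetry API)."""
    name: str
    duration_ms: float
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List["Span"] = field(default_factory=list)


def aggregate_mean(durations_ms: List[float]) -> float:
    """What an aggregated metric retains: one number across all executions."""
    return mean(durations_ms)


def slowest_leaf(span: Span) -> Span:
    """Walk one execution's span tree and return its slowest leaf span."""
    if not span.children:
        return span
    return max((slowest_leaf(c) for c in span.children), key=lambda s: s.duration_ms)


# Made-up timings for two executions of the same workflow.
fast = Span("workflow", 120.0, {"execution": "a"}, [
    Span("node-1", 50.0, children=[Span("task-1", 45.0)]),
    Span("node-2", 60.0, children=[Span("task-2", 55.0)]),
])
slow = Span("workflow", 900.0, {"execution": "b"}, [
    Span("node-1", 55.0, children=[Span("task-1", 50.0)]),
    Span("node-2", 830.0, children=[Span("task-2", 820.0)]),
])

# The aggregate says only that executions average 510 ms.
mean_ms = aggregate_mean([fast.duration_ms, slow.duration_ms])

# The span tree for execution "b" points at the specific slow task.
culprit = slowest_leaf(slow)

print(f"aggregate mean: {mean_ms} ms")
print(f"slowest span in execution b: {culprit.name} ({culprit.duration_ms} ms)")
```

The mean alone cannot say which execution, node, or task was slow, while the nested spans localize the slowdown to a single task in a single execution. That is the kind of workflow / node / task breakdown tracing is meant to provide.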
Propose: Link/Inline OR Additional context
This work is described as "orchestration metrics" in the performance observability RFC.
Are you sure this issue hasn't been raised already?
Have you read the Code of Conduct?