Collector Reliability Review #11499

Open
yurishkuro opened this issue Oct 21, 2024 · 6 comments

@yurishkuro
Member

As the Collector moves towards the 1.0 GA milestone, the Technical Committee recommends conducting a reliability review of the Collector.

Motivation

  • The TC is formally accountable for the quality of the software produced by the OpenTelemetry project. Similar to the TC’s due diligence conducted for 1.0 milestones for the language SIGs, this reliability review is a way for the TC to conduct a comprehensive overview of the Collector’s architecture, its expected behavior in production under stress conditions, and to provide feedback to the maintainers.
  • OTel is applying for graduation at CNCF. Part of the application process is a similar due diligence by the CNCF TOC, so this internal review will better prepare the project for graduation.

Process

A reliability review is a process commonly used at large tech companies for new systems and major milestones. It involves the following steps:

  1. The organization prepares a questionnaire template used for such reviews. The TC recommends this template.
  2. The Collector maintainers fill out the questionnaire async. “Not possible” or “not implemented” are acceptable answers; the objective is to have an honest reflection of the current state.
  3. The TC members review the questionnaire async, asking for clarifications. The objective is to ensure the specific concern of each question is discussed by the maintainers.
  4. The TC and the maintainers meet for a sync discussion.

Expected Outcomes

  • The final report is published as part of the GA readiness documentation that informs the users of Collector’s expected behavior in production.
  • Potentially a set of documentation tasks (maybe creating playbooks)
  • Potentially re-prioritizing some components/capabilities not otherwise in the 1.0 scope
@jmacd
Contributor

jmacd commented Nov 1, 2024

I will share my response to the review questions here.

My aim in answering these questions is to gain support for two major issues which are limiting factors for reliability.

#11183
#11308

@mx-psi
Member

mx-psi commented Nov 4, 2024

Apologies since I did not update this issue with the work we are doing. The maintainers are working on a draft answering the template shared above. We also discussed working on the following as part of the 1.0 effort:

Note that the current effort is in releasing a 1.0 version of the OTLP-only distribution as described on the blog announcement and GA roadmap, and the related components (OTLP receiver, OTLP exporters). We have not made any written commitments regarding other distributions or other components.

We will share the filled template when it's ready. We may schedule a meeting with the TC to discuss the template and next steps.

@mx-psi
Member

mx-psi commented Nov 8, 2024

Here is the filled template from the Collector maintainers, reviewed by the Collector stability WG: https://docs.google.com/document/d/1BO7hMg0K7zFQ0219z61yRj-2JMMTW_HfoX3Vt6kXW8E/edit?tab=t.0

@codeboten
Contributor

Looking for some input from @open-telemetry/technical-committee on the document linked above by @mx-psi

@mx-psi mx-psi moved this from Blocked to In Progress in Collector: v1 Dec 11, 2024
@mx-psi mx-psi moved this from In Progress to Waiting for reviews in Collector: v1 Dec 16, 2024
@mx-psi
Member

mx-psi commented Jan 14, 2025

We met with a representative of the Technical Committee on 2025-01-13 to finish discussing all comments on the questionnaire. I left notes on each of the comments and will be creating issues for each of them shortly. Ultimately, we agreed on improving documentation for these items (in parallel but not necessarily before 1.0), with no changes to the Collector 1.0 roadmap.

To make this plan official, I would personally like to request a public message on behalf of the TC on this issue either finalizing the conversation or making any concrete requests they have.
