
Add missing scale subresource status in order to use an HPA resource over an OpenTelemetryCollector CR #775

Closed
secat opened this issue Mar 16, 2022 · 8 comments · Fixed by #785
Labels
area:collector Issues for deploying collector

Comments

@secat
Contributor

secat commented Mar 16, 2022

The OpenTelemetryCollector has a configuration field named replicas (see openTelemetryCollector.spec.replicas). However, it lacks the status information needed to configure an HPA on it (see the scale subresource documentation) via a scale subresource defined in the OpenTelemetryCollector CRD.

Suggestion

I suggest updating the OpenTelemetryCollector CRD status with:

// ScaleSubresourceStatus defines the observed state of the OpenTelemetryCollector's
// scale subresource.
type ScaleSubresourceStatus struct {
	// The total number of non-terminated pods targeted by this
	// OpenTelemetryCollector's deployment or statefulSet.
	// +optional
	Replicas int32 `json:"replicas,omitempty"`

	// The selector used to match the OpenTelemetryCollector's
	// deployment or statefulSet pods.
	// +optional
	Selector string `json:"selector,omitempty"`
}

// OpenTelemetryCollectorStatus defines the observed state of OpenTelemetryCollector.
type OpenTelemetryCollectorStatus struct {
	[...]

	// Scale is the OpenTelemetryCollector's scale subresource status.
	// +optional
	Scale ScaleSubresourceStatus `json:"scale,omitempty"`

	[...]
}

// +kubebuilder:object:root=true
// +kubebuilder:resource:shortName=otelcol;otelcols
// +kubebuilder:subresource:status 
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.scale.replicas,selectorpath=.status.scale.selector
[...]
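
For illustration, here is a minimal sketch of how the reconciler could fill in this status from the Deployment it manages. The updateScaleStatus helper, package name, and import paths are assumptions for the sketch, not the operator's actual code:

package controllers

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

	// Hypothetical import path for the CR types proposed above.
	"github.com/open-telemetry/opentelemetry-operator/apis/v1alpha1"
)

// updateScaleStatus is a hypothetical helper that mirrors the managed
// Deployment's replica count and pod selector into the CR status, where
// the /scale subresource would read them.
func updateScaleStatus(otelcol *v1alpha1.OpenTelemetryCollector, deploy *appsv1.Deployment) error {
	// Total non-terminated pods targeted by the Deployment.
	otelcol.Status.Scale.Replicas = deploy.Status.Replicas

	// Serialized label selector the HPA uses to find the pods; without
	// it the HPA reports InvalidSelector (see the status example below).
	selector, err := metav1.LabelSelectorAsSelector(deploy.Spec.Selector)
	if err != nil {
		return err
	}
	otelcol.Status.Scale.Selector = selector.String()
	return nil
}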

NOTE: The work done in PR #746 doesn't fulfill our needs: we need to scale based on memory consumption, and we may in the future configure scaling based on a custom metric. We also want to configure our own desired scaling behavior.

@secat
Contributor Author

secat commented Mar 16, 2022

@jpkrohling I would be available to contribute a PR for this issue. Thank you in advance.

@pavolloffay pavolloffay added the area:collector Issues for deploying collector label Mar 16, 2022
@secat secat changed the title Missing scale subresource status in order to use an HPA resource over an OpenTelemetryCollector CR Add missing scale subresource status in order to use an HPA resource over an OpenTelemetryCollector CR Mar 16, 2022
@jpkrohling
Member

It's yours!

@pavolloffay
Member

pavolloffay commented Mar 21, 2022

@secat could you please explain your use case, and how this feature differs from #746, which added a MaxReplicas field to the CR and configures an HPA for the collector deployment when that field is used?

Could you please also explain how the /scale subresource is used (e.g. by the HPA) in your use case? Do you create additional k8s objects to make use of it?

@secat
Contributor Author

secat commented Mar 21, 2022

@pavolloffay as described in the note in the main description of this issue, the current implementation from #746 doesn't fulfill our needs, since it creates a v1 HPA based on CPU usage only, without any other configuration knobs. It also doesn't provide any of the advanced HPA v2beta1 configuration options.

We want to scale based on memory in the short/medium term. Long term, we want to scale on a custom metric. We also want to configure our own scaling behavior. None of this is possible with the current implementation and its single configuration knob, MaxReplicas.

We have a meta controller that creates the OpenTelemetryCollector along with a v2beta2 HPA resource configured on top of it. This HPA will control the OpenTelemetryCollector resource's spec.replicas configuration field and use status.scale.replicas and status.scale.selector (see the scale subresource documentation).

Also, in the short/medium term we want to use a Deployment for the otel collector, but we may in the future switch to a StatefulSet if we start using the WAL. The external HPA will control the replicas field, which in turn controls the Deployment or StatefulSet.

@pavolloffay
Member

thanks for the explanation @secat. If you could share your HPA configuration (v2beta2) here as well, that would be helpful (also for other users) to provide a complete story/guide.

@secat
Contributor Author

secat commented Mar 22, 2022

@pavolloffay here is an example of my current HPA v2beta2 configuration:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: "2022-03-22T11:11:56Z"
  labels:
    app.kubernetes.io/component: collector
    app.kubernetes.io/instance: tracingcollectorendpoint-collector-857851ca
    app.kubernetes.io/managed-by: tracing-operator
    app.kubernetes.io/name: tce-scatudal-local-lab
    app.kubernetes.io/part-of: tracingcollectorendpoint
  name: tracingcollectorendpoint-collector-857851ca
  namespace: scatudal-local-lab
  ownerReferences:
  - apiVersion: tracing.observability.harbour.ubisoft.com/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: TracingCollectorEndpoint
    name: tce-scatudal-local-lab
    uid: c94e65b9-fb7c-420e-94c2-50f7dfd5e0e3
  resourceVersion: "266206"
  uid: dcc685bd-56e2-4a22-a0e4-cfb8a0bc9c2a
spec:
  maxReplicas: 10
  metrics:
  - resource:
      name: memory
      target:
        averageUtilization: 70
        type: Utilization
    type: Resource
  - resource:
      name: cpu
      target:
        averageUtilization: 70
        type: Utilization
    type: Resource
  minReplicas: 1
  scaleTargetRef:
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    name: tracingcollectorendpoint-collector-857851ca
status:
  conditions:
  - lastTransitionTime: "2022-03-22T11:12:11Z"
    message: the HPA controller was able to get the target's current scale
    reason: SucceededGetScale
    status: "True"
    type: AbleToScale
  - lastTransitionTime: "2022-03-22T11:12:11Z"
    message: the HPA target's scale is missing a selector
    reason: InvalidSelector
    status: "False"
    type: ScalingActive
  currentMetrics: null
  currentReplicas: 1
  desiredReplicas: 0

NOTE: Trick to get the HPA specifically as v2beta2 using kubectl:

kubectl get hpa.v2beta2.autoscaling my-hpa
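
For debugging, once the CRD exposes the scale subresource it can also be read directly through the raw API; a sketch, where the namespace and collector name are placeholders:

kubectl get --raw "/apis/opentelemetry.io/v1alpha1/namespaces/<namespace>/opentelemetrycollectors/<name>/scale"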

@pavolloffay
Member

Notes about autoscaling API deprecation:

The autoscaling/v2beta2 API version of HorizontalPodAutoscaler will no longer be served in v1.26.

Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API

------

The autoscaling/v2beta1 API version of HorizontalPodAutoscaler will no longer be served in v1.25.

Migrate manifests and API clients to use the autoscaling/v2 API version, available since v1.23.
All existing persisted objects are accessible via the new API
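
For reference, the v2beta2 example above should migrate to the GA API with, in this case, only the apiVersion changing; an untested sketch of the same spec under autoscaling/v2:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: tracingcollectorendpoint-collector-857851ca
  namespace: scatudal-local-lab
spec:
  scaleTargetRef:
    apiVersion: opentelemetry.io/v1alpha1
    kind: OpenTelemetryCollector
    name: tracingcollectorendpoint-collector-857851ca
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70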

@secat
Copy link
Contributor Author

secat commented Mar 25, 2022

@pavolloffay thank you 🙏 for the heads up!

At least the scale subresource works with any HPA version.
