operator: adds new flag controller.statusLastUpdateTimeTTL
It controls the expiration time of status.conditions lastUpdateTime, which is needed to track stale parent objects.
Increasing the value of this flag reduces load on the Kubernetes cluster, but it also increases the time needed to detect stale objects.

For instance, if there are 2 VMAlert objects and both match some VMRule, each VMAlert is registered at VMRule.status.conditions[].type with its name.
If 1 of the VMAlert objects is deleted, it is removed from VMRule.status.conditions only after 3*controller.statusLastUpdateTimeTTL, which takes roughly 3 hours (plus jitter)
with default values.

 Related issue:
#1220

Signed-off-by: f41gh7 <[email protected]>
f41gh7 committed Jan 20, 2025
1 parent fe45b61 commit d74019b
Showing 3 changed files with 21 additions and 11 deletions.
1 change: 1 addition & 0 deletions docs/CHANGELOG.md
@@ -19,6 +19,7 @@ aliases:
* FEATURE: [vmoperator](https://docs.victoriametrics.com/operator/): decrease latency of generated configuration updates. Previously, configuration was updated only after the status of child objects had changed, which could take significant time at large scale. See [this issue](https://github.com/VictoriaMetrics/operator/issues/1220) for details.
* FEATURE: [vmoperator](https://docs.victoriametrics.com/operator/): reduce load on Kubernetes API server at prometheus-converter client.
* FEATURE: [vmoperator](https://docs.victoriametrics.com/operator/): change default value for `client.qps=50` and `client.burst=100` in order to improve operator performance on scale. See [this issue](https://github.com/VictoriaMetrics/operator/issues/1220) for details.
+* FEATURE: [vmoperator](https://docs.victoriametrics.com/operator/): add new flag `controller.statusLastUpdateTimeTTL=1h` to control staleness detection for the `status.conditions` field. If the operator serves a large number of objects (> 5_000), the value should be increased.

* BUGFIX: [vmagent](https://docs.victoriametrics.com/operator/resources/vmagent/): properly build `relabelConfigs` with empty string values for `separator` and `replacement` fields. See [this issue](https://github.com/VictoriaMetrics/operator/issues/1214) for details.
* BUGFIX: [converter](https://docs.victoriametrics.com/operator/migration/#objects-conversion): properly format `regex` single value expression at Prometheus Operator CRD `relabelings` and `metricsRelabelings`. See [this issue](https://github.com/VictoriaMetrics/operator/issues/1219) for details.
27 changes: 17 additions & 10 deletions internal/controller/operator/factory/reconcile/status.go
@@ -18,12 +18,18 @@ import (
"github.com/VictoriaMetrics/operator/internal/controller/operator/factory/logger"
)

-const (
-	// TODO: @f41gh7 make configurable
-	statusUpdateTTL = 7 * time.Minute
-	statusExpireTTL = 20 * time.Minute
+var (
+	statusUpdateTTL = 60 * time.Minute
+	statusExpireTTL = 3 * statusUpdateTTL
 )

+// SetStatusUpdateTTL configures TTL for LastUpdateTime field
+//
+// Higher value decreases load on Kubernetes API server
+func SetStatusUpdateTTL(v time.Duration) {
+	statusUpdateTTL = v
+	// stale conditions expire after 3 update periods
+	statusExpireTTL = 3 * v
+}

type objectWithStatus interface {
client.Object
GetStatusMetadata() *vmv1beta1.StatusMetadata
@@ -46,8 +52,10 @@ func StatusForChildObjects[T any, PT interface {
panic(fmt.Sprintf("BUG: unexpected format for parentObjectName=%q, want name.namespace.resource", parentObjectName))
}
typeName := parentObjectName + vmv1beta1.ConditionDomainTypeAppliedSuffix
-	ctm := metav1.Now()
 	for _, childObject := range childObjects {
+		// update current time on each cycle
+		// due to possible throttling at API server
+		ctm := metav1.Now()
st := childObject.GetStatusMetadata()
currCound := vmv1beta1.Condition{
Type: typeName,
@@ -106,8 +114,7 @@ func setConditionTo(dst []vmv1beta1.Condition, cond vmv1beta1.Condition) []vmv1b
// update TTL with jitter in order to reduce load on kubernetes API server
// jitter should cover configured resync period (60s default value)
// it also reduces probability of concurrent update requests
-	jitter := jitterForDuration(2 * time.Minute)
-	ttl := statusUpdateTTL + jitter
+	ttl := jitterForDuration(statusUpdateTTL)
for idx, c := range dst {
if c.Type == cond.Type {
var forceLastTimeUpdate bool
@@ -135,8 +142,7 @@ func removeStaleConditionsBySuffix(src []vmv1beta1.Condition, domainTypeSuffix s
// update TTL with jitter in order to reduce load on kubernetes API server
// jitter should cover configured resync period (60s default value)
// it also reduces probability of concurrent update requests
-	jitter := jitterForDuration(3 * time.Minute)
-	ttl := statusExpireTTL + jitter
+	ttl := statusExpireTTL + jitterForDuration(statusUpdateTTL)
for _, cond := range src {
if strings.HasSuffix(cond.Type, domainTypeSuffix) {
if time.Since(cond.LastUpdateTime.Time) > ttl {
@@ -164,8 +170,9 @@ func writeAggregatedStatus(stm *vmv1beta1.StatusMetadata, domainTypeSuffix strin
}
}

+// jitterForDuration returns d increased by a random jitter of up to 50%
 func jitterForDuration(d time.Duration) time.Duration {
 	dv := d / 2
 	p := float64(rand.Uint32()) / (1 << 32)
-	return time.Duration(p * float64(dv))
+	return d + time.Duration(p*float64(dv))
 }
4 changes: 3 additions & 1 deletion internal/manager/manager.go
@@ -107,6 +107,8 @@ var (
loggerJSONFields = managerFlags.String("loggerJSONFields", "", "Allows renaming fields in JSON formatted logs"+
`Example: "ts:timestamp,msg:message" renames "ts" to "timestamp" and "msg" to "message".`+
"Supported fields: ts, level, caller, msg")
+	statusUpdateTTL = managerFlags.Duration("controller.statusLastUpdateTimeTTL", time.Hour, "Configures TTL for LastUpdateTime field of status.conditions. "+
+		"It's used to detect stale parent objects on child objects, for example VMAlert->VMRule at .status.conditions[].type")
)

func init() {
@@ -183,7 +185,7 @@ func RunManager(ctx context.Context) error {
}

 	reconcile.InitDeadlines(baseConfig.PodWaitReadyIntervalCheck, baseConfig.AppReadyTimeout, baseConfig.PodWaitReadyTimeout)
-
+	reconcile.SetStatusUpdateTTL(*statusUpdateTTL)
config := ctrl.GetConfigOrDie()
config.RateLimiter = flowcontrol.NewTokenBucketRateLimiter(float32(*clientQPS), *clientBurst)
