Broken vmagent config in case of using seriesLimit in VMNodeScrape for kubelet #986

Closed
dglushenok opened this issue Jun 21, 2024 · 1 comment
Labels: bug

Comments

@dglushenok

Hello.

I'm using victoria-metrics-k8s-stack version 0.23.2, which is bundled with operator version v0.45.0.

When I specify seriesLimit for kubelet in the values.yaml of victoria-metrics-k8s-stack like this:

kubelet:
  enabled: true

  # -- Enable scraping /metrics/cadvisor from kubelet's service
  cadvisor: true
  # -- Enable scraping /metrics/probes from kubelet's service
  probes: true
  # spec for VMNodeScrape crd
  # https://docs.victoriametrics.com/operator/api.html#vmnodescrapespec
  spec:
    scheme: "https"
    honorLabels: true
    interval: "30s"
    scrapeTimeout: "5s"
    tlsConfig:
      insecureSkipVerify: true
      caFile: "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
    bearerTokenFile: "/var/run/secrets/kubernetes.io/serviceaccount/token"
    # drop high cardinality label and useless metrics for cadvisor and kubelet
    metricRelabelConfigs:
      - action: labeldrop
        regex: (uid)
      - action: labeldrop
        regex: (id|name)
      - action: drop
        source_labels: [__name__]
        regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
    relabelConfigs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - sourceLabels: [__metrics_path__]
        targetLabel: metrics_path
      - targetLabel: "job"
        replacement: "kubelet"
    # ignore timestamps of cadvisor's metrics by default
    # more info here https://github.com/VictoriaMetrics/VictoriaMetrics/issues/4697#issuecomment-1656540535
    honorTimestamps: false
    seriesLimit: 180000

vmagent starts to crash with the following error:

2024-06-21T12:39:10.634Z        fatal   VictoriaMetrics/lib/promscrape/scraper.go:116   cannot read "/etc/vmagent/config_out/vmagent.env.yaml": cannot parse Prometheus config from "/etc/vmagent/config_out/vmagent.env.yaml": cannot unmarshal data: yaml: unmarshal errors:
  line 49861: field series_limit already set in type promscrape.ScrapeConfig
  line 49897: field series_limit already set in type promscrape.ScrapeConfig
  line 49934: field series_limit already set in type promscrape.ScrapeConfig; pass -promscrape.config.strictParse=false command-line flag for ignoring unknown fields in yaml config

/etc/vmagent/config_out/vmagent.env.yaml contains the following sections, each with series_limit defined twice:

- job_name: nodeScrape/vm/vm-victoria-metrics-k8s-stack-cadvisor/0
  honor_labels: true
  honor_timestamps: false
  kubernetes_sd_configs:
  - role: node
  scrape_interval: 30s
  scrape_timeout: 5s
  metrics_path: /metrics/cadvisor
  series_limit: 180000
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_node_name
    target_label: node
  - target_label: job
    replacement: vm/vm-victoria-metrics-k8s-stack-cadvisor
  - regex: __meta_kubernetes_node_label_(.+)
    action: labelmap
  - source_labels:
    - __metrics_path__
    target_label: metrics_path
  - target_label: job
    replacement: kubelet
  series_limit: 180000
  metric_relabel_configs:
  - regex: (uid)
    action: labeldrop
  - regex: (id|name)
    action: labeldrop
  - source_labels:
    - __name__
    regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
    action: drop
- job_name: nodeScrape/vm/vm-victoria-metrics-k8s-stack-kubelet/1
  honor_labels: true
  honor_timestamps: false
  kubernetes_sd_configs:
  - role: node
  scrape_interval: 30s
  scrape_timeout: 5s
  series_limit: 180000
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_node_name
    target_label: node
  - target_label: job
    replacement: vm/vm-victoria-metrics-k8s-stack-kubelet
  - regex: __meta_kubernetes_node_label_(.+)
    action: labelmap
  - source_labels:
    - __metrics_path__
    target_label: metrics_path
  - target_label: job
    replacement: kubelet
  series_limit: 180000
  metric_relabel_configs:
  - regex: (uid)
    action: labeldrop
  - regex: (id|name)
    action: labeldrop
  - source_labels:
    - __name__
    regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
    action: drop
- job_name: nodeScrape/vm/vm-victoria-metrics-k8s-stack-probes/2
  honor_labels: true
  honor_timestamps: false
  kubernetes_sd_configs:
  - role: node
  scrape_interval: 30s
  scrape_timeout: 5s
  metrics_path: /metrics/probes
  series_limit: 180000
  scheme: https
  tls_config:
    insecure_skip_verify: true
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels:
    - __meta_kubernetes_node_name
    target_label: node
  - target_label: job
    replacement: vm/vm-victoria-metrics-k8s-stack-probes
  - regex: __meta_kubernetes_node_label_(.+)
    action: labelmap
  - source_labels:
    - __metrics_path__
    target_label: metrics_path
  - target_label: job
    replacement: kubelet
  series_limit: 180000
  metric_relabel_configs:
  - regex: (uid)
    action: labeldrop
  - regex: (id|name)
    action: labeldrop
  - source_labels:
    - __name__
    regex: (rest_client_request_duration_seconds_bucket|rest_client_request_duration_seconds_sum|rest_client_request_duration_seconds_count)
    action: drop
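
To check whether a generated config has the same problem, the repeated keys can be located with a small script. This is only a minimal sketch and not part of vmagent or the operator: the function name find_duplicate_job_keys and the sample job names are hypothetical. It assumes the layout shown above (each job starts with "- job_name:" at column 0 and job-level keys are indented by two spaces), and it does a line-based scan instead of real YAML parsing so that it needs nothing beyond the Python standard library.

```python
import re
from collections import Counter

# Matches a job-level key: either the "- job_name:" line that opens a list
# item, or a key indented by exactly two spaces. Nested keys and nested list
# items (deeper indentation, or "  - ...") do not match.
JOB_KEY = re.compile(r"^(?:- |  )([A-Za-z_][A-Za-z0-9_]*):")


def find_duplicate_job_keys(config_text):
    """Return {job_name: [keys defined more than once]} for a scrape job list.

    Assumption: jobs start with "- job_name: ..." at column 0 and their own
    keys use a two-space indent, as in the excerpt above.
    """
    duplicates = {}
    job_name = None
    counts = Counter()

    def flush():
        # Record the keys that were seen more than once in the current job.
        repeated = sorted(k for k, n in counts.items() if n > 1)
        if job_name is not None and repeated:
            duplicates[job_name] = repeated

    for line in config_text.splitlines():
        match = JOB_KEY.match(line)
        if not match:
            continue
        if line.startswith("- "):
            # A new job begins: close the previous one and reset the counters.
            flush()
            job_name = line.split(":", 1)[1].strip()
            counts = Counter()
        counts[match.group(1)] += 1
    flush()
    return duplicates


# Hypothetical sample that mirrors the shape of the generated config.
sample = """\
- job_name: example/first/0
  scrape_interval: 30s
  series_limit: 180000
  relabel_configs:
  - target_label: job
    replacement: kubelet
  series_limit: 180000
- job_name: example/second/1
  scrape_interval: 30s
  series_limit: 180000
"""

print(find_duplicate_job_keys(sample))
```

Run against the contents of /etc/vmagent/config_out/vmagent.env.yaml, a scan like this would be expected to list series_limit for each of the three nodeScrape jobs shown above.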

This looks like a bug.

@Haleygo
Contributor

Haleygo commented Jul 4, 2024

The fix was included in v0.46.1, so I'm closing this as completed.
Feel free to reopen if there are further questions.

Haleygo closed this as completed Jul 4, 2024