Gathered hostmetrics process shown in console but not as metric in prometheus #36496

Closed
securom1987 opened this issue Nov 22, 2024 · 16 comments

Component(s)

receiver/hostmetrics

What happened?

Description

I am using OTel Collector v0.114 with the hostmetrics receiver and its process scraper on Ubuntu Linux.
I want to scrape per-process information, and the corresponding metrics do appear in the debug/console output (see the Log output section below), for example for the loki process.

The collector runs with the configuration shown in the "OpenTelemetry Collector configuration" section below.

The problem:

The metrics that are written to the console do not show up in Prometheus.

Collector version

v0.114.0

Environment information

Environment

OS: (e.g., "Ubuntu 24.04")
Compiler (if manually compiled): (e.g., "go 14.2")

OpenTelemetry Collector configuration

extensions:
  health_check:
    endpoint: 0.0.0.0:1133

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  hostmetrics:
    collection_interval: 10s
    scrapers:
      # CPU utilization metrics
      #cpu:
      # Disk I/O metrics
      # disk:
      # File System utilization metrics
      #filesystem:
      # CPU load metrics
      #load:
      # Memory utilization metrics
      #memory:
      # Network interface I/O metrics & TCP connection metrics
      #network:
      # Paging/Swap space utilization and I/O metrics
      #paging:
      # Process count metrics
      process:
      # Per process CPU, Memory, and Disk I/O metrics
      processes:

processors:
  batch:
  resource:
    attributes:
      - action: insert
        key: service.name           ## sets job=HOST1 on the metric in Grafana
        value: NUC-CLOUD

exporters:
  debug:
    verbosity: detailed
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [resource, batch]
      exporters: [debug, prometheus]

Log output

Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.owner: Str(root)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]: InstrumentationScope github.com/open-telemetry/opentelemetry-collector-contrib/receiver/hostmetricsreceiver/internal/scraper/processscraper 0.114.0
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> Name: process.cpu.time
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> Name: process.memory.usage
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> Name: process.memory.virtual
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.pid: Int(616072)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.parent_pid: Int(1)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.executable.name: Str(loki)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.executable.path: Str()
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.command: Str(/usr/bin/loki)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.command_line: Str(/usr/bin/loki -config.file /etc/loki/config.yml)
Nov 22 09:37:35 nuc-cloud otelcol-contrib[1156080]:      -> process.owner: Str(loki)

Additional context

Metrics that appear in the console/debug log are not shown in Prometheus.
For example, the loki process in the log output above.

securom1987 added the bug and needs triage labels on Nov 22, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@securom1987 (Author) commented Nov 22, 2024

/help-wanted receiver/hostmetrics

@VihasMakwana (Contributor)

@securom1987 do you see any errors logged from the prometheus exporter?

Can you:

  • Disable the debug exporter (as you've confirmed that metrics get logged)
  • Use service::telemetry::logs::level: info and see if you get any hints? (See the sketch below for where this setting goes.)
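
For reference, a minimal sketch of where that logging level lives in the collector config, assuming the standard service::telemetry options and the pipeline from the issue:

service:
  telemetry:
    logs:
      level: info          # raise to "debug" for even more detail
  extensions: [health_check]
  pipelines:
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [resource, batch]
      exporters: [prometheus]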

@tdg5 commented Nov 22, 2024

@securom1987, the format of metric names in OTLP differs from the format of metric names in Prometheus.

I have no experience with the prometheus exporter, but the prometheusremotewrite exporter has a config option that handles translating the OTLP metric names to Prometheus-friendly names, so you might try the prometheusremotewrite exporter instead. Alternatively, you could look for a similar option on the prometheus exporter.

It's clunkier, but this workaround would probably also work if you don't mind explicitly mapping each metric tag.
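
A minimal sketch of what trying that exporter could look like, assuming a local Prometheus started with its remote-write receiver enabled (e.g. via --web.enable-remote-write-receiver); the exporter would then be referenced in the metrics pipeline's exporters list:

exporters:
  prometheusremotewrite:
    # assumed local Prometheus remote-write endpoint
    endpoint: "http://localhost:9090/api/v1/write"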

@securom1987 (Author)

> @securom1987 do you see any errors logged from the prometheus exporter?
>
> Can you:
>
> * Disable the `debug` exporter (as you've confirmed that metrics get logged)
> * Use `service::telemetry::logs::level: info` and see if you get any hints?

Hi, thank you for your reply:

Here is the output written to syslog:
Nov 25 10:24:57 nuc-cloud otelcol-contrib[1376848]: 2024-11-25T10:24:57.173+0100#011error#011scraperhelper/scrapercontroller.go:206#011
Error scraping metrics#011{"kind": "receiver", "name": "hostmetrics", "data_type": "metrics", "error":
"error reading process executable for pid 1: readlink /proc/1/exe: permission denied;
error reading process executable for pid 2: readlink /proc/2/exe: permission denied;
error reading process executable for pid 3: readlink /proc/3/exe: permission denied;
error reading process executable for pid 4: readlink /proc/4/exe: permission denied;
error reading process executable for pid 5: readlink /proc/5/exe: permission denied;
error reading process executable for pid 6: readlink /proc/6/exe: permission denied;
error reading process executable for pid 8: readlink /proc/8/exe: permission denied;
....
error reading process executable for pid 231: readlink /proc/231/exe: permission denied;
Nov 25 10:24:57 nuc-cloud otelcol-contrib[1376848]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
Nov 25 10:24:57 nuc-cloud otelcol-contrib[1376848]: #011go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:206
Nov 25 10:24:57 nuc-cloud otelcol-contrib[1376848]: go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
Nov 25 10:24:57 nuc-cloud otelcol-contrib[1376848]: #011go.opentelemetry.io/collector/[email protected]/scraperhelper/scrapercontroller.go:183

In my opinion, only the processes listed in those /proc permission errors cannot be scraped.
With the debug exporter switched back on, the other user processes are scraped, as shown in the console output.
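
Those readlink failures are permission errors: the collector's user cannot read /proc/<pid>/exe for processes owned by other users. If running the collector as root is not an option, the process scraper documents mute options for these errors; a sketch, assuming the option names from the hostmetrics receiver README:

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      process:
        mute_process_exe_error: true      # suppress the "error reading process executable" errors
        # mute_process_all_errors: true   # or suppress all per-process scrape errors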


Pinging code owners for exporter/prometheus: @Aneurysm9 @dashpole @ArthurSens. See Adding Labels via Comments if you do not have permissions to add labels yourself. For example, comment '/label priority:p2 -needs-triaged' to set the priority and remove the needs-triaged label.

@braydonk (Contributor)

Can you send an example of the Prometheus scrape? Is it empty?

@securom1987 (Author) commented Nov 25, 2024

> Can you send an example of the Prometheus scrape? Is it empty?

Do you mean its config file?

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "otel-collector-Gateway"
    scrape_interval: 5s
    honor_labels: true
    static_configs:
      - targets: ["localhost:8889"]

VihasMakwana removed the needs triage label on Nov 25, 2024
@braydonk (Contributor)

I did not mean the scrape config, though that's good to know anyway. What I meant was: if you curl the Prometheus endpoint that the collector started, what is the output?

@securom1987 (Author) commented Nov 25, 2024

After curling the Prometheus remote-write endpoint with "curl -X POST http://nuc-cloud:9090/api/v1/write", I get "snappy: corrupt input" as the answer (the remote-write endpoint expects snappy-compressed protobuf, so an empty POST is rejected), which at least shows the endpoint is reachable. I think I have to translate the OTLP metrics into Prometheus-friendly metrics, as tdg5 already mentioned, but I have no clue how to do that.

@braydonk (Contributor)

I am referring to the otel-collector-Gateway target in your scrape config. In the OpenTelemetry Collector configuration from your initial issue comment, your Prometheus exporter listens on port 8889. I'm not sure how your networking is set up, but if I were going to get the metrics from this exporter locally, I would do this:

curl http://localhost:8889/metrics

The localhost may need to be a different host name depending on your setup. But what I primarily want to know is how you know the metrics aren't showing up, which is why I'd like to see the raw output of that Prometheus exporter.

@securom1987 (Author)

The OTel Collector and Prometheus are running on the same host.
This is the result of curling my collector's metrics endpoint from my initial config:

curl http://nuc-cloud:8889/metrics
# HELP process_cpu_time_seconds_total Total CPU seconds broken down by different states.
# TYPE process_cpu_time_seconds_total counter
process_cpu_time_seconds_total{job="NUC-CLOUD",state="system"} 0
process_cpu_time_seconds_total{job="NUC-CLOUD",state="user"} 0
process_cpu_time_seconds_total{job="NUC-CLOUD",state="wait"} 0
# HELP process_disk_io_bytes_total Disk bytes transferred.
# TYPE process_disk_io_bytes_total counter
process_disk_io_bytes_total{direction="read",job="NUC-CLOUD"} 1.32905e+06
process_disk_io_bytes_total{direction="write",job="NUC-CLOUD"} 3807
# HELP process_memory_usage_bytes The amount of physical memory in use.
# TYPE process_memory_usage_bytes gauge
process_memory_usage_bytes{job="NUC-CLOUD"} 1.077248e+06
# HELP process_memory_virtual_bytes Virtual memory size.
# TYPE process_memory_virtual_bytes gauge
process_memory_virtual_bytes{job="NUC-CLOUD"} 5.873664e+06
# HELP system_processes_count Total number of processes in each state.
# TYPE system_processes_count gauge
system_processes_count{job="NUC-CLOUD",status="blocked"} 0
system_processes_count{job="NUC-CLOUD",status="idle"} 76
system_processes_count{job="NUC-CLOUD",status="running"} 1
system_processes_count{job="NUC-CLOUD",status="sleeping"} 137
# HELP system_processes_created_total Total number of created processes.
# TYPE system_processes_created_total counter
system_processes_created_total{job="NUC-CLOUD"} 1.471905e+06

@braydonk (Contributor)

I understand the problem now.

The process scraper structures the process metrics as a collection of Resources, with the attributes that identify the process going in the resource, and the metrics under that resource being the actual metrics for that process. The prometheus exporter handles resources in a very specific way by default that isn't compatible with the way most of the metrics produced by the hostmetrics receiver are structured.

The prometheus exporter has a config option called resource_to_telemetry_conversion that will flatten all the resource attributes into each metric itself. This will have the effect you're after.

Try changing your prometheus exporter config to the following:

prometheus:
  resource_to_telemetry_conversion:
    enabled: true
  endpoint: 0.0.0.0:8889

For reference, I used this config locally to verify:

receivers:
  hostmetrics:
    scrapers:
      process:

exporters:
  prometheus:
    resource_to_telemetry_conversion:
      enabled: true
    endpoint: "localhost:9090"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [prometheus]

@ArthurSens (Member)

@braydonk's suggestion should do the trick! Another approach, if you prefer, is to send OTLP directly to Prometheus: https://prometheus.io/docs/guides/opentelemetry/

If you go in that direction, you'll want to take a look at promote_resource_attributes, which turns specific resource attributes into Prometheus labels.
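
A rough sketch of that route, with illustrative attribute names: promote_resource_attributes lives under the otlp block of prometheus.yml, the OTLP receiver has to be enabled (e.g. --enable-feature=otlp-write-receiver on Prometheus 2.x or --web.enable-otlp-receiver on 3.x), and the collector would then export via otlphttp instead of the prometheus exporter:

# prometheus.yml (assumed attribute names)
otlp:
  promote_resource_attributes:
    - service.name
    - process.pid
    - process.executable.name

# collector config
exporters:
  otlphttp:
    endpoint: "http://localhost:9090/api/v1/otlp"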

@securom1987 (Author)


I had already tried this with the prometheusremotewrite exporter before, and it worked very well; the addition of

resource_to_telemetry_conversion:
  enabled: true

did the trick!
It also works in combination with the prometheus exporter, so I will stick to the prometheus exporter only and drop prometheusremotewrite.

The following configuration works nearly the same:

exporters:
  debug:                        
    verbosity: detailed
  prometheus:                   
    resource_to_telemetry_conversion:
      enabled: true
    endpoint: 0.0.0.0:8889
  #prometheusremotewrite:
    #endpoint: "http://nuc-cloud:9090/api/v1/write"
    #resource_to_telemetry_conversion:
      #enabled: true 

Thank you for your help!

@securom1987 (Author)

Works as described in the last comment.
