
GPU Stats #262

Open
eximo84 opened this issue Nov 5, 2024 · 19 comments
Labels
enhancement: New feature or request
in progress: We've started work on this

Comments

@eximo84

eximo84 commented Nov 5, 2024

Would be nice to be able to see GPU stats using intel_gpu_top or AMD equivalent.

I'm running the hub in Docker and the agents in containers; I've yet to install the agent on the Proxmox host.

henrygd added the enhancement label Nov 5, 2024
@henrygd
Owner

henrygd commented Nov 6, 2024

We may be able to pull data for Nvidia and AMD GPUs from nvidia-smi and rocm-smi.

It would probably only work with the binary version of the agent, though.

@henrygd
Owner

henrygd commented Nov 9, 2024

Experimental support for Nvidia and AMD will be added in 0.7.4.

This works for the binary agent only and requires nvidia-smi (Nvidia) or rocm-smi (AMD) to be installed on the system.

To enable, set the environment variable GPU=true.

If you used the install script, you can do this by adding Environment="GPU=true" in the [Service] section of /etc/systemd/system/beszel-agent.service. Then reload / restart the service:

sudo systemctl daemon-reload
sudo systemctl restart beszel-agent
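
Equivalently, the setting can live in a systemd drop-in instead of the main unit file, which survives reinstalls of the unit. A sketch, assuming the standard beszel-agent unit name from the install script:

```ini
# /etc/systemd/system/beszel-agent.service.d/override.conf
# Create with: sudo systemctl edit beszel-agent
[Service]
Environment="GPU=true"
```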

Any feedback is appreciated. If it works then I'll enable it by default in the next minor release.

I don't have a device using an Intel GPU, unfortunately, so I won't be able to add that.

Tip

Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary under /opt/rocm. If this isn't in the PATH of the user running beszel-agent, you can fix it by symlinking into /usr/local/bin:

sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi

henrygd added the in progress label Nov 9, 2024
@Morethanevil

Tested it with my GTX 1660 Super, works fine on Fedora 41 with NVIDIA drivers


Had a short conversation with my LLM :)

@eximo84
Author

eximo84 commented Nov 9, 2024

Argh, sad you can't get Intel stats, since most are no doubt using iGPUs 🙁

Amazing work for the Nvidia and AMD guys though.

@eximo84
Author

eximo84 commented Nov 9, 2024

If it's useful, here is JSON output from intel_gpu_top:


{
        "period": {
                "duration": 16.488048,
                "unit": "ms"
        },
        "frequency": {
                "requested": 2244.049750,
                "actual": 1273.649858,
                "unit": "MHz"
        },
        "interrupts": {
                "count": 2971.849670,
                "unit": "irq/s"
        },
        "rc6": {
                "value": 0.000000,
                "unit": "%"
        },
        "engines": {
                "Render/3D/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Blitter/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/0": {
                        "busy": 99.270775,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/1": {
                        "busy": 51.775195,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/1": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "[unknown]/0": {
                        "busy": 12.718176,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                }
        },
        "clients": {
                "4294005725": {
                        "name": "ffmpeg",
                        "pid": "961571",
                        "engine-classes": {
                                "Render/3D": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "Blitter": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "Video": {
                                        "busy": "2.226429",
                                        "unit": "%"
                                },
                                "VideoEnhance": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                },
                                "[unknown]": {
                                        "busy": "0.000000",
                                        "unit": "%"
                                }
                        }
                }
        }
},

https://manpages.debian.org/testing/intel-gpu-tools/intel_gpu_top.1.en.html
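
For what it's worth, the per-engine structure above reduces to a single utilization number fairly easily. A minimal sketch (my assumption: the busiest engine is reported as overall GPU utilization, since a transcode can peg Video/0 while Render/3D sits idle; sample trimmed from the output above):

```python
import json

# Trimmed sample of the "engines" section from the intel_gpu_top JSON above
sample = """
{
    "engines": {
        "Render/3D/0":    {"busy": 0.000000,  "unit": "%"},
        "Video/0":        {"busy": 99.270775, "unit": "%"},
        "Video/1":        {"busy": 51.775195, "unit": "%"},
        "VideoEnhance/0": {"busy": 0.000000,  "unit": "%"}
    }
}
"""

def overall_utilization(data: dict) -> float:
    """Collapse per-engine busy percentages into one figure by
    taking the busiest engine."""
    return max(engine["busy"] for engine in data["engines"].values())

data = json.loads(sample)
print(f"{overall_utilization(data):.1f}%")  # busiest engine wins
```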

@henrygd
Owner

henrygd commented Nov 9, 2024

Maybe xpu-smi is a better option. intel_gpu_top isn't made by Intel and doesn't seem to have been updated in many years.

@SonGokussj4

How do I do this if I'm running beszel-agent as a Docker service through docker-compose.yaml?

 WARN GPU err="no GPU found - install nvidia-smi or rocm-smi"
# cat docker-compose.yaml
services:
 beszel-agent:
   image: "henrygd/beszel-agent"
   container_name: "beszel-agent"
   restart: unless-stopped
   network_mode: host
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock:ro
   environment:
     PORT: 45876
     KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
     GPU: "true"
# nvidia-smi
Sun Nov 10 13:59:38 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:09:00.0 Off |                  N/A |
| 35%   38C    P8    14W / 215W |   1121MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 33%   35C    P8     2W / 215W |      3MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1472954      C   python                           1118MiB |
+-----------------------------------------------------------------------------+

@Morethanevil

How do I do this if I'm running beszel-agent as a Docker service through docker-compose.yaml?

You can't at the moment, see above

Maybe it will be possible in the future. The image needs to include nvidia-smi or rocm-smi; then it should be possible to pass the GPU through like this:

services:
 beszel-agent:
   image: "henrygd/beszel-agent"
   container_name: "beszel-agent"
   restart: unless-stopped
   network_mode: host
   volumes:
     - /var/run/docker.sock:/var/run/docker.sock:ro
   deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: all
             capabilities:
               - gpu
   environment:
     PORT: 45876
     KEY: "ssh-ed25519 AAAAC3NzaC1lZ......adrfOpvRdFLD6p"
     GPU: "true"

The NVIDIA Container Toolkit must be installed.

@henrygd
Owner

henrygd commented Nov 10, 2024

I don't plan to include nvidia-smi or rocm-smi in the Docker image because it will increase the size many times over for a feature that not everyone will use.

But I can add example dockerfiles and compose configs. Maybe even build them automatically as separate images or tags.

@Morethanevil

I don't plan to include nvidia-smi or rocm-smi in the Docker image because it will increase the size many times over for a feature that not everyone will use.

But I can add example dockerfiles and compose configs. Maybe even build them automatically as separate images or tags.

Separate images would be the best way 👍🏻
latest-cuda, latest-rocm, etc. would be cool

@SonGokussj4

@Morethanevil This didn't work for me. I have the Container Toolkit installed, and we use the GPU the same way (with the same deploy section) in other applications deployed via docker-compose.
Here, it still prints the warning.

services:
  beszel-agent:
    image: "henrygd/beszel-agent"
    container_name: "beszel-agent"
    restart: unless-stopped
    network_mode: host
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    environment:
      PORT: 45876
      KEY: "ssh-ed25519 AAAAC3Nz.....fOpvRdFLD6p"
      GPU: "true"
      # FILESYSTEM: /dev/sda1 # set to the correct filesystem for disk I/O stats
      #FILESYSTEM: data
    deploy:
     resources:
       reservations:
         devices:
           - driver: nvidia
             count: all
             capabilities:
               - gpu
               
# docker compose down; docker compose up -d; docker compose logs -f
[+] Running 1/0
 ✔ Container beszel-agent  Removed                                                                                 0.1s
[+] Running 1/1
 ✔ Container beszel-agent  Started                                                                                 0.0s
beszel-agent  | 2024/11/11 08:29:40 INFO Detected root device name=nvme0n1p1
beszel-agent  | 2024/11/11 08:29:40 INFO Detected network interface name=enp5s0 sent=446766673405 recv=7705963243348
beszel-agent  | 2024/11/11 08:29:40 WARN GPU err="no GPU found - install nvidia-smi or rocm-smi"
beszel-agent  | 2024/11/11 08:29:40 INFO Starting SSH server address=:45876

So it can't be done unless it's built as a separate image?

@Morethanevil

@Morethanevil This didn't work for me. I have container toolkit installed, we are using GPU in our other applications deployed by docker-compose and the deploy section is used the same way. Here, it still prints the warning.


As I said before: it is not possible at the moment with Docker, since nvidia-smi isn't included in the image. You need to wait for a separate image. I just provided an example of how to include the GPU in the compose.yaml (docker-compose.yml).

@SonGokussj4

I see, sorry for the misunderstanding. Haven't got a coffee yet. (cheap excuse 😅)

@eximo84
Author

eximo84 commented Nov 11, 2024

Maybe xpu-smi is a better option. intel_gpu_top isn't made by Intel and doesn't seem to have been updated in many years.

Interesting. I didn't know this existed. Most guides point you towards intel_gpu_top.

I'm happy to test, but I appreciate it's tricky to code if you don't have the hardware.

@henrygd
Owner

henrygd commented Nov 12, 2024

I think I was wrong about intel_gpu_top not being updated actually.

Does anyone know if it works with newer iGPUs and Arc cards?

If someone with Intel can look into this further and compare intel_gpu_top / xpu-smi that would be very helpful.

We need JSON or CSV output and ideally all the same info as Nvidia / AMD -- GPU name, utilization, VRAM usage, power draw, and temperature.

Maybe next week I'll have some time to try doing it blind with sample output.
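
For comparison, the Nvidia side exposes all five of those fields in one machine-readable query: nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total,power.draw,temperature.gpu --format=csv,noheader,nounits. A sketch of parsing one such line (the sample values below are illustrative, not real output):

```python
# One CSV line per GPU; sample values for illustration only.
sample_line = "NVIDIA GeForce GTX 1660 SUPER, 12, 1121, 8192, 14.50, 38"

def parse_gpu_line(line: str) -> dict:
    """Split one CSV line from the nvidia-smi query above into named fields."""
    name, util, mem_used, mem_total, power, temp = (f.strip() for f in line.split(","))
    return {
        "name": name,
        "utilization_pct": float(util),
        "vram_used_mib": float(mem_used),
        "vram_total_mib": float(mem_total),
        "power_w": float(power),
        "temperature_c": float(temp),
    }

gpu = parse_gpu_line(sample_line)
print(gpu["name"], gpu["utilization_pct"])
```

Intel has no equivalent single query, which is the gap being discussed here.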

@Obamium69
Contributor

I'm running an Intel Arc A750 and can read out the following information as JSON:

{
        "period": {
                "duration": 1000.351522,
                "unit": "ms"
        },
        "frequency": {
                "requested": 141.950101,
                "actual": 140.950453,
                "unit": "MHz"
        },
        "interrupts": {
                "count": 187.933937,
                "unit": "irq/s"
        },
        "rc6": {
                "value": 74.878730,
                "unit": "%"
        },
        "imc-bandwidth": {
                "reads": 2758.395964,
                "writes": 1349.216517,
                "unit": "MiB/s"
        },
        "engines": {
                "Render/3D/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Blitter/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/0": {
                        "busy": 5.275372,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "Video/1": {
                        "busy": 1.311127,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/0": {
                        "busy": 2.374794,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "VideoEnhance/1": {
                        "busy": 0.561587,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                },
                "[unknown]/0": {
                        "busy": 0.000000,
                        "sema": 0.000000,
                        "wait": 0.000000,
                        "unit": "%"
                }
        }
},

intel_gpu_top can't give you the exact name of the GPU. When running intel_gpu_top -L, the output looks like this:

card1                    8086:56a1                         pci:vendor=8086,device=56A1,card=0
└─renderD128
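
Since intel_gpu_top only reports the PCI vendor:device pair, a friendly name would have to come from a lookup. A sketch under that assumption (the table is a hypothetical two-entry excerpt; a real implementation would read the pci.ids database):

```python
# Hypothetical excerpt of a PCI device-ID table; a real implementation
# would read /usr/share/hwdata/pci.ids or similar.
INTEL_GPU_NAMES = {
    "56a1": "Intel Arc A750",
    "4690": "Intel AlderLake-S (Gen12) iGPU",
}

def gpu_name(device_id: str) -> str:
    """Map the device half of 'pci:vendor=8086,device=56A1' to a name,
    falling back to the raw ID when unknown."""
    return INTEL_GPU_NAMES.get(device_id.lower(), f"Intel GPU {device_id}")

print(gpu_name("56A1"))
```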

@eximo84
Author

eximo84 commented Nov 15, 2024

If someone with Intel can look into this further and compare intel_gpu_top / xpu-smi that would be very helpful.

#262 (comment) - my output above, from an Intel Arc A310

@Jamy-L

Jamy-L commented Dec 26, 2024

I have been trying to run xpu-smi to provide a sample, without success. Most links in the Intel install guides are dead, and the install sounds like a massive headache. It must also be noted that few cards appear to be supported, because it targets data centers. intel_gpu_top sounds more reasonable and works with iGPUs as well. With my 12500 CPU, intel_gpu_top -L gives me:

card0                    Intel Alderlake_s (Gen12)         pci:vendor=8086,device=4690,card=0
└─renderD128

which is totally expected.

@henrygd
Owner

henrygd commented Dec 26, 2024

@Jamy-L Thanks, let's strike xpu-smi then. Unfortunately intel_gpu_top isn't a great option for this either. Too bad there's not a simple nvidia-smi / rocm-smi equivalent for Intel.
