Dcgm-exporter not working

I was not sure which category the DCGM topic belongs to, so I am posting here.

I tried the Grafana integration with DCGM and stumbled on the very first step.

Right now:

  • CUDA driver installed OK
  • DCGM installed OK

But when I launch dcgm-exporter, the docker container does not start:

--gpus all is passed, so the devices should be visible in the container.

I am not sure whether this is due to the OS. The host OS is CentOS Stream 9 (RPM-based), whereas the docker container is Ubuntu. In the past, though, this combination did not cause issues.
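One thing that can be ruled out quickly on the host: a common cause of Docker's `could not select device driver "" with capabilities: [[gpu]]` error is that the NVIDIA Container Toolkit is not installed, since `--gpus` relies on the toolkit's `nvidia-container-runtime-hook` binary being on the PATH. Below is a minimal sketch of such a check, assuming that hook name applies to the installed Docker version. `check_gpu_hook` is a hypothetical helper name, not part of any NVIDIA or Docker tooling.

```shell
#!/bin/sh
# Sketch: report whether the helper binary that Docker's `--gpus` option
# depends on is available on PATH.
# Assumption: the binary is named `nvidia-container-runtime-hook` and is
# shipped by the NVIDIA Container Toolkit; adjust if your setup differs.

# check_gpu_hook [NAME]
# Prints a one-line verdict; returns 0 if NAME is on PATH, 1 otherwise.
check_gpu_hook() {
    hook="${1:-nvidia-container-runtime-hook}"
    if command -v "$hook" >/dev/null 2>&1; then
        echo "found: $hook"
        return 0
    fi
    echo "missing: $hook (is the NVIDIA Container Toolkit installed?)"
    return 1
}

# Run the check with the default hook name; do not abort if it is missing.
check_gpu_hook || true
```

If the hook is reported as missing, installing the NVIDIA Container Toolkit and restarting the Docker daemon would be the next thing to try before re-running the exporter script. A "found" result only rules out this one cause.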

./setup-dcgm-exporter.sh
Unable to find image 'nvcr.io/nvidia/k8s/dcgm-exporter:2.1.4-2.3.1-ubuntu20.04' locally
2.1.4-2.3.1-ubuntu20.04: Pulling from nvidia/k8s/dcgm-exporter
83ee3a23efb7: Pull complete
d46b0a86b351: Pull complete
843cb791a04b: Pull complete
db98fc6f11f0: Pull complete
e0ce9ffc47b8: Pull complete
f611acd52c6c: Pull complete
5a183bd84d53: Pull complete
314ae387b68f: Pull complete
b11f8932ea2b: Pull complete
ed5384e33a63: Pull complete
a2eae9c1938e: Pull complete
Digest: sha256:fd3c03b1bb529153e27fee8761a08dc918adcd5f097d50cae8d49e783211b55d
Status: Downloaded newer image for nvcr.io/nvidia/k8s/dcgm-exporter:2.1.4-2.3.1-ubuntu20.04
f95aad93d61f5e7a18ebc151878086f38a577f87244245278f7e5a0d384a2a8d
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]

Run 'docker run --help' for more information
[nonroot@localhost cuda]$ popdnvidi^C
[nonroot@localhost cuda]$ ncid^C
[nonroot@localhost cuda]$ nvidia-smi
Tue Dec 16 00:09:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01              Driver Version: 590.44.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2070        Off |   00000000:01:00.0 Off |                  N/A |
| 41%   36C    P8             10W /  185W |      94MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 ...    Off |   00000000:04:00.0 Off |                  N/A |
| 41%   29C    P8              4W /  215W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           46003      G   /usr/libexec/Xorg                        77MiB |
|    0   N/A  N/A           49989      G   /usr/bin/gnome-shell                      6MiB |
+-----------------------------------------------------------------------------------------+
[nonroot@localhost cuda]$ cat setup-dcgm-exporter.sh
SUDO="sudo"

DCGM_EXPORTER_VERSION=2.1.4-2.3.1 &&
$SUDO docker run -d --rm \
    --gpus all \
    --net host \
    --cap-add SYS_ADMIN \
    nvcr.io/nvidia/k8s/dcgm-exporter:${DCGM_EXPORTER_VERSION}-ubuntu20.04 \
    -f /etc/dcgm-exporter/dcp-metrics-included.csv

[nonroot@localhost cuda]$ dcgmi discovery -l
2 GPUs found.
+--------+----------------------------------------------------------------------+
| GPU ID | Device Information                                                   |
+--------+----------------------------------------------------------------------+
| 0      | Name: NVIDIA GeForce RTX 2070                                        |
|        | PCI Bus ID: 00000000:01:00.0                                         |
|        | Device UUID: GPU-f004532e-b408-c5d0-a5c3-005db71c4b6e                |
+--------+----------------------------------------------------------------------+
| 1      | Name: NVIDIA GeForce RTX 2070 SUPER                                  |
|        | PCI Bus ID: 00000000:04:00.0                                         |
|        | Device UUID: GPU-996186c8-2bca-5806-6493-746bc8afd579                |
+--------+----------------------------------------------------------------------+
0 NvSwitches found.
+-----------+
| Switch ID |
+-----------+
+-----------+
0 ConnectX found.
+----------+
| ConnectX |
+----------+
+----------+
0 CPUs found.
+--------+----------------------------------------------------------------------+
| CPU ID | Device Information                                                   |
+--------+----------------------------------------------------------------------+
+--------+----------------------------------------------------------------------+
[nonroot@localhost cuda]$

Hi there @g900nvda,

The project has its own "Issues" section on GitHub, so you will have a better chance of getting a reply there than here on the forums.

Going by the README, it might also be worth trying the NVIDIA GPU Operator instead of running the exporter directly, to see if that helps.

Thanks!


Thank you, I will post it there.