I was not sure which category DCGM topics belong in, so I am posting here.
I tried to set up the Grafana integration with DCGM and got stuck on the very first step.
Right now:
- CUDA driver: installed OK
- DCGM: installed OK
But the dcgm-exporter setup script fails when Docker tries to start the container:
--gpus all is passed, so the GPUs should be visible inside the container.
I am not sure whether this is related to the OS. The host runs CentOS Stream 9 (RPM-based), while the container image is Ubuntu-based. In the past, though, this combination did not cause any issues.
./setup-dcgm-exporter.sh
Unable to find image 'nvcr.io/nvidia/k8s/dcgm-exporter:2.1.4-2.3.1-ubuntu20.04' locally
2.1.4-2.3.1-ubuntu20.04: Pulling from nvidia/k8s/dcgm-exporter
83ee3a23efb7: Pull complete
d46b0a86b351: Pull complete
843cb791a04b: Pull complete
db98fc6f11f0: Pull complete
e0ce9ffc47b8: Pull complete
f611acd52c6c: Pull complete
5a183bd84d53: Pull complete
314ae387b68f: Pull complete
b11f8932ea2b: Pull complete
ed5384e33a63: Pull complete
a2eae9c1938e: Pull complete
Digest: sha256:fd3c03b1bb529153e27fee8761a08dc918adcd5f097d50cae8d49e783211b55d
Status: Downloaded newer image for nvcr.io/nvidia/k8s/dcgm-exporter:2.1.4-2.3.1-ubuntu20.04
f95aad93d61f5e7a18ebc151878086f38a577f87244245278f7e5a0d384a2a8d
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Run 'docker run --help' for more information
[nonroot@localhost cuda]$ nvidia-smi
Tue Dec 16 00:09:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 590.44.01              Driver Version: 590.44.01      CUDA Version: 13.1     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2070        Off |   00000000:01:00.0 Off |                  N/A |
| 41%   36C    P8             10W /  185W |      94MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 2070 …      Off |   00000000:04:00.0 Off |                  N/A |
| 41%   29C    P8              4W /  215W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A           46003      G   /usr/libexec/Xorg                        77MiB |
|    0   N/A  N/A           49989      G   /usr/bin/gnome-shell                      6MiB |
+-----------------------------------------------------------------------------------------+
[nonroot@localhost cuda]$ cat setup-dcgm-exporter.sh
SUDO="sudo"
DCGM_EXPORTER_VERSION=2.1.4-2.3.1 &&
$SUDO docker run -d --rm \
    --gpus all \
    --net host \
    --cap-add SYS_ADMIN \
    nvcr.io/nvidia/k8s/dcgm-exporter:${DCGM_EXPORTER_VERSION}-ubuntu20.04 \
    -f /etc/dcgm-exporter/dcp-metrics-included.csv
[nonroot@localhost cuda]$ dcgmi discovery -l
2 GPUs found.
±-------±---------------------------------------------------------------------+
| GPU ID | Device Information |
±-------±---------------------------------------------------------------------+
| 0 | Name: NVIDIA GeForce RTX 2070 |
| | PCI Bus ID: 00000000:01:00.0 |
| | Device UUID: GPU-f004532e-b408-c5d0-a5c3-005db71c4b6e |
±-------±---------------------------------------------------------------------+
| 1 | Name: NVIDIA GeForce RTX 2070 SUPER |
| | PCI Bus ID: 00000000:04:00.0 |
| | Device UUID: GPU-996186c8-2bca-5806-6493-746bc8afd579 |
±-------±---------------------------------------------------------------------+
0 NvSwitches found.
±----------+
| Switch ID |
±----------+
±----------+
0 ConnectX found.
±---------+
| ConnectX |
±---------+
±---------+
0 CPUs found.
±-------±---------------------------------------------------------------------+
| CPU ID | Device Information |
±-------±---------------------------------------------------------------------+
±-------±---------------------------------------------------------------------+
[nonroot@localhost cuda]$
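One thing I still want to rule out: as far as I understand, the Docker daemon only enables --gpus when it finds NVIDIA's nvidia-container-runtime-hook binary (installed by the NVIDIA Container Toolkit) on its PATH at startup, and the 'could not select device driver "" with capabilities: [[gpu]]' error is what it reports when it does not. Below is a minimal, read-only check written under that assumption. The function name check_gpu_runtime is made up for illustration, and the daemon's PATH may differ from the PATH of the shell running the check.

```shell
#!/usr/bin/env bash
# Read-only sketch: report whether the host binaries that `docker run --gpus`
# is assumed to depend on are on PATH. Nothing is installed or reconfigured.
# check_gpu_runtime is a hypothetical helper name, not an NVIDIA or Docker tool.
check_gpu_runtime() {
  local missing=0 bin
  # nvidia-smi: host driver utilities (already working on this machine).
  # nvidia-container-runtime-hook: shipped with the NVIDIA Container Toolkit;
  #   assumed to be what the Docker daemon looks for to enable --gpus.
  # docker: the Docker CLI itself.
  for bin in nvidia-smi nvidia-container-runtime-hook docker; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: found at $(command -v "$bin")"
    else
      echo "$bin: NOT found"
      missing=1
    fi
  done
  # Non-zero exit status if any of the binaries is missing.
  return "$missing"
}

check_gpu_runtime || echo "At least one prerequisite for --gpus appears to be missing."
```

Run on the CentOS host (not inside the container), it prints one line per binary. If only nvidia-container-runtime-hook is reported missing, that would point at the container toolkit setup rather than the CentOS/Ubuntu mix. As far as I know, the daemon only looks for the hook when it starts, so it would need a restart after the toolkit is installed.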