DCGM installs OK, but the service fails to start with systemctl

I ran the DCGM setup for the first time. The installation seems OK, but when I try to run the service using systemctl, I get the following:

[root@localhost cuda]# sudo systemctl status nvidia-dgcm
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@localhost cuda]# sudo systemctl start nvidia-dgcm
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@localhost cuda]#

Here is my setup script, based on the instructions from the DCGM website:

cat dcgm.sh
set -x
CUDA_VERSION=$(nvidia-smi | sed -E -n 's/.*CUDA Version: ([0-9]+)[.].*/\1/p')
# Installing the recommended packages provides additional DCGM functionality
# which is not present in the DCGM opensource product. To opt out of these
# packages and the associated functionality, replace --setopt=install_weak_deps=True with --setopt=install_weak_deps=False.

sudo dnf install --assumeyes \
                   --setopt=install_weak_deps=True \
                   datacenter-gpu-manager-4-cuda${CUDA_VERSION}


sudo dnf install --assumeyes datacenter-gpu-manager-4-devel
sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
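
The script above picks the package name from the CUDA major version that nvidia-smi reports. As a rough sketch of what that sed expression does, here it is run against a sample header line. The sample line and the helper names `cuda_major` and `require_cuda_major` are made up for illustration; the guard is only a suggestion for catching the case where nothing could be parsed, since the package name would otherwise end in a bare `cuda`.

```shell
# Extract the CUDA major version from nvidia-smi-style text on stdin.
# Same sed expression as in dcgm.sh; prints nothing if no match is found.
cuda_major() {
    sed -E -n 's/.*CUDA Version: ([0-9]+)[.].*/\1/p'
}

# Illustrative header line (made up for this example).
sample='| NVIDIA-SMI 550.54.14    Driver Version: 550.54.14    CUDA Version: 12.4  |'
printf '%s\n' "$sample" | cuda_major    # prints 12

# Hypothetical guard: fail early when no version could be parsed.
require_cuda_major() {
    v="$(cuda_major)"
    if [ -z "$v" ]; then
        echo "could not detect CUDA version" >&2
        return 1
    fi
    printf '%s\n' "$v"
}
```

In the real script this could be used as `CUDA_VERSION=$(nvidia-smi | require_cuda_major) || exit 1` before the dnf install line.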

I have two older RTX 2070 cards, and according to the documentation they should be supported:
https://docs.nvidia.com/datacenter/dcgm/latest/user-guide/getting-started.html

Update: I ran this inside a Docker environment, and I'm not sure whether that is the cause. I'm now trying on bare metal with a conda environment.
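
The "System has not been booted with systemd as init system (PID 1)" message means systemctl has no systemd to talk to, which is typical inside a Docker container. A minimal sketch of checking for that before choosing how to start DCGM is below. It assumes that the `/run/systemd/system` directory exists only when systemd is the running init, and that the `nv-hostengine` binary from the DCGM package is on the PATH so it can be launched directly. The function names `dcgm_start_method` and `start_dcgm` are hypothetical.

```shell
# Report how the DCGM host engine should be started on this system.
# The optional argument overrides the directory checked, for testing only.
dcgm_start_method() {
    systemd_dir="${1:-/run/systemd/system}"
    if [ -d "$systemd_dir" ]; then
        echo "systemctl"   # systemd is the init system
    else
        echo "direct"      # no systemd, e.g. inside a container
    fi
}

# Hypothetical wrapper: use the service when systemd is available,
# otherwise launch the host engine directly (assumes nv-hostengine is on PATH).
start_dcgm() {
    case "$(dcgm_start_method)" in
        systemctl) sudo systemctl --now enable nvidia-dcgm ;;
        direct)    nv-hostengine ;;
    esac
}
```

After `start_dcgm`, running `dcgmi discovery -l` should show whether the host engine is reachable. The container also needs access to the GPUs for discovery to find anything.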

It works now when run outside the Docker environment:
Complete!
+ sudo systemctl --now enable nvidia-dcgm
Created symlink /etc/systemd/system/dcgm.service → /usr/lib/systemd/system/nvidia-dcgm.service.
Created symlink /etc/systemd/system/multi-user.target.wants/nvidia-dcgm.service → /usr/lib/systemd/system/nvidia-dcgm.service.
+ dcgmi discovery -l
2 GPUs found.
+--------+----------------------------------------------------------------------+
| GPU ID | Device Information                                                   |
+--------+----------------------------------------------------------------------+
| 0      | Name: NVIDIA GeForce RTX 2070                                        |
|        | PCI Bus ID: 00000000:01:00.0                                         |
|        | Device UUID: GPU-f004532e-b408-c5d0-a5c3-005db71c4b6e                |
+--------+----------------------------------------------------------------------+
| 1      | Name: NVIDIA GeForce RTX 2070 SUPER                                  |
|        | PCI Bus ID: 00000000:04:00.0                                         |
|        | Device UUID: GPU-996186c8-2bca-5806-6493-746bc8afd579                |
+--------+----------------------------------------------------------------------+
0 NvSwitches found.
+-----------+
| Switch ID |
+-----------+
+-----------+
0 ConnectX found.
+----------+
| ConnectX |
+----------+
+----------+
0 CPUs found.
+--------+----------------------------------------------------------------------+
| CPU ID | Device Information                                                   |
+--------+----------------------------------------------------------------------+
+--------+----------------------------------------------------------------------+
(base) [nonroot@localhost cuda]$
