I ran the DCGM setup for the first time and the installation seemed to succeed, but when I try to manage the service with systemctl I get the following:
[root@localhost cuda]# sudo systemctl status nvidia-dgcm
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@localhost cuda]# sudo systemctl start nvidia-dgcm
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down
[root@localhost cuda]#
Here is my setup script, based on the instructions from the DCGM website:
cat dcgm.sh
set -x
CUDA_VERSION=$(nvidia-smi | sed -E -n 's/.*CUDA Version: ([0-9]+)[.].*/\1/p')
# Installing the recommended packages provides additional DCGM functionality
# which is not present in the DCGM opensource product. To opt out of these
# packages and the associated functionality, replace --setopt=install_weak_deps=True with --setopt=install_weak_deps=False.
sudo dnf install --assumeyes \
--setopt=install_weak_deps=True \
datacenter-gpu-manager-4-cuda${CUDA_VERSION}
sudo dnf install --assumeyes datacenter-gpu-manager-4-devel
sudo systemctl --now enable nvidia-dcgm
dcgmi discovery -l
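
The "System has not been booted with systemd as init system (PID 1)" message usually means the shell is running somewhere systemd is not the init process (for example a container or a WSL instance), so the `systemctl --now enable nvidia-dcgm` step cannot work there. Below is a minimal sketch of how the script could detect that case before calling systemctl. The function names `has_systemd` and `dcgm_start_command` are hypothetical helpers made up for illustration, and the fallback assumes that `nv-hostengine` (the DCGM host engine binary) is on PATH and runs as a daemon when started without flags; please verify that against your installed DCGM version.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeeds when systemd is the running init system.
# It uses the same convention as sd_booted(): the directory
# /run/systemd/system is present when the system was booted with systemd.
# The optional argument overrides the marker path (useful for testing).
has_systemd() {
    [ -d "${1:-/run/systemd/system}" ]
}

# Hypothetical helper: prints the command that should start DCGM here.
dcgm_start_command() {
    if has_systemd "$@"; then
        echo "systemctl --now enable nvidia-dcgm"
    else
        # Assumption: nv-hostengine is on PATH and daemonizes by default.
        echo "nv-hostengine"
    fi
}

echo "Start DCGM with: sudo $(dcgm_start_command)"
```

Once the host engine is running by either route, `dcgmi discovery -l` should be able to connect to it. Printing the command instead of executing it keeps the sketch safe to run on any machine.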