Hi
We have a virtual NVIDIA Tesla T4 GPU mapped from Nutanix.
The card is visible in the Red Hat system:
# lspci | grep -i nvidia
03:00.0 VGA compatible controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
The drivers were installed with:
dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r)
dnf -y install cuda
dnf install nvidia-gds
When I start the NVIDIA driver, nvidia-persistenced.service fails with the following error:
journalctl -xe
Unit nvidia-persistenced.service has finished shutting down.
systemd[1]: nvidia-persistenced.service: Start request repeated too quickly.
systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
And in /var/log/messages:
nvidia-persistenced[9162]: Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 0 has read and write permissions for those files.
nvidia-persistenced[9162]: PID file unlocked.
nvidia-persistenced[9161]: nvidia-persistenced failed to initialize. Check syslog for more details.
nvidia-persistenced[9162]: PID file closed.
systemd[1]: nvidia-persistenced.service: Control process exited, code=exited status=1
nvidia-persistenced[9162]: Shutdown (9162)
systemd[1]: nvidia-persistenced.service: Failed with result 'exit-code'.
systemd[1]: Failed to start NVIDIA Persistence Daemon.
systemd[1]: nvidia-persistenced.service: Service RestartSec=100ms expired, scheduling restart.
systemd[1]: nvidia-persistenced.service: Scheduled restart job, restart counter is at 3.
systemd[1]: Stopped NVIDIA Persistence Daemon.
systemd[1]: Starting NVIDIA Persistence Daemon...
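The first message above asks that the NVIDIA device files (/dev/nvidia*) exist and that user 0 can read and write them. A minimal sketch of that check is below. The function name check_nvidia_dev is hypothetical (it is not part of any NVIDIA package), and the directory argument exists only so the check can be pointed somewhere other than /dev.

```shell
#!/bin/sh
# Hypothetical helper, not part of any NVIDIA package.
# Reports whether nvidia* files exist under the given directory
# (default: /dev) and whether the calling user can read and write them.
check_nvidia_dev() {
    dir="${1:-/dev}"
    found=0
    for f in "$dir"/nvidia*; do
        # If the glob matched nothing, $f is the literal pattern; skip it.
        [ -e "$f" ] || continue
        found=1
        if [ -r "$f" ] && [ -w "$f" ]; then
            echo "ok: $f"
        else
            echo "no read/write access: $f"
        fi
    done
    if [ "$found" -eq 0 ]; then
        echo "missing: no nvidia* files in $dir"
        return 1
    fi
    return 0
}
```

Run as root with no argument (check_nvidia_dev), it looks at /dev. If it reports the files as missing, `lsmod | grep nvidia` shows whether the kernel module is loaded at all.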
Unfortunately I can’t do anything with it on