Hello,
I am installing NVIDIA drivers on a number of compute nodes, some of which have different video cards. After several successful installs, I have reached a node with eight V100 cards. The installation completes without error, but the kernel module will not load and all NVIDIA commands fail. Since nothing reports an error, I am not sure where to go from here.
The kernel driver will not load:
modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
As a result, nvidia-persistenced will not run either:
nvidia-persistenced
nvidia-persistenced failed to initialize. Check syslog for more details.
Syslog shows no errors.
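For anyone retracing this, here is a minimal sketch of how the kernel log could be checked directly, since a "No such device" from modprobe usually means the module's init routine returned ENODEV rather than that the module file is missing. The helper name nvidia_kernel_lines is hypothetical (it is not part of the driver package), and the usage lines in the comments assume dmesg and lspci are available on the node.

```shell
#!/bin/sh
# Hypothetical helper: read log text on stdin and print only the lines
# that mention the NVIDIA driver (NVRM is the driver's kernel log prefix).
nvidia_kernel_lines() {
    # "|| true" keeps the exit status 0 when nothing matches.
    grep -iE 'nvrm|nvidia' || true
}

# Example usage on the affected node (dmesg may need root):
#   dmesg | nvidia_kernel_lines
#   journalctl -k -b | nvidia_kernel_lines
#
# Confirm the cards are visible on the PCI bus at all:
#   lspci -nn | grep -i nvidia
```

If the cards do not appear in lspci, the problem is below the driver (BIOS, risers, power); if they do appear, any NVRM lines in the kernel log are the next thing to look at.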
This is on an official RHEL 9.4 install (i.e. not AlmaLinux or Rocky Linux).
I ran nvidia-bug-report.sh; the output is attached.
nvidia-bug-report.log.gz (225.8 KB)
Just an update: I updated to the latest driver, 560.35.03, and the issue is the same.
Screenshot of the DRAC virtual console showing the error.