We had a Tesla T4 running smoothly on our PowerEdge R640 server under Ubuntu Server 18.04. However, after we upgraded the system to Ubuntu Server 22.04, the NVIDIA driver stopped working. The command nvidia-smi outputs the following:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.
After we purged everything and reinstalled (multiple times), the server seems to get into a boot loop. The following lines are repeated many times in kern.log:
Jul 24 05:12:55 nc16 kernel: [327372.079138] nvidia-nvlink: Nvlink Core is being initialized, major device number 509
Jul 24 05:12:55 nc16 kernel: [327372.079145] NVRM: The NVIDIA probe routine was not called for 1 device(s).
Jul 24 05:12:55 nc16 kernel: [327372.089000] NVRM: This can occur when a driver such as:
Jul 24 05:12:55 nc16 kernel: [327372.089000] NVRM: nouveau, rivafb, nvidiafb or rivatv
Jul 24 05:12:55 nc16 kernel: [327372.089000] NVRM: was loaded and obtained ownership of the NVIDIA device(s).
Jul 24 05:12:55 nc16 kernel: [327372.089003] NVRM: Try unloading the conflicting kernel module (and/or
Jul 24 05:12:55 nc16 kernel: [327372.089003] NVRM: reconfigure your kernel without the conflicting
Jul 24 05:12:55 nc16 kernel: [327372.089003] NVRM: driver(s)), then try loading the NVIDIA kernel module
Jul 24 05:12:55 nc16 kernel: [327372.089003] NVRM: again.
Jul 24 05:12:55 nc16 kernel: [327372.089004] NVRM: No NVIDIA devices probed.
However, nouveau is already blacklisted:
File: /etc/modprobe.d/blacklist-nvidia-nouveau.conf
blacklist nouveau
options nouveau modeset=0
File: /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="nouveau.blacklist=1 quiet splash rdblaclist=nouveau nomodeset"
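To double-check that the blacklist is actually in effect on the running system, a minimal sketch along these lines could be used. The function name nouveau_loaded and its optional file argument are illustrative assumptions, not part of any NVIDIA tooling; the check only reads /proc/modules, which lists the currently loaded kernel modules.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a module named "nouveau" appears in the
# given module list. Defaults to /proc/modules, which has one module per
# line with the module name in the first field.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }' "${1:-/proc/modules}"
}

if nouveau_loaded; then
    echo "nouveau is loaded"
else
    echo "nouveau is not loaded"
fi
```

If this reports that nouveau is not loaded, the conflicting module named in the NVRM message is presumably one of the others in that list, or the message has a different cause.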
Here are also some outputs you might find informative:
Kernel:
$ uname -r
5.15.0-76-generic
Graphics devices:
$ lspci | grep NVIDIA
3b:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
$ lspci | grep VGA
03:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. Integrated Matrox G200eW3 Graphics Controller (rev 04)
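Since the NVRM message suggests another driver may have claimed the device, it may also help to see which kernel driver (if any) is bound to the T4. The sketch below assumes the card stays at PCI address 3b:00.0, as in the lspci output above; driver_in_use is a hypothetical helper name that simply filters the output of lspci -k.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -k` output on stdin and prints the value
# of the "Kernel driver in use:" line, if there is one.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# 3b:00.0 is the T4's PCI address taken from the lspci output above.
if command -v lspci >/dev/null 2>&1; then
    lspci -k -s 3b:00.0 | driver_in_use
fi
```

An empty result would mean that no kernel driver is currently bound to the device.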
Here is also the nvidia-bug-report, generated prior to purging everything:
nvidia-bug-report.log (488.4 KB)
Thank you in advance for your help.