And it's using an RTX 2060 SUPER GPU card. Before updating, I saw that the built-in nouveau driver was being used. There was a minor kernel upgrade pending from auto-updates, so I installed it before proceeding with the NVIDIA driver install.
The kernel version is now 5.13.0-40. After installing nvidia-driver-510 (the recommended one), there is a blank screen on reboot. I then upgraded the kernel to the edge version using sudo add-apt-repository ppa:canonical-kernel-team/proposed -y
But it's the same. It's also the same even if I switch to the 470 drivers.
In the BIOS, the graphics selector option is set to Auto. There are also options for i) Intel Graphics ii) PEG (?) iii) PCI.
No monitor is connected to the DVI port (to which the iGPU is connected). Only the NVIDIA GPU is connected to the monitor, using DisplayPort.
If I try to boot in recovery mode and run nvidia-smi, there is a page fault.
As I understand it, unlike a laptop (iGPU + dGPU), there is no monitor for the iGPU on initial boot before switching. Would this be a hint for debugging?
Unfortunately, the proxy rules where I work do not allow me to upload the bug report. But one thing I see is that there is
BUG: Unable to handle page fault…
After that the system freezes, which is the same as when I run the nvidia-smi tool…
Also I tried Ubuntu 18.04 with nvidia-driver-470, which was recommended, but no luck there either. In this case the kernel tries to start the graphics, then restarts and tries again, stuck in a loop.
So now, looking at the bug reports myself, I see a continuous list of messages like this:
May 26 11:23:18 ace kernel: [ 948.065785] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
May 26 11:23:18 ace kernel: [ 948.065790] NVRM: The NVIDIA probe routine was not called for 1 device(s).
May 26 11:23:18 ace kernel: [ 948.066538] NVRM: This can occur when a driver such as:
May 26 11:23:18 ace kernel: [ 948.066538] NVRM: nouveau, rivafb, nvidiafb or rivatv
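A long run of repeated kernel messages like these can be easier to read when collapsed into counts. A minimal sketch, assuming the log text is available on standard input (for example from journalctl -k or from the unpacked bug report); the function name nvrm_summary is made up for illustration:

```shell
# Hypothetical helper: read a kernel log on stdin and print each distinct
# NVRM / nvidia-nvlink message once, prefixed by how often it occurred.
nvrm_summary() {
  grep -E 'NVRM:|nvidia-nvlink:' \
    | sed -E 's/^[^]]*\] +//' \
    | sort | uniq -c | sort -rn
}

# Possible usage (assumes a systemd journal is available):
#   journalctl -k | nvrm_summary
```

The sed step only strips the date, host name and kernel timestamp so that identical messages logged at different times are counted together.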
Is the suggested solution to blacklist the nouveau driver?
Now the system boots, but the NVIDIA drivers are not loaded (nvidia-smi gives an error) and the display is stuck at 640x480 resolution.
Again I ran nvidia-bug-report as root and will attach the report later.
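For reference, blacklisting nouveau on Ubuntu is usually done with a small modprobe configuration file. A minimal sketch, assuming the file name blacklist-nouveau.conf (any name ending in .conf under /etc/modprobe.d/ should work) and a hypothetical helper nouveau_blacklist_conf that only prints the file contents so they can be reviewed first:

```shell
# Hypothetical helper: print the modprobe configuration that keeps the
# nouveau kernel module from loading. It does not modify the system.
nouveau_blacklist_conf() {
  printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# To apply it (requires root; the file name is an assumption):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u   # rebuild the initramfs so the change takes effect at boot
#   sudo reboot
```

Blacklisting only stops nouveau from claiming the card; it does not by itself make the NVIDIA module load.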
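One way to check which kernel driver is actually bound to each GPU (relevant to the "drivers are not loaded" symptom above) is to read the output of lspci -k. A minimal sketch, assuming the usual lspci -k layout in which each device line is followed by indented detail lines; the function name gpu_kernel_drivers is made up for illustration:

```shell
# Hypothetical helper: read `lspci -k` output on stdin and print, for each
# VGA or 3D controller, the kernel driver currently bound to it.
# A GPU with no bound driver produces no output line.
gpu_kernel_drivers() {
  awk '
    /^[0-9a-fA-F]/ { dev = "" }                            # a new device entry starts
    /VGA compatible controller|3D controller/ { dev = $0 } # remember GPU entries only
    dev != "" && /Kernel driver in use:/ {
      sub(/^.*Kernel driver in use: */, "")
      print dev " => " $0
    }
  '
}

# Possible usage:
#   lspci -k | gpu_kernel_drivers
```

If the NVIDIA card shows nouveau here, or does not appear at all, the nvidia module has not claimed the device.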
It’s a bit messy: the bug was first introduced with driver 470 last year, then fixed in 470.54, and later broke again with 470.87. I don’t know if this has been fixed in the just-released 515.48.07.
The recommended way to install is to use the driver from the distro repo and then install only the cuda-toolkit, e.g.
sudo apt install cuda-toolkit-11-4
or use the CUDA runfile installer and skip the driver install when asked.
No. You already have a driver installed; to leave it intact when installing CUDA, install only the toolkit. This is done by running
sudo apt install cuda-toolkit-11-4
instead of apt install cuda
or by skipping the driver install when using the CUDA runfile.
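Before installing, it may be worth confirming that apt really would leave the driver alone. A minimal sketch, assuming the NVIDIA CUDA apt repository is already configured; cuda_toolkit_pkg is a hypothetical helper that only builds the package name from a version number:

```shell
# Hypothetical helper: map a CUDA version such as "11.4" to the toolkit-only
# package name used in the commands above ("cuda-toolkit-11-4").
cuda_toolkit_pkg() {
  printf 'cuda-toolkit-%s\n' "$(printf '%s' "$1" | tr '.' '-')"
}

# Dry run: --simulate lists what apt would install without changing anything.
# If the second grep prints nothing, no nvidia-driver package is in the plan.
#   apt-get install --simulate "$(cuda_toolkit_pkg 11.4)" | grep '^Inst' | grep -i 'nvidia-driver'
```

If the dry run looks clean, the same package name can be passed to sudo apt install as shown above.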