I know this is a common problem with NVIDIA graphics drivers, but I can’t see how there can be a systematic approach to find the cause.
I have this system: https://www.neousys-tech.com/en/product/application/in-vehicle-computing/nuvo-6108gc-250w-gpu-in-vehicle-computing-with-ignition-control
It is using an RTX 2060 SUPER GPU card. Before updating, I saw that the built-in nouveau drivers were being used. There was a minor kernel upgrade pending from auto-updates, so I installed it before proceeding with the NVIDIA driver install.
The kernel version is now 5.13.0-40. After installing nvidia-driver-510 (the recommended one), there is a blank screen on reboot. I upgraded the kernel to the edge using
sudo add-apt-repository ppa:canonical-kernel-team/proposed -y
But it's the same. It's also the same if I switch to the 470 drivers.
- In the BIOS the Graphics selector option is set to Auto. The other options are i) Intel Graphics ii) PEG (?) iii) PCI
No monitor is connected to the DVI port (which is driven by the iGPU). The monitor is connected only to the NVIDIA GPU, using DisplayPort.
- If I try to boot in recovery mode and run the nvidia-smi, there is a page fault
As I understand it, unlike on a laptop (iGPU + dGPU), there is no monitor on the iGPU during initial boot before switching. Would this be a hint for debugging?
Could something like this be a solution: "Possible SOLUTION ! for Black Screen UBUNTU and latest 460 driver!"?
Or should I switch to even older drivers, say 460 or 450?
Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.
Unfortunately the proxy rules where I work do not allow me to upload the bug report. But one thing I see in it is:
BUG: Unable to handle page fault…
After that the system freezes, which is the same thing that happens when I run the nvidia-smi tool…
I am finally able to upload. Here is the bug report, taken after that system freeze; the output is attached below.
nvidia-bug-report.log.gz (129.5 KB)
I also tried Ubuntu 18.04 with nvidia-driver-470, which was the recommended one, but no luck there either. In this case the kernel tries to start the graphics, then restarts and tries again, stuck in a loop.
Please help me…
So now looking at the bug reports myself, I see this continuous list of messages like this:
May 26 11:23:18 ace kernel: [ 948.065785] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
May 26 11:23:18 ace kernel: [ 948.065790] NVRM: The NVIDIA probe routine was not called for 1 device(s).
May 26 11:23:18 ace kernel: [ 948.066538] NVRM: This can occur when a driver such as:
May 26 11:23:18 ace kernel: [ 948.066538] NVRM: nouveau, rivafb, nvidiafb or rivatv
Is the suggested solution to blacklist the nouveau drivers ?
I have also tried to blacklist the nouveau drivers as given here:
Now the system boots but NVIDIA drivers are not loaded (nvidia-smi gives an error) and the display is stuck at 640x480 resolution.
Again I ran the nvidia-bug-report as root and I will attach the report later.
When I look into this bug report too, the nouveau drivers still seem to be loaded. How is this possible?
Is there something else I have to do, for example in GRUB, as described in "The NVIDIA probe routine was not called for 1 device(s) · Issue #1 · probonopd/system · GitHub"?
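One way to confirm whether nouveau really is loaded on the running system, rather than inferring it from the bug report, is to look at the kernel's module list. The following is a minimal sketch; the function name `module_loaded` is made up for illustration, and it assumes the standard `/proc/modules` layout where the first field of each line is the module name.

```shell
#!/bin/sh
# module_loaded NAME [FILE]
# Succeeds (exit status 0) if NAME appears as a loaded module.
# FILE defaults to /proc/modules; a different file can be passed for testing.
# Note: module_loaded is a hypothetical helper name, not a system tool.
module_loaded() {
    name=$1
    file=${2:-/proc/modules}
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$name" '$1 == m { found = 1 } END { exit !found }' "$file"
}
```

For example, `module_loaded nouveau && echo "nouveau is still loaded"`. If it is still loaded after blacklisting, one common cause on Ubuntu is that the initramfs was not regenerated after the blacklist file was added, so the old module list is still used at early boot.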
From the log:
NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x56:1463)
This looks like you ran into a known driver bug. Please check if you can get it to work with a runfile driver 470.74
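For anyone who wants to look for the same messages in their own logs, here is a small sketch that filters kernel log text for the two NVRM messages quoted in this thread. The function name `nvrm_errors` is hypothetical, and the pattern only covers these two messages, not every possible NVRM error.

```shell
#!/bin/sh
# nvrm_errors
# Reads kernel log text on stdin and prints the NVRM lines that report
# either a failed initialization or an uncalled probe routine.
# Note: nvrm_errors is a hypothetical helper name; the pattern is limited
# to the two messages that appear in this thread.
nvrm_errors() {
    grep -E 'NVRM:.*(failed|probe routine was not called)'
}
```

Typical use would be `sudo dmesg | nvrm_errors`, or `zcat nvidia-bug-report.log.gz | nvrm_errors` to search an existing bug report.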
Ok. Thank you. Should I use Ubuntu 18.04 or 20.04 ?
Doesn’t matter. You just need to make sure you uninstalled the packaged driver beforehand and blacklisted nouveau.
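For reference, blacklisting nouveau usually comes down to a two-line modprobe configuration file. The sketch below writes those two lines to whatever path it is given; the function name `write_nouveau_blacklist` is made up for illustration, and the exact file name under `/etc/modprobe.d/` is a matter of convention.

```shell
#!/bin/sh
# write_nouveau_blacklist FILE
# Writes modprobe rules that stop nouveau from being loaded automatically
# and disable its kernel mode setting.
# Note: write_nouveau_blacklist is a hypothetical helper name.
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}
```

On Ubuntu the file would normally be created as root under `/etc/modprobe.d/` (for example as `blacklist-nouveau.conf`), followed by `sudo update-initramfs -u` and a reboot so the change also applies at early boot.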
Wow, it's working now! I am running it on Ubuntu 18.04.
Now the question is how I can install a compatible CUDA toolkit without disturbing the installed driver. I guess this is the only driver version with the bug fix?
Generally what I see is that if I install a CUDA toolkit from here:
I see that it has its own nvidia driver version…
And if I install from ubuntu packages I will not get the latest CUDA…
It’s a bit messy: the bug was first introduced with driver 470 last year, then fixed in 470.54, and later broke again with 470.87. I don’t know if this has been fixed in the just released 515.48.07.
The recommended way to install is to use the driver from distro repo and then just install the cuda-toolkit, e.g.
sudo apt install cuda-toolkit-11-4
or use the cuda runfile installer and skip driver install when asked.
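To confirm that a toolkit install left the working driver alone, the loaded driver version can be recorded before and after. This is a rough sketch: the function name `loaded_driver_version` is hypothetical, and it assumes the usual `/proc/driver/nvidia/version` layout, where the version is the first dotted number on the "NVRM version" line.

```shell
#!/bin/sh
# loaded_driver_version [FILE]
# Prints the version of the loaded NVIDIA kernel module, e.g. 470.74.
# FILE defaults to /proc/driver/nvidia/version; pass another file for testing.
# Note: loaded_driver_version is a hypothetical helper name, and the file
# layout is an assumption based on typical driver output.
loaded_driver_version() {
    file=${1:-/proc/driver/nvidia/version}
    # Take the first dotted number on the "NVRM version" line.
    grep 'NVRM version' "$file" \
        | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' \
        | head -n 1
}
```

Running `loaded_driver_version` before and after installing the toolkit should print the same value if the driver was not touched.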
Do you mean I have to look for the drivers included in the CUDA toolkit one by one ?
For example, CUDA 11.4 is built on the 470.42 driver, not the one I am looking for…
“or use the cuda runfile installer and skip driver install when asked.”
Can I do this with the run file above ?
No. You already have a driver installed. To leave it intact during the CUDA install, install only the toolkit. This is done by running
sudo apt install cuda-toolkit-11-4
instead of apt install cuda
or skipping the driver install when using the cuda runfile.
Yes, when started, the installer asks whether you want to install the driver. Just say no.
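If the interactive prompt is not convenient, the runfile can also be asked for the toolkit only on the command line. The sketch below just prints such a command rather than running it; the function name `toolkit_only_cmd` is made up, the runfile path is a placeholder you supply yourself, and the `--silent` and `--toolkit` options are taken from the CUDA Linux installation guide, so it is worth checking them against `sh <runfile> --help` for your version.

```shell
#!/bin/sh
# toolkit_only_cmd RUNFILE
# Prints (does not execute) a runfile invocation that installs only the
# CUDA toolkit and leaves the already installed driver untouched.
# Note: toolkit_only_cmd is a hypothetical helper name; RUNFILE is whatever
# CUDA runfile you downloaded.
toolkit_only_cmd() {
    printf 'sudo sh %s --silent --toolkit\n' "$1"
}
```

Printing the command first makes it easy to review before running it by hand.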