*Bug* 470.42.01-1 dGPU cannot be initialized

Hello community,

Today I updated to Linux 5.13 and NVIDIA 470 (previously 465), and that seems to have broken my dGPU. I am on an HP Spectre x360 15" and have been running a GTX 1650 Max-Q alongside an Intel 9750H for a year without problems on this machine. Hybrid graphics worked perfectly; I just had to start the NVIDIA dGPU with extra environment commands for some apps.

After the driver update it runs purely on the Intel GPU, and the kernel log shows:

kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x26:0x56:1257)
kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
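
For anyone wanting to check whether they are hitting the same failure, here is a minimal sketch of how the kernel log could be filtered for these two messages. The function name nvrm_failures is made up for illustration; it only assumes the messages look like the two lines above.

```shell
# Print only NVRM adapter-initialization failures from kernel log text on stdin.
# nvrm_failures is a hypothetical helper name, not part of any driver tooling.
nvrm_failures() {
    grep -E 'NVRM: .*(RmInitAdapter|rm_init_adapter) failed'
}

# Usage (reading the kernel log may need sudo on some distros):
#   journalctl -k -b --no-pager | nvrm_failures
#   sudo dmesg | nvrm_failures
```

If nothing is printed, the driver may be failing for a different reason than the one discussed in this thread.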

I dual-boot Windows 10 and checked whether the GPU itself was broken, but it works flawlessly there, so it is not a hardware issue.

The only GRUB boot parameter I have set is a resume UUID for hibernation, nothing more, so it's a clean boot. I tried Linux 5.13, 5.12 and 5.10; the same error occurs on all of them.

I have always let the Manjaro Linux MHWD manager install the driver and never attempted anything manually.

Do you have any idea what could be wrong? I tried downgrading to nvidia-utils 465.31, but the package manager warns me that the current version of nvidia-utils is a dependency of linux513-nvidia (and of the 512 and 510 equivalents), so I didn't continue down that path.

What am I doing wrong? Thanks for your help in advance.

Greetings,
~ent

So I got a bit more time in the evening and downgraded the driver, and my dGPU is back up and running. So it's most probably a software bug in the new driver. (This is for Linux 5.10; I reverted everything to the state before the update, though the virtualbox and acpi_call modules probably weren't necessary.)

Same here with 470.57.02. It happened after a PopOS update: Intel graphics only and non-functioning HDMI. Same computer (4k OLED version). At first I thought the NVIDIA chip was broken, but I put back the original Windows 10 SSD and everything ran fine.

Uninstalled the PopOS NVIDIA drivers:
sudo apt remove 'nvidia*'

Installed version 465.31, downloaded from nvidia.com:
chmod +x NVIDIA-Linux-x86_64-465.31.run
sudo ./NVIDIA-Linux-x86_64-465.31.run

And after a reboot PopOS is running perfectly with NVIDIA GTX 1650 Max-Q graphics again.

Same here,
Problem encountered on Arch Linux; temporarily fixed by rolling back to version 465.
Does anyone happen to know the root of the issue?

On a Lenovo Thinkpad Yoga X1 gen 2, with an external RTX 2070 connected via Thunderbolt, 4k screen connected to the card’s HDMI output,
after the 470.57.02 upgrade:
The graphics card crashes, blanking the HDMI output, after a period of time that varies from a few seconds to 30 minutes.
After this, a reboot is not enough; I have to cut the power to the card for at least 5 seconds before it will function again.
Downgrading to 465.31 made everything run reliably.

Same thing here!
I’m using a Dell E6440 with a GTX 1050Ti connected via an EXP GDC. Downgrading to 465 fixed it for me.
I have the same RmInitAdapter failed error in dmesg.
nvidia-bug-report.log.gz (110.9 KB)

Please consider fixing this. It makes distros that include the latest drivers in their ISOs, like Pop!_OS, a pain in the butt for new users to figure out how to fix.
It seems that this bug has already been mentioned in a slightly earlier thread.

I am also in the same boat with an HP Spectre eb000 (10th gen Intel) running Pop!_OS 21.04. When using the install script, did you get this error?

WARNING: Unable to determine the path to install the libglvnd EGL vendor library config files. Check that you have pkg-config and the libglvnd development libraries installed, or specify a path with --glvnd-egl-config-path.

Afterwards, it said that the installation failed, but nvidia-smi worked again after a reboot.

I tried installing again after removing the drivers the way Pop!_OS recommends, with these commands:

sudo apt purge ~nnvidia
sudo apt autoremove
sudo apt clean

However, after installing with the NVIDIA .run file, the GPU wasn’t detected, perhaps because I accidentally said no to the automatic Xorg configuration?

I should just use your method since it seems to be working for you, though. Does your solution still work with system76-power, particularly its hybrid mode? Thanks for your insight.

On my HP Spectre the xorg.conf file is not necessary.
(The Thinkpad with the external Thunderbolt card needs the xorg.conf file because of a non-standard PCI address, which has to be read with the lspci command and specified manually in xorg.conf.)
Check that the open-source NVIDIA driver called “nouveau” is not loaded; there are instructions online on how to disable it.
I did not test the system76-power settings tool. I didn’t want to touch the NVIDIA settings after I got it working.
I got a path error during setup when I chose to install the 32-bit support, but the setup did not fail and the card worked.
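
Regarding the PCI address for xorg.conf mentioned above: lspci prints the address in hexadecimal, while the BusID option in xorg.conf is, as far as I know, expected in decimal in the form PCI:bus:device:function. A minimal bash sketch of the conversion follows. The helper name lspci_to_busid is made up for illustration, and it ignores the PCI domain, which is an assumption that only holds for domain 0000.

```shell
# Convert an lspci address ("bb:dd.f" or "dddd:bb:dd.f", hexadecimal)
# into an Xorg-style BusID string ("PCI:b:d:f", decimal).
# lspci_to_busid is a hypothetical helper; the PCI domain is dropped.
lspci_to_busid() {
    local addr=$1
    # Strip a leading domain such as "0000:" if present.
    case $addr in *:*:*) addr=${addr#*:} ;; esac
    local bus=${addr%%:*}
    local rest=${addr#*:}
    local dev=${rest%%.*}
    local fn=${rest#*.}
    # printf %d accepts 0x-prefixed hexadecimal and prints decimal.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Usage:
#   lspci | grep -i nvidia       # find the address, e.g. 0a:00.0
#   lspci_to_busid 0a:00.0       # prints PCI:10:0:0
```

The difference only matters for bus numbers of 0a and above, which is probably why a card at 01:00.0 never needs this.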

As a heads up, this particular RmInitAdapter error is being tracked in internal bug number 3350093 and it should be fixed in a future release.

Thanks, that is good to know.
Since I had this error I dug a bit into Manjaro's NVIDIA driver management and uninstalled the standard NVIDIA drivers, since on Manjaro they are tied to the installed kernel version. Instead I got the nvidia-dkms package from the AUR, along with the downgrade package, so I now have more freedom in choosing the driver version, independent of kernel updates. With that I downgraded to the 465 driver again.

Since it does affect some Spectres, maybe it's a hardware-related issue. RTD3 power management never really worked on the Spectre either. It always leaves the GPU on, drawing 2-4 W, when it should be powered off. In nvidia-smi I always see Xorg running, when there should be no processes at all in full suspend.

How are you checking that? Using nvidia-smi? Please note that running nvidia-smi will temporarily cause the GPU to wake up and return to full power. Please check /proc/driver/nvidia/gpus/<busid>/power and /sys/bus/pci/devices/<busid>/power/runtime_status, where <busid> is the PCI bus identifier for your GPU (check lspci to find the appropriate value for your system).
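
As a rough sketch of that check in bash, the two files can be read together without invoking nvidia-smi. The function name gpu_power_status and its optional second argument are made up for illustration (the second argument only exists so the function can be pointed at a test directory).

```shell
# Print the driver's power report and the kernel's runtime PM state for one GPU.
# $1: full PCI address including domain, as shown by `lspci -D`, e.g. 0000:01:00.0
# $2: optional filesystem root prefix (hypothetical, for testing; defaults to empty)
gpu_power_status() {
    local busid=$1 root=${2:-}
    cat "$root/proc/driver/nvidia/gpus/$busid/power"
    printf 'runtime_status: '
    cat "$root/sys/bus/pci/devices/$busid/power/runtime_status"
}

# Usage:
#   lspci -D | grep -i nvidia          # find the address
#   gpu_power_status 0000:01:00.0
```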

That’s expected; when Xorg is running and attached to the GPU, it will always show up in the process list. If power management is enabled on a supported platform, then the driver will be able to power off part or all of the GPU even with Xorg running.

Yes, I know that nvidia-smi interferes with suspend.
It has been some time since I tinkered with this, but I used runtime_status and power as well. The power file says:

Runtime D3 status: Enabled (fine-grained)
Video Memory: Active
GPU Hardware Support:
Video Memory Self Refresh: Supported
Video Memory Off: Supported

So I should be good to go. Yet runtime_status always reports “active” no matter what. Interestingly, “runtime_suspended_time”, next to “runtime_status”, always shows a small number that differs from boot to boot. This time it is 1235. So it looks like it's working for a brief moment.

My guess was that something I don't know of is running on the dGPU, but I have found nothing so far. Running nvtop, I get this information about the only process running on the dGPU:
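
One way to tell whether the GPU ever suspends after boot is to sample runtime_suspended_time twice and compare. The kernel documents this sysfs value as milliseconds spent suspended. The sketch below is bash; the function name suspended_delta_ms and its optional third argument are made up for illustration.

```shell
# Report how many milliseconds the device spent runtime-suspended during
# a sampling window.
# $1: full PCI address, e.g. 0000:01:00.0
# $2: window length in seconds (default 10)
# $3: optional filesystem root prefix (hypothetical, for testing)
suspended_delta_ms() {
    local busid=$1 interval=${2:-10} root=${3:-}
    local f="$root/sys/bus/pci/devices/$busid/power/runtime_suspended_time"
    local before after
    before=$(cat "$f") || return 1
    sleep "$interval"
    after=$(cat "$f") || return 1
    echo $((after - before))
}

# Usage:
#   suspended_delta_ms 0000:01:00.0 30
```

A result of 0 over a long idle window would suggest that something keeps waking the GPU or holding it active.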

root 0 Graphic 0 % (GPU) 4 MiB (GPU) 5-10% (CPU) 154 MiB (MEM) Command: /usr/lib/Xorg -dpi 192 -background none -seat seat0 vt1 -auth /var/run/sddm/{…} -noreset -displayfd 17

I am on a standard Manjaro Plasma install.
If you have any more ideas or know a thread for this other problem, I’d gladly look into it again. Thanks in advance. :)

P.S. Driver version 470.63.01 works for me again! (Except for RTD3 suspend; that issue persists.)

Spectre x360, PopOS: the HDMI output stopped working again. Somehow the 470 driver got reinstalled. This is the third time it has happened, and I am sure I did not do it this time. I followed my own instructions to get back to 465, but the nouveau driver, which is supposed to be disabled, managed to creep in again. I had to google how to disable it again. I am posting my complete, updated PopOS roll-back-to-465 instructions so that it is easier next time:

#remove the current NVIDIA driver
sudo apt remove 'nvidia*'
sudo reboot now

#disable the nouveau driver
sudo bash -c "echo blacklist nouveau > /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo bash -c "echo options nouveau modeset=0 >> /etc/modprobe.d/blacklist-nvidia-nouveau.conf"
sudo update-initramfs -u
sudo reboot now

#install NVIDIA 465 driver
#downloaded from nvidia.com
chmod +x NVIDIA-Linux-x86_64-465.31.run
sudo ./NVIDIA-Linux-x86_64-465.31.run
sudo reboot now
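
Since nouveau crept back in once already, it may be worth confirming that it is really gone before running the installer in the steps above. A minimal sketch, assuming lsmod output with the module name in the first column; nouveau_loaded is a made-up helper name.

```shell
# Succeed (exit 0) if a module named exactly "nouveau" appears in the
# lsmod-style listing on stdin; fail (exit 1) otherwise.
# nouveau_loaded is a hypothetical helper name.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded, check the blacklist file and initramfs"
#   fi
```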

A new production version, 470.74, was released on 2021-09-20.
Has anyone installed it and checked if the issues are fixed?

The 470.86 driver is working for me on Pop!_OS on an HP Spectre, no workaround needed. You’re probably good to stop using the manual method of installing drivers now.

RTD3 suspend works perfectly as well now. During office work the Intel goes down to 40 °C and battery life increased to a good 8-10 hours.

It was, as with most things, the user's fault. I found another process that was polling the nvidia-smi command. :/ Sorry!