Nvidia driver is breaking Linux like crazy

What’s wrong with your driver?
I just want a GPU that works. I don’t want to spend weeks trying to install a driver and getting my super expensive GPU to work.
First we need to get the kernel headers.
After upgrading my kernel I couldn’t delete the old NVIDIA driver, and your driver needs to be reinstalled every time the kernel changes. All right, after some time editing hundreds of config files I finally got the kernel to load the driver. Next thing: the driver is breaking Firefox (I needed to turn off GPU acceleration). Next thing: audio isn’t working anymore. CAN’T FIND A SINGLE SOLUTION THAT WORKS! All I read is NVIDIA support giving hints that freeze the user’s Linux. All right, all the solutions in the forums (if there even are any) seem to fail. OK, what do we do now? Let’s ask Reddit and Google. Found a page where some guy said: “OH just uninstall the pulseaudio thingy and it will work”. Suddenly GDM stops working. OK, let’s reinstall PulseAudio from the su shell… GDM still doesn’t start. Trying to figure out why… no system logs pointing to that failure (it just doesn’t start). OK, let’s reinstall GDM from the su shell, update the initramfs and restart. OHHHH, we finally have GDM running again, but wait: STILL NO SOUND!!! AAAAAAAAAAAAAAAAAAAAAAAA!!!
OK, then let’s restart the PulseAudio service… still no sound. The problem here is that I can’t just go back to my backup because it was a RAID LVM setup, so even if I recreated my entire root partition, I would still need to recreate the LVM volumes manually and recreate my GRUB config. BUT I DON’T WANT TO ROLL BACK MY ENTIRE SYSTEM BECAUSE THE GOD DAMN DRIVER DOESN’T WORK!!! What’s wrong with your Linux support? Don’t you have some devs who can write a decent driver that doesn’t break the system? If I buy a product I want it to work, otherwise I can build my own GPU. If you can’t write a decent driver, why don’t you write a manual that covers every kind of failure? I mean, your current documentation is for RHEL 8 and we already have RHEL 9.3 now. So is your company relying on the free work of users? I just want my money back, or I’ll throw that stupid GPU out of the window. This is the most terrible experience a user can have. I mean, it wouldn’t be that bad if your product didn’t break my PC and if I didn’t have to read the entire internet in order to get it to run.

Why is the nvidia-driver installing an audio driver? Why can’t you just provide a separate audio driver that does not break the OS? The entire internet is filled with the audio-driver issue. And please don’t tell me that it is not your fault. If I have to overwrite my entire SSD because of that driver, it wears out my SSD as well, and besides that, a product isn’t functional if it doesn’t work. I also don’t want to buy a car that is labeled “working” but then breaks my garage while I’m trying to assemble it, before I can even drive it.

Since Nvidia overwrote my audio device and I reinstalled both GDM and PulseAudio, I deleted the Pulse configuration files and restarted the user-only daemon, and nothing worked.
After realizing that my sound card was no longer listed by “aplay -l”,
I had to manually reload the Intel module by running
“sudo modprobe snd_hda_intel”. So maybe give users this advice before telling them to reinstall the DKMS module.
Maybe this was caused by an AMD GPU driver that was also partially installed in the kernel, but it still took me all day to figure it out. You should definitely write an article dedicated to NVIDIA sound issues, as this is absolutely unacceptable, and I’m really glad I had a second computer lying around.
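
The check described above (is the module loaded at all?) can be sketched as a small shell helper. This is only an illustration: module_loaded is a made-up function name, and the sample text is invented in the format of /proc/modules so the snippet can be tried without touching the running system.

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in /proc/modules-style text.
# module_loaded is a hypothetical helper; it reads the module list on stdin.
module_loaded() {
    # The first field of each line is the module name.
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Made-up sample in /proc/modules format, for a dry run.
sample='snd_hda_codec_hdmi 94208 1 - Live 0x0000000000000000
snd_hda_intel 57344 3 - Live 0x0000000000000000'

if printf '%s\n' "$sample" | module_loaded snd_hda_intel; then
    status=loaded
else
    status=missing
fi
echo "snd_hda_intel: $status"

# On a live system: module_loaded snd_hda_intel < /proc/modules
# and, if it is missing: sudo modprobe snd_hda_intel
```

If the module is reported as missing, “aplay -l” will usually not list the onboard card either, which matches what happened here.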

But the issue is still not solved, because I have to run modprobe on every boot, even though I created alsa-base.conf in /etc/modprobe.d like this:
“options snd-hda-intel single_cmd=1
options snd-hda-intel probe_mask=1
options snd-hda-intel model=basic”.
And yes, I updated the initramfs.

I guess I’ll just write some cron job or bash script that runs that stupid modprobe command on every boot or login. But still, this is a huge waste of time and a pretty bad experience.
I’m really glad that I don’t need to overwrite my entire root drive just because of a driver, or chroot / SSH in from a second PC to debug the entire system and driver. But it’s a pretty bad solution nevertheless.
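
A possible alternative to a cron job, sketched under the assumption that the system boots with systemd (Rocky Linux does): systemd-modules-load reads the files in /etc/modules-load.d/ and loads every module named there. The helper name write_modules_load_conf and the file name snd_hda_intel.conf are made up for illustration, and the demo writes to a scratch directory so it can be tried without root.

```shell
#!/bin/sh
# Sketch: ask systemd-modules-load to load snd_hda_intel at every boot.
# write_modules_load_conf is a hypothetical helper; the file name is arbitrary
# as long as it ends in .conf.
write_modules_load_conf() {
    target_dir=$1
    mkdir -p "$target_dir"
    # One module name per line; lines starting with '#' are comments.
    printf '%s\n' '# load the onboard HDA audio driver at boot' 'snd_hda_intel' \
        > "$target_dir/snd_hda_intel.conf"
}

# Dry run into a scratch directory. On a real system the target would be
# /etc/modules-load.d and the call would need root.
demo_dir=$(mktemp -d)
write_modules_load_conf "$demo_dir"
cat "$demo_dir/snd_hda_intel.conf"
```

This only forces an early load. If something blacklists or unloads the module later, the file alone will not fix it.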

Or I have to write a script that catches every process that blacklists the device or that overwrites/deletes the loadable module from the kernel with modprobe. For that I could use something like grep, I guess.
Let’s figure this out in our next episode :D
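
The grep idea could look roughly like the sketch below. It does not catch processes. It only lists modprobe configuration lines that could keep snd_hda_intel from loading (“blacklist” entries and “install” overrides). find_module_blocks is a made-up helper name, and the demo runs against a scratch directory with an invented config file.

```shell
#!/bin/sh
# Sketch: list modprobe config lines that could keep snd_hda_intel from loading.
# find_module_blocks is a hypothetical helper name.
find_module_blocks() {
    # modprobe treats '-' and '_' in module names as the same character,
    # so the pattern accepts both spellings.
    pattern='^[[:space:]]*(blacklist|install)[[:space:]]+snd[-_]hda[-_]intel([[:space:]]|$)'
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -rEn "$pattern" "$dir"
    done
}

# Self-contained demo with a scratch directory and a made-up config file.
demo_dir=$(mktemp -d)
printf '%s\n' 'options snd-hda-intel model=basic' 'blacklist snd_hda_intel' \
    > "$demo_dir/example.conf"
found=$(find_module_blocks "$demo_dir")
echo "$found"

# On a live system:
#   find_module_blocks /etc/modprobe.d /usr/lib/modprobe.d /run/modprobe.d
```

A module can also be blacklisted on the kernel command line with modprobe.blacklist=, so “cat /proc/cmdline” is worth a look as well.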

What distro are you using? Is there a distribution-provided driver package? If so, I generally advise using that over the generic Unix (including all the BSDs here) driver provided by Nvidia.

On most distros it is just ‘my-packagemanager install nvidia-driver’, and after a reboot everything works.

I use Rocky Linux, but I need the driver for CUDA and the container toolkit. It worked fine before, even though I had some trouble setting it up. The problem here was that DKMS had trouble setting up the kernel module because of some AMD GPU driver. If both are installed, it can lead to errors. I guess this is some sort of hash collision.
Phind:
“When you install the AMD driver, the snd_hda_intel module is replaced with the AMD driver module. Then when you install the NVIDIA driver, the snd_hda_intel module will be replaced again with the NVIDIA driver’s module. This may cause the audio device to stop working because the snd_hda_intel module used by the sound card is no longer compatible with the hardware.”

But yeah, now I know that there’s a specific precompiled version for the kernel that gets released within 24 hours of the kernel release, but I don’t think that helps with our issue here.

[user@localhost ~]$ dmesg | grep snd_hda_intel
[  239.753318] snd_hda_intel 0000:00:1f.3: enabling device (0000 -> 0002)
[  239.753677] snd_hda_intel 0000:02:00.1: enabling device (0000 -> 0002)
[  239.753720] snd_hda_intel 0000:02:00.1: Disabling MSI
[  239.753723] snd_hda_intel 0000:02:00.1: Handle vga_switcheroo audio client

Unfortunately we don’t know the source, but you can see that the device was activated twice. This can be checked in modprobe.d.

This is what it is supposed to look like:

[user@localhost ~]$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: Generic Analog [Generic Analog]
  Subdevices: 0/1
  Subdevice #0: subdevice #0
card 1: NVidia [HDA NVidia], device 3: HDMI 0 [HDMI 0]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: NVidia [HDA NVidia], device 7: HDMI 1 [HDMI 1]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: NVidia [HDA NVidia], device 8: HDMI 2 [HDMI 2]
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 1: NVidia [HDA NVidia], device 9: HDMI 3 [HDMI 3]
  Subdevices: 1/1
  Subdevice #0: subdevice #0

You have a misconception about how things work, so here is some info to clear things up: the nvidia driver is not installing any sound driver. The nvidia GPU’s audio device is also handled by the Linux kernel’s snd_hda_intel driver. This is why you see two enabled devices:

snd_hda_intel 0000:00:1f.3: enabling device

is the mainboard’s integrated sound.

snd_hda_intel 0000:02:00.1: enabling device

is likely the nvidia device if it has an audio device.
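
To confirm which physical device sits behind each address instead of guessing, the PCI addresses can be pulled out of the log and handed to lspci. A minimal sketch follows: hda_pci_addresses is a made-up helper name, the demo input is the dmesg output quoted earlier in this thread, and the lspci step assumes the pciutils package is installed.

```shell
#!/bin/sh
# Sketch: pull the PCI addresses out of snd_hda_intel kernel log lines.
# hda_pci_addresses is a hypothetical helper; it reads log text on stdin.
hda_pci_addresses() {
    grep 'snd_hda_intel' \
        | grep -oE '[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]' \
        | sort -u
}

# Demo on the log lines quoted in this thread.
sample='[  239.753318] snd_hda_intel 0000:00:1f.3: enabling device (0000 -> 0002)
[  239.753677] snd_hda_intel 0000:02:00.1: enabling device (0000 -> 0002)
[  239.753720] snd_hda_intel 0000:02:00.1: Disabling MSI'
addresses=$(printf '%s\n' "$sample" | hda_pci_addresses)
echo "$addresses"

# On a live system, assuming lspci (pciutils) is installed:
#   dmesg | hda_pci_addresses | while read -r addr; do lspci -s "$addr"; done
```

For the sample above this yields the two addresses 0000:00:1f.3 and 0000:02:00.1. lspci would then show the vendor and device name for each.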

When you say “install the AMD driver”, what do you mean by that? The normal amdgpu driver comes with the Linux kernel. There’s also the installable AMD GPU Pro driver from AMD, but this shouldn’t be used here: it only works on AMD-only systems, since it replaces system libraries with AMD-exclusive ones. Though I wouldn’t know why this should meddle with the audio driver; that should be handled by snd_hda_intel as well.