Nvidia 535.86.05 unable to install on Ubuntu 23.04

Ubuntu 23.04
6.4 kernel
gtx 1060 mobile

I had installed the 535 driver and had just played a video game when an update was offered, so I clicked update. My system crashed, and I had to restart.

After booting I tried sudo apt --fix-broken install

2 packages weren’t installed properly:
nvidia-dkms-535
nvidia-driver-535

post-install returns error 3
nvidia-bug-report.log (1001.7 KB)

I switched back to the previous version by installing the 535-server driver; everything works with it.

I’ve noticed with the last several versions (535.54-.98) that the desktop will black out and not return during the install/setup using the Ubuntu Graphics Drivers PPA. That’s definitely a problem, but immediately rebooting right after the screen blanks is causing users the bigger problems. They’re restarting while the setup is still running, hence the need to run --fix-broken after reboot.

If you let it sit for a reasonable amount of time (5-10 min) after the screen blanks, and then reboot gently with Alt+SysRq+REISUB, the system will boot up with the new driver installed with no issues.
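Before relying on REISUB it may be worth checking which SysRq functions the kernel accepts, since kernel.sysrq is a bitmask and distributions often restrict it (stock Ubuntu is commonly set to 176, which would allow only sync, remount and reboot). A minimal sketch; the helper name sysrq_allowed is made up for illustration, and the bit meanings are the ones listed in the kernel's SysRq documentation:

```shell
# Print which of the R E I S U B keys a given kernel.sysrq value permits.
# sysrq_allowed is an illustrative helper, not an existing tool.
sysrq_allowed() {
    mask=$1
    # A value of 1 enables every SysRq function.
    [ "$mask" -eq 1 ] && { echo "R E I S U B"; return; }
    out=""
    [ $((mask & 4)) -ne 0 ]   && out="$out R"     # keyboard control (unraw)
    [ $((mask & 64)) -ne 0 ]  && out="$out E I"   # signalling processes (term, kill)
    [ $((mask & 16)) -ne 0 ]  && out="$out S"     # sync
    [ $((mask & 32)) -ne 0 ]  && out="$out U"     # remount read-only
    [ $((mask & 128)) -ne 0 ] && out="$out B"     # reboot
    echo "${out# }"
}

# Report the keys allowed on this machine, if the setting is readable.
if [ -r /proc/sys/kernel/sysrq ]; then
    echo "Allowed here: $(sysrq_allowed "$(cat /proc/sys/kernel/sysrq)")"
fi
```

If some keys are missing, `sudo sysctl kernel.sysrq=1` should enable all of them until the next reboot.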

Just putting this info out there for users in case nVidia doesn’t fix whatever triggers the GPU resets during install in the next release.


I did exactly the same. I waited around 20 min before rebooting. The screen froze during installation and the audio stopped; after 12-15 min the audio continued but the screen didn’t unfreeze. After reboot there was no card in nvidia settings, and installation through the Ubuntu drivers app still does not work: I always get error 3 in the post-install script.

This happens on both my laptop (Ubuntu 22.04) and my workstation (Ubuntu 23.04). Both on kernel 6.2.x and both using graphics drivers PPA.

On my laptop I can sometimes reach a VT once the black screen hits. Xorg is using 100% CPU at the time. The nvidia-suspend.service has usually logged a “write error: Input/output error” during the update. Resuming from suspend has not worked for the entire 535 series if something was using the GPU when suspending, so it could be related.

My workstation is completely unresponsive when updating the driver. A hard reboot makes the BIOS stop on an odd high CPU temp error. Another reboot seems to make things right, except for a bodged apt update, requiring a manual dpkg configure.
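The manual step mentioned above is usually `sudo dpkg --configure -a`, optionally followed by `sudo apt --fix-broken install`. As a rough sketch, the following lists NVIDIA packages that dpkg still considers unpacked or half-configured; broken_packages is an illustrative helper name, not an existing tool:

```shell
# Print package names whose dpkg state is "install wanted" but not fully
# installed (U = unpacked, F = half-configured, H = half-installed,
# W/t = triggers awaited/pending). Reads "<status> <package>" lines on stdin.
broken_packages() {
    awk '$1 ~ /^i[UFHWt]/ { print $2 }'
}

# Query dpkg for NVIDIA packages, if dpkg-query is available on this system.
if command -v dpkg-query >/dev/null 2>&1; then
    dpkg-query -W -f='${db:Status-Abbrev} ${binary:Package}\n' 'nvidia-*' 2>/dev/null |
        broken_packages
fi

# If anything is listed, finishing the interrupted configuration is
# typically done with:
#   sudo dpkg --configure -a
#   sudo apt --fix-broken install
```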

I’ve noticed with the last several versions (535.54-.98) that the desktop will black out and not return during the install/setup using the Ubuntu Graphics Drivers PPA.

I have noticed the same and it’s especially problematic when using automatic updates. You’re doing whatever task you’re doing, and suddenly a black void on screen.

I’ve seen the same and I created a thread over on the Phoronix forums where I put in my findings.

The way I now update the system is:

  • Boot Ubuntu, adding “module_blacklist=nvidia” to the end of the kernel line.
  • Update driver.
  • Reboot without the blacklist.
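On a typical GRUB setup the parameter can be added for one boot by pressing `e` at the boot menu and appending it to the line that starts with `linux`. Before starting the update, a quick check along these lines can confirm that the blacklist took effect; cmdline_blacklists is a hypothetical helper written for this post:

```shell
# Succeeds if the given kernel command line blacklists the named module.
# module_blacklist= takes a comma-separated list of module names.
cmdline_blacklists() {
    cmdline=$1
    module=$2
    for word in $cmdline; do
        case $word in
            module_blacklist=*)
                list=${word#module_blacklist=}
                # Wrap in commas so only whole module names match.
                case ",$list," in
                    *",$module,"*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Inspect the command line of the running kernel.
if cmdline_blacklists "$(cat /proc/cmdline 2>/dev/null)" nvidia; then
    echo "nvidia is blacklisted for this boot"
else
    echo "nvidia is NOT blacklisted; the screen may blank during the update"
fi
```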

It’s a hassle to do it, but at least it works. On my machine (5950X + RTX 3090), after losing the screen, I can’t reboot it in any sane way. If I ssh in and do the installation, I can wait for it to complete, but when I tell it to reboot, I get kicked out of the ssh session (for obvious reasons) and yet the machine doesn’t reboot (or power off, if I tell it to do that).

This doesn’t only happen with the PPA driver; the non-PPA 535.86 driver (as opposed to 535.104 from the PPA) has the same problem.

If you install Kubuntu instead of Ubuntu (the installer needs to use safe graphics mode), then the 525 driver is most often installed. If you then install the 535 driver, regardless of whether you use
sudo ubuntu-drivers install nvidia:535
or
sudo apt install nvidia-driver-535
it still loses the screen.

My laptop has the same problem, so it’s not a desktop thing only. It has a 3070 Mobile (I think they were called Mobile then?).

// Stefan

I’ve found that I was able to update to the newer nvidia-535 from Ubuntu updates only after uninstalling all unofficial kernels while booted into the standard signed Ubuntu kernel.
Any other kernel newer than 6.2 no longer works with this driver, and I don’t know why.
So I had to switch back to the official Ubuntu 6.2 kernel.

I’m running the official kernel and it doesn’t work for me, unless I do what I described above. I’ve done about 30 reinstalls over the past few days just to try to narrow it down, both ubuntu and kubuntu, first with 6.2.0-27 and then 6.2.0-31.

The only reliable way for me is to do what I wrote above. I’ve tried both with and without applying any updates, i.e. install system, boot into system, do things, or install system, boot into system, update system, reboot, do things. Here “things” means updating the driver to 535.104.

Actually, I’m wondering if this might not be part of the problem:

This is when I install it over SSH, so that I actually see what’s going on:

nvidia-persistenced.service is a disabled or a static unit, not starting it. (twice)
Setting up cpp-12 (12.3.0-1ubuntu1~23.04) ...
Setting up nvidia-kernel-common-535 (535.104.05-0ubuntu0.23.04.1) ...
Installing new version of config file /etc/modprobe.d/nvidia-graphics-drivers-kms.conf ...
update-initramfs: deferring update (trigger activated) <== This is where I believe it blanked out.
Could not execute systemctl: at /usr/bin/deb-systemd-invoke line 145. <== Look at this line.
Setting up libnvidia-decode-535:amd64 (535.104.05-0ubuntu0.23.04.1) ... (and same for i386).

I believe that’s this line here:
system('systemctl', '--quiet', @instance_args, $action, @start_units) == 0 or die("Could not execute systemctl: $!");

My Perl is rusty, but IIRC $! is the system error (errno) message, so it’s supposed to say what the error is, or the reason why it couldn’t execute.

I haven’t put in debug info to see what command it is that’s failing.

Update:

I can’t post more replies in this thread, so I have copied it here instead:

@cam8 It did actually in a way. Thanx!

So I found what’s going on!

I added debug info just before that line I quoted above, in deb-systemd-invoke, and I saw that it was invoking nvidia-suspend.service with “start” as an argument, which felt weird.

I then looked for anything pertaining to that service in the deb source packages, and found:

525:
dh_systemd_enable --name=nvidia-suspend
535:
dh_installsystemd --name=nvidia-suspend

Now I’m guessing, since I haven’t dug into this bit:

dh_systemd_enable - I’m assuming it enables a service, i.e. next boot it will autostart.
dh_installsystemd - I’m assuming it enables it … and starts it?
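For what it’s worth, the debhelper manual pages describe dh_systemd_enable as only enabling units, while dh_installsystemd both enables and starts them on installation unless it is given --no-enable or --no-start, which would match the guess above. One way to see what a package really does on a given machine is to look at its generated postinst for deb-systemd-invoke calls. A sketch, where invoked_units is an illustrative helper and the package name nvidia-kernel-common-535 is an assumption based on the install log earlier in the thread:

```shell
# Print the unit names that a maintainer script passes to deb-systemd-invoke.
# Reads the script text on stdin. invoked_units is an illustrative helper.
invoked_units() {
    grep 'deb-systemd-invoke' |
        grep -oE "'[^']+\.(service|timer|socket|path|target)'" |
        tr -d "'" |
        sort -u
}

# Assumed package name; adjust to whichever package ships nvidia-suspend.service.
postinst=/var/lib/dpkg/info/nvidia-kernel-common-535.postinst
if [ -r "$postinst" ]; then
    invoked_units < "$postinst"
fi
```

If nvidia-suspend.service shows up in that output, the package is starting (or restarting) the suspend unit as part of its configure step.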

I compared the 525 and 535 versions of the service files and they are identical, same hash.
They call /usr/bin/nvidia-sleep.sh with the argument “suspend”.
/usr/bin/nvidia-sleep.sh is also identical on both, so the difference has to be in how the service gets invoked.

Well…

Then I tried the following, on a kubuntu (to get 525 driver):

sudo add-apt-repository -y ppa:graphics-drivers/ppa && sudo apt -y update && sudo apt -y dist-upgrade && sudo apt -y install nvidia-driver-535 && sudo /usr/bin/nvidia-sleep.sh resume

And lo and behold! Screen blanks and after some 45 seconds it comes back. I had to Alt-F1 to get back to graphical mode, but everything was alive (apart from KDE complaining that it lost graphics).

Then I tried an Ubuntu install (535.86 by default), using almost the same line (without the install nvidia-driver-535), and again it worked. I did it twice actually: the first time it came back to GUI mode and the second time I had to Alt-F1 to get back there, but hey.

So if you’re using the command line to update your system, this is what you need to do. I’m going to contact the maintainer and ask him whether I’m correct about the change in the rules file.
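The same sequence can be wrapped in a small script so the resume call is not forgotten. This is only a sketch of the workaround described above, not an official fix; run and update_nvidia are illustrative names, and DRY_RUN is a made-up switch that prints the commands instead of executing them:

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Update the system and the driver, then call nvidia-sleep.sh resume,
# the step that brought the screen back in the tests described above.
update_nvidia() {
    run sudo apt -y update &&
    run sudo apt -y dist-upgrade &&
    run sudo apt -y install nvidia-driver-535 &&
    run sudo /usr/bin/nvidia-sleep.sh resume
}

# Show what would be executed without touching the system.
DRY_RUN=1 update_nvidia
```

Calling `update_nvidia` without DRY_RUN runs the commands for real, ideally from an ssh or screen session since the display may blank for a while.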

Hope this helps anyone.

// Stefan


I’ve been seeing this issue for a good 2-3 months now. The screen goes black when I install the drivers. I let the box sit for a while in case the drivers are still installing and they do appear to finish. Today, I started my driver updates on all 3 Linux boxes in a screen session… and when the screens went blank, I ssh’ed in and they finished.

The even more annoying part of the problem (at least for me) is that when I reboot, all 3 systems become stuck. I can still ping them, but I can’t ssh in, and there’s no screen. So I have to hard reboot them and they come up fine.

After installation, but before I’ve rebooted, I’ve noticed an Nvidia modeset process stuck using 100% CPU. It’s the same on all 3 boxes. I don’t remember for sure which one it is, but I think it’s this process:
[nvidia-modeset/kthread_q]

The only other possible one is:
[nvidia-modeset/deferred_close_kthread_q]

IIRC, the brackets around the name mean it’s specifically a kernel thread.
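Kernel threads are children of kthreadd (PID 2), so a short filter over ps output can show which one is spinning after the install. A minimal sketch, assuming procps-style ps; busy_kthreads is an illustrative helper and the 50% threshold is arbitrary:

```shell
# Print "PID %CPU NAME" for kernel threads at or above a CPU threshold.
# Expects `ps -eo pid,ppid,pcpu,args` style lines on stdin; kernel threads
# are recognised by their parent being kthreadd (PID 2).
busy_kthreads() {
    awk -v min="${1:-50}" '$2 == 2 && $3 + 0 >= min + 0 { print $1, $3, $4 }'
}

# List kernel threads currently using 50% CPU or more.
ps -eo pid,ppid,pcpu,args 2>/dev/null | busy_kthreads 50
```

Note that the pcpu column in ps is averaged over the lifetime of the process, so a thread that only recently started spinning may show a lower figure than it does in top.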

Anyway, this happens to me on my desktop, my son’s computer, and my laptop. On the laptop, I don’t think the screen goes black (it’s using optimus), but I get the hanging kernel thread and the reboot gets stuck. I can’t remember for sure on the laptop today, because I did the Nvidia driver update via ssh. :D

Hopefully that helps.
