Ubuntu 23.04 + 525.105.17 Freezes

My computer has been intermittently freezing. When this happens the whole system locks up: I cannot switch to a console with Ctrl+Alt+F3, and I can’t SSH in. The last log entries are as follows:

2023-05-09T09:12:53.563917-04:00 bart kernel: [223066.497213] NVRM: GPU at PCI:0000:02:00: GPU-eb2fe0d0-53f5-9863-56ac-b5fb83d4334c
2023-05-09T09:12:53.563928-04:00 bart kernel: [223066.497231] NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
2023-05-09T09:12:53.563930-04:00 bart kernel: [223066.497234] NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.
2023-05-09T09:12:53.613927-04:00 bart kernel: [223066.547227] NVRM: A GPU crash dump has been created. If possible, please run
2023-05-09T09:12:53.613929-04:00 bart kernel: [223066.547227] NVRM: nvidia-bug-report.sh as root to collect this data before
2023-05-09T09:12:53.613930-04:00 bart kernel: [223066.547227] NVRM: the NVIDIA kernel module is unloaded.

I ran nvidia-bug-report.sh after rebooting.

I am having trouble attaching the bug report file.

My Ubuntu 23.04 system is also freezing about once a day. I was able to SSH into it and run nvidia-bug-report.sh while the display was frozen. I also couldn’t open a TTY.

This most recent crash was with driver version 525. To test whether the newer driver was the cause, I downgraded to 515 and saw the same crashing behavior. As far as I can tell the driver version isn’t the cause, so I went back to 525.

The kernel log (from dmesg) shows the following:

[30588.035136] audit: type=1400 audit(1684343829.739:6355): apparmor="DENIED" operation="capable" class="cap" profile="/usr/sbin/cupsd" pid=88131 comm="cupsd" capability=12  capname="net_admin"
[30588.559775] audit: type=1107 audit(1684343830.267:6356): pid=1692 uid=103 auid=4294967295 ses=4294967295 subj=unconfined msg='apparmor="DENIED" operation="dbus_signal"  bus="system" path="/org/freedesktop/login1" interface="org.freedesktop.login1.Manager" member="PrepareForSleep" name=":1.3" mask="receive" pid=5066 label="snap.firefox.firefox" peer_pid=1810 peer_label="unconfined"
                exe="/usr/bin/dbus-daemon" sauid=103 hostname=? addr=? terminal=?'
[30588.560036] rfkill: input handler disabled
[30588.691189] e1000e 0000:00:1f.6 eno1: NIC Link is Down
[30591.981882] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[30591.981934] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
[30604.642985] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[30604.643044] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[30604.643076] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[30604.643106] [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership
[30737.325271] NVRM: GPU at PCI:0000:01:00: GPU-63761b65-f8a4-6b3e-59e2-7cce5ce7019a
[30737.325275] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
[30737.325277] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[30737.325282] NVRM: A GPU crash dump has been created. If possible, please run
               NVRM: nvidia-bug-report.sh as root to collect this data before
               NVRM: the NVIDIA kernel module is unloaded.

nvidia-bug-report.log.gz (713.7 KB)
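Both posts end with the same NVRM line, Xid 79 ("GPU has fallen off the bus"), on different PCI addresses. As a minimal sketch for pulling just those events out of a long kernel log, the Python below matches lines of the form `NVRM: Xid (PCI:<address>): <code>, ...` as they appear above. The regex is an assumption based only on the lines quoted in this thread, and the names `parse_xid_events` and `XidEvent` are hypothetical helpers, not part of any NVIDIA tool.

```python
import re
from dataclasses import dataclass

# Matches NVRM Xid lines as they appear in the logs above, e.g.
#   NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
# The exact format is an assumption taken from those samples.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+), (?P<rest>.*)"
)


@dataclass
class XidEvent:
    pci: str      # PCI address of the GPU, e.g. "0000:01:00"
    code: int     # Xid number, e.g. 79
    message: str  # remainder of the line (pid, name, description)


def parse_xid_events(log_text: str) -> list[XidEvent]:
    """Return every NVRM Xid event found in a block of kernel log text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                XidEvent(match["pci"], int(match["code"]), match["rest"])
            )
    return events


if __name__ == "__main__":
    # Demo on a line copied from the dmesg output above.
    sample = (
        "[30737.325275] NVRM: Xid (PCI:0000:01:00): 79, pid='<unknown>', "
        "name=<unknown>, GPU has fallen off the bus."
    )
    for event in parse_xid_events(sample):
        print(f"Xid {event.code} on {event.pci}: {event.message}")
```

After a hard reset, `journalctl -k -b -1` prints the kernel messages from the previous boot (assuming the systemd journal is persistent on the machine), and that text could be fed to `parse_xid_events` to see whether every freeze is the same Xid on the same GPU.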

A couple of days ago I read an article about how to install a specific version of the NVIDIA drivers on Ubuntu, so I thought I would give it a try. It had me remove all traces of the NVIDIA drivers and then install the version I wanted (525) using the command-line tools. It worked: the computer has not frozen for several days.

Here are the instructions I followed: https://www.linuxcapable.com/install-nvidia-drivers-on-ubuntu-linux/
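The linked article's exact steps are not reproduced here. As a rough, hypothetical sketch of the same idea (purge every `nvidia-*` package, then install a single driver branch), the Python below builds an `apt-get` sequence and, by default, only prints it. It assumes Ubuntu's `nvidia-driver-<branch>` package naming; `reinstall_commands` and `run` are illustrative names, and the available branches should be checked first (for example with `apt-cache search nvidia-driver`).

```python
import shlex
import subprocess


def reinstall_commands(branch: str) -> list[list[str]]:
    """Build a purge-then-install apt-get sequence for one driver branch.

    Assumes Ubuntu's nvidia-driver-<branch> package naming; this is a sketch,
    not necessarily the procedure from the linked article.
    """
    if not branch.isdigit():
        raise ValueError(f"expected a numeric driver branch, got {branch!r}")
    return [
        # apt-get treats "^nvidia-.*" as a regex over package names.
        ["sudo", "apt-get", "purge", "-y", "^nvidia-.*"],
        ["sudo", "apt-get", "autoremove", "-y"],
        ["sudo", "apt-get", "update"],
        ["sudo", "apt-get", "install", "-y", f"nvidia-driver-{branch}"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: prints the commands for the 525 branch without executing them.
    run(reinstall_commands("525"))
```

Keeping `dry_run=True` as the default means the sequence can be reviewed before anything is removed; a reboot is still needed afterwards so the newly installed kernel module is the one that gets loaded.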

Incidentally, I have not updated the NVIDIA drivers and have not had any crashes since posting this issue.
I may have run sudo apt update and upgraded everything without remembering, but I am still using the same driver version (525.105.17).
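One way to confirm which driver is actually running, as opposed to which packages are installed, is the version string the loaded NVIDIA kernel module exposes in `/proc/driver/nvidia/version`. The sketch below assumes that file's first line looks like `NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.105.17 ...`; that format and the helper name `loaded_module_version` are assumptions for illustration.

```python
import re
from pathlib import Path
from typing import Optional

VERSION_FILE = Path("/proc/driver/nvidia/version")

# Assumed format: "NVRM version: NVIDIA UNIX x86_64 Kernel Module  525.105.17 ..."
_MODULE_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_module_version(text: str) -> Optional[str]:
    """Extract the kernel module version from the version file's text."""
    match = _MODULE_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    if VERSION_FILE.exists():
        print(loaded_module_version(VERSION_FILE.read_text()))
    else:
        print(f"{VERSION_FILE} not found; the NVIDIA kernel module may not be loaded")
```

Because this reads the module that is currently loaded, it would still show the old version if an upgrade had installed new packages but the machine had not been rebooted yet.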