NVIDIA driver 510.39.01 on Ubuntu 21.10 with an NVIDIA GeForce RTX 3060 Ti crashes randomly, leaving this in the syslog:
Jan 20 21:58:43 theBeast kernel: [ 4722.854311] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:01.0
Jan 20 21:58:43 theBeast kernel: [ 4722.854318] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Jan 20 21:58:43 theBeast kernel: [ 4722.854321] pcieport 0000:00:01.0: device [8086:460d] error status/mask=00100000/00010000
Jan 20 21:58:43 theBeast kernel: [ 4722.854322] pcieport 0000:00:01.0: [20] UnsupReq (First)
Jan 20 21:58:43 theBeast kernel: [ 4722.854324] pcieport 0000:00:01.0: AER: TLP Header: 34000000 01000010 00000000 00000000
Jan 20 21:58:43 theBeast kernel: [ 4722.854327] nvidia 0000:01:00.0: AER: can't recover (no error_detected callback)
Jan 20 21:58:43 theBeast kernel: [ 4722.854329] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
Jan 20 21:58:43 theBeast kernel: [ 4722.854332] NVRM: GPU at PCI:0000:01:00: GPU-b69c7f51-dae2-883e-5a7b-e3629342967a
Jan 20 21:58:43 theBeast kernel: [ 4722.854336] NVRM: Xid (PCI:0000:01:00): 79, pid=0, GPU has fallen off the bus.
Jan 20 21:58:43 theBeast kernel: [ 4722.854338] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Jan 20 21:58:43 theBeast kernel: [ 4722.854348] pcieport 0000:00:01.0: AER: device recovery failed
Jan 20 21:58:43 theBeast kernel: [ 4722.854483] NVRM: GPU 0000:01:00.0: GPU serial number is <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF>.
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: A GPU crash dump has been created. If possible, please run
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: nvidia-bug-report.sh as root to collect this data before
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: the NVIDIA kernel module is unloaded.
After the crash, the only way to reboot is the hardware reset button.
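In case it helps anyone compare their own logs, here is a minimal Python sketch that pulls the two key facts out of lines like the ones above: the AER `error status/mask` bits and the NVRM Xid code. The helper names (`decode_bits`, `summarize`) are hypothetical, and the bit table is a subset of the PCIe AER Uncorrectable Error Status register layout using my own descriptive labels, so treat it as an illustration rather than an official tool.

```python
import re

# Subset of the PCIe AER Uncorrectable Error Status register bits.
# Labels are descriptive names chosen for this sketch (an assumption),
# not the abbreviations printed by the kernel.
AER_UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}

# Patterns matching the log lines shown in this post.
STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")


def decode_bits(value):
    """Return the labels of all known bits set in a 32-bit register value."""
    return [
        name
        for bit, name in sorted(AER_UNCORRECTABLE_BITS.items())
        if value & (1 << bit)
    ]


def summarize(lines):
    """Collect AER status/mask decodes and Xid codes from kernel log lines."""
    events = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            status = int(match.group(1), 16)
            mask = int(match.group(2), 16)
            events.append(("aer", decode_bits(status), decode_bits(mask)))
        match = XID_RE.search(line)
        if match:
            events.append(("xid", match.group(1), int(match.group(2))))
    return events
```

For the log above, the status value `00100000` has only bit 20 set, which agrees with the `[20] UnsupReq` line, and the Xid code is 79. Assuming the log lives at `/var/log/syslog`, it could be scanned with `summarize(open("/var/log/syslog", errors="replace"))`.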