NVIDIA driver 510.39.01 crash with Ubuntu 21.10 and NVIDIA GeForce RTX 3060 Ti

NVIDIA driver 510.39.01 on Ubuntu 21.10 with an NVIDIA GeForce RTX 3060 Ti randomly crashes with this in the syslog:

Jan 20 21:58:43 theBeast kernel: [ 4722.854311] pcieport 0000:00:01.0: AER: Uncorrected (Non-Fatal) error received: 0000:00:01.0
Jan 20 21:58:43 theBeast kernel: [ 4722.854318] pcieport 0000:00:01.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Jan 20 21:58:43 theBeast kernel: [ 4722.854321] pcieport 0000:00:01.0:   device [8086:460d] error status/mask=00100000/00010000
Jan 20 21:58:43 theBeast kernel: [ 4722.854322] pcieport 0000:00:01.0:    [20] UnsupReq               (First)
Jan 20 21:58:43 theBeast kernel: [ 4722.854324] pcieport 0000:00:01.0: AER:   TLP Header: 34000000 01000010 00000000 00000000
Jan 20 21:58:43 theBeast kernel: [ 4722.854327] nvidia 0000:01:00.0: AER: can't recover (no error_detected callback)
Jan 20 21:58:43 theBeast kernel: [ 4722.854329] snd_hda_intel 0000:01:00.1: AER: can't recover (no error_detected callback)
Jan 20 21:58:43 theBeast kernel: [ 4722.854332] NVRM: GPU at PCI:0000:01:00: GPU-b69c7f51-dae2-883e-5a7b-e3629342967a
Jan 20 21:58:43 theBeast kernel: [ 4722.854336] NVRM: Xid (PCI:0000:01:00): 79, pid=0, GPU has fallen off the bus.
Jan 20 21:58:43 theBeast kernel: [ 4722.854338] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Jan 20 21:58:43 theBeast kernel: [ 4722.854348] pcieport 0000:00:01.0: AER: device recovery failed
Jan 20 21:58:43 theBeast kernel: [ 4722.854483] NVRM: GPU 0000:01:00.0: GPU serial number is <FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF><FF>
<FF><FF><FF><FF><FF><FF>.
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: A GPU crash dump has been created. If possible, please run
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: nvidia-bug-report.sh as root to collect this data before
Jan 20 21:58:43 theBeast kernel: [ 4722.854492] NVRM: the NVIDIA kernel module is unloaded.

After that, the only way to reboot is the reset button.
Processing: nvidia-bug-report.log.gz…

Seems the log is stuck in the virus scanner. Please try reattaching.
From the log snippets, there’s a bus error. Have you already tried reseating the card in its slot, checking for a BIOS update, and lowering the bus speed?

Yes, I have tried uploading nvidia-bug-report.log.gz several times, but with no success so far. Shall I unzip it?

Yes, please try unzipping and uploading the plain text file.

nvidia-bug-report.log (2.6 MB)
Here it is. Thanks

I guess you should rather contact Dell since this looks like a hardware issue. Is the device still under warranty?

Hello,
Thanks for your reply. Yes, the hardware is still under warranty.
I have upgraded the BIOS, hoping this would help.
What makes you think it is a hardware issue?

The PCIe bus errors, and the fact that the GPU keeps falling off the bus with Xid 79. That can have many causes, but since you bought it as a complete system rather than putting it together yourself, there’s no sense in tracking down the cause yourself. Just let Dell handle that.
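
If it helps when talking to Dell, here is a rough Python sketch for pulling the two indicators mentioned above (the Xid code and the PCIe AER status) out of a kernel log. The regular expressions are only matched against the log format pasted in the first post, and `summarize`, `SAMPLE` and the bit table are names made up for this example. The table covers just a few bits of the AER Uncorrectable Error Status register, so treat it as a starting point rather than a complete decoder.

```python
import re

# Patterns written against the log lines in the first post (an assumption:
# other kernel or driver versions may format these lines differently).
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")
AER_RE = re.compile(
    r"pcieport ([0-9a-fA-F:.]+): .*error status/mask=([0-9a-fA-F]{8})/[0-9a-fA-F]{8}"
)

# Partial table of PCIe AER Uncorrectable Error Status bits.
# Only meaningful for lines reporting an uncorrected error.
UNCORRECTABLE_BITS = {
    4: "Data Link Protocol Error",
    12: "Poisoned TLP",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    18: "Malformed TLP",
    20: "Unsupported Request",
}


def summarize(lines):
    """Return (xid_events, aer_events) found in an iterable of log lines."""
    xids = []
    aer = []
    for line in lines:
        m = XID_RE.search(line)
        if m:
            # (PCI address, Xid code), e.g. Xid 79 = GPU has fallen off the bus
            xids.append((m.group(1), int(m.group(2))))
            continue
        m = AER_RE.search(line)
        if m:
            status = int(m.group(2), 16)
            names = [
                name
                for bit, name in sorted(UNCORRECTABLE_BITS.items())
                if status & (1 << bit)
            ]
            aer.append((m.group(1), names))
    return xids, aer


# Two lines copied from the syslog excerpt in the first post.
SAMPLE = [
    "Jan 20 21:58:43 theBeast kernel: [ 4722.854321] pcieport 0000:00:01.0:   device [8086:460d] error status/mask=00100000/00010000",
    "Jan 20 21:58:43 theBeast kernel: [ 4722.854336] NVRM: Xid (PCI:0000:01:00): 79, pid=0, GPU has fallen off the bus.",
]

xids, aer = summarize(SAMPLE)
print(xids)  # [('0000:01:00', 79)]
print(aer)   # [('0000:00:01.0', ['Unsupported Request'])]
```

To run it over a whole log, pass an open file instead of `SAMPLE`, for example `summarize(open("/var/log/syslog", errors="replace"))` (that path is the usual Ubuntu location, but check your system). A count of how often these events show up and at what uptime is the kind of detail a support ticket can use.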