Xid: 79, GPU has fallen off the bus (Arch Linux, linux-ck-skylake 5.7.19, Nvidia 960, driver 455.23.04)

Oct 08 20:23:04 hwkiller-desktop kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                         NVRM: nvidia-bug-report.sh as root to collect this data before
                                         NVRM: the NVIDIA kernel module is unloaded.
Oct 08 20:23:04 hwkiller-desktop kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
Oct 08 20:23:04 hwkiller-desktop kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=797, GPU has fallen off the bus.
Oct 08 20:23:04 hwkiller-desktop kernel: NVRM: GPU at PCI:0000:01:00: GPU-0aa5ef0d-2a02-ee18-f14e-bbd9ebf50562
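If this crash recurs, it may help to pull the Xid events out of the kernel log mechanically rather than by eye. Below is a minimal Python sketch. The regular expression is an assumption modelled on the Xid line above, not an official NVIDIA format specification, and parse_xid, scan and XidEvent are hypothetical helper names.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Assumed line shape, based on the log excerpt in this post:
#   "Oct 08 20:23:04 host kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=797, GPU has fallen off the bus."
# The "pid=..." part is treated as optional in case other driver versions omit it.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<code>\d+), "
    r"(?:pid=(?P<pid>\d+), )?(?P<msg>.*)"
)
# Syslog-style timestamp at the start of a journalctl line, e.g. "Oct 08 20:23:04".
TS_RE = re.compile(r"^([A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})")


class XidEvent(NamedTuple):
    timestamp: Optional[str]
    pci: str
    code: int
    pid: Optional[int]
    message: str


def parse_xid(line: str) -> Optional[XidEvent]:
    """Return an XidEvent if the line is an NVRM Xid report, otherwise None."""
    m = XID_RE.search(line)
    if m is None:
        return None
    ts = TS_RE.match(line)
    pid = m.group("pid")
    return XidEvent(
        timestamp=ts.group(1) if ts else None,
        pci=m.group("pci"),
        code=int(m.group("code")),
        pid=int(pid) if pid is not None else None,
        message=m.group("msg").rstrip(". "),  # drop the trailing period
    )


def scan(lines: Iterable[str]) -> Iterator[XidEvent]:
    """Yield every Xid event found in an iterable of log lines."""
    for line in lines:
        event = parse_xid(line)
        if event is not None:
            yield event
```

For example, after saving the output of journalctl -k to a file (kernel.log is a hypothetical name), list(scan(open("kernel.log"))) would return one event per Xid line; seeing code 79 every time would at least confirm that each crash is the same failure.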

Hi all,
I have now experienced this crash three times since September, twice today.

Distro: Arch Linux
Kernel: linux-ck-skylake
Driver: 455.23.04 (DKMS)
GPU: EVGA Nvidia 960 4GB

It seems to occur out of the blue. Earlier, I was on a Zoom call when it crashed with that error. Later, it crashed 5 hours into a model fit (this did not use the GPU, but the CPU was at 100% on all cores).

Things I have noted:

  • I am not near ram capacity.
  • My temperatures are fine (sensors | grep -i Core for the CPU and nvidia-smi for the GPU reported about 80 °C and 50 °C, respectively).
  • When stress-testing with stress -c 6 for the CPU and gpu_burn 360 for the GPU, no issues occurred. Temperatures were again around 80 °C for the CPU and 77 °C for the GPU.
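Because these temperatures were read by hand, a continuous log could show what the GPU was doing just before the next crash. Below is a minimal Python sketch that runs nvidia-smi in its looping query mode (--query-gpu with -l) and appends each sample to a file, flushing after every line so the last readings are more likely to survive a hard hang. The field list, including the PCIe link generation and width, is an illustrative choice, and log_gpu_telemetry, parse_sample and to_number are hypothetical helper names.

```python
import os
import subprocess
from typing import Dict, List, Optional, Union

# Fields passed to nvidia-smi --query-gpu (an illustrative selection).
FIELDS: List[str] = [
    "timestamp",
    "temperature.gpu",
    "power.draw",
    "clocks.gr",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def to_number(raw: str) -> Optional[float]:
    """Convert a CSV value to float; None for values such as '[Not Supported]'."""
    try:
        return float(raw)
    except ValueError:
        return None


def parse_sample(line: str) -> Dict[str, Union[str, Optional[float]]]:
    """Parse one line of --format=csv,noheader,nounits output into a dict."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    sample: Dict[str, Union[str, Optional[float]]] = {FIELDS[0]: parts[0]}
    for name, raw in zip(FIELDS[1:], parts[1:]):
        sample[name] = to_number(raw)
    return sample


def log_gpu_telemetry(path: str, interval_s: int = 5) -> None:
    """Append one CSV line per GPU every interval_s seconds until interrupted."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(FIELDS),
        "--format=csv,noheader,nounits",
        "-l", str(interval_s),
    ]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(path, "a") as out:
        assert proc.stdout is not None
        for line in proc.stdout:
            out.write(line)
            out.flush()
            os.fsync(out.fileno())  # push each sample to disk before the next one
```

Calling log_gpu_telemetry("gpu-telemetry.csv") in a spare terminal during a Zoom call or a long model fit, then reading the last lines back with parse_sample after a crash, would show whether temperature, power draw or the PCIe link state changed shortly before the failure.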

Since the second crash, I have reseated my GPU and its power plugs and am trying another kernel (5.8.14.zen1). I will see whether the crash recurs after the reseating. A new NVIDIA driver was also released, so I updated to 455.28.
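With both the kernel and the driver changed, it may be worth recording which versions were actually running at the time of the next crash: a previously loaded NVIDIA module keeps running until it is reloaded or the machine reboots. Below is a minimal Python sketch that reads the running kernel release and the loaded module version. The layout of /proc/driver/nvidia/version assumed in the regular expression is based on its usual first line, and parse_driver_version and running_config are hypothetical names.

```python
import platform
import re
from typing import Optional

# Assumed first line of /proc/driver/nvidia/version, e.g.:
#   "NVRM version: NVIDIA UNIX x86_64 Kernel Module  455.28  <build date>"
VERSION_RE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def parse_driver_version(text: str) -> Optional[str]:
    """Extract the kernel-module version string, or None if it is not found."""
    m = VERSION_RE.search(text)
    return m.group(1) if m else None


def running_config(version_file: str = "/proc/driver/nvidia/version") -> dict:
    """Report the running kernel release and the loaded NVIDIA module version."""
    try:
        with open(version_file) as f:
            driver = parse_driver_version(f.read())
    except OSError:  # the file is absent when the nvidia module is not loaded
        driver = None
    return {"kernel": platform.release(), "nvidia_module": driver}
```

Saving the output of running_config() alongside each crash report would make it clear whether a crash happened on 5.8.14.zen1 with 455.28 or on the older combination.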

I am attaching the log below.
nvidia-bug-report.log.gz (826.9 KB)

Are there any steps I can or should take to diagnose this problem should it occur again?
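One further step that might help: "GPU has fallen off the bus" means the driver could no longer communicate with the card, so the kernel log from the crashed boot may contain PCIe Advanced Error Reporting (AER) messages shortly before the Xid. Below is a minimal Python sketch, assuming a persistent systemd journal so that journalctl -b -1 can still read the previous boot (reading kernel messages may require root or membership in the systemd-journal group). The matched substrings are assumptions about typical kernel AER wording, and pcie_error_lines and previous_boot_kernel_log are hypothetical names.

```python
import subprocess
from typing import List

# Substrings assumed to mark PCIe AER reports and NVIDIA Xid reports.
MARKERS = ("AER:", "PCIe Bus Error", "NVRM: Xid")


def pcie_error_lines(log_text: str) -> List[str]:
    """Return kernel-log lines that look like AER or Xid reports, in log order."""
    return [line for line in log_text.splitlines() if any(m in line for m in MARKERS)]


def previous_boot_kernel_log() -> str:
    """Kernel messages (-k) from the previous boot (-b -1) via journalctl."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

After rebooting from a crash, printing pcie_error_lines(previous_boot_kernel_log()) would show whether any PCIe errors preceded the Xid 79 line; if none appear, that does not rule out a hardware or power problem, but it narrows what to include in the next bug report.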