Random Driver Kernel Crash - Linux - RTX2080Ti

For the last two or three months my system ran properly and I didn’t experience any crashes… but recently I’ve been getting random driver kernel module crashes while doing trivial desktop work (nothing heavy or compute-intensive)!

I was copying a file in a terminal when I received a kernel message saying it was disabling IRQ 16, and dmesg showed the nvidia kernel module crash.

Several seconds after that message, the display froze. While it was still frozen, I connected to the machine over ssh to watch the kernel messages and generated the attached bug report log.
nvidia-bug-report.log.gz (217 KB)

You’re getting

NVRM: Xid (PCI:0000:86:00): 79, GPU has fallen off the bus.

Possible reasons: overheating or an unstable/insufficient power supply. Monitor the temperature using nvidia-smi, and check/replace the PSU.

Thanks for your reply… I will keep an eye on temp and power, although I doubt that any of this can be the reason…

My server has two 2200 W power supplies, and it sits in a cold machine room.