The graphics card stops working during normal use, about once a day. The failure happens at random, usually when only Xorg and Firefox are using the GPU and no intensive job is running.
When the problem happens, the system freezes completely within a few seconds. I cannot even ssh into it, and the only thing I can do is a hard reset.
The system freezes too quickly for nvidia-bug-report.sh to finish. However, I managed to run nvidia-debugdump -D before the system went down completely, and I have attached the output as dump.zip. I hope you can help me identify the problem.
I have also attached the output of nvidia-bug-report.sh, collected while the system was operating normally, to provide information about the hardware.
nvidia-bug-report.log.gz (650.2 KB)
dump.zip (471.5 KB)
NVRM: GPU at PCI:0000:02:00: GPU-6ed7643c-a601-1fff-8f7e-eb4eb148a6f0
NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.
NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
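
In case it helps anyone comparing similar logs, here is a minimal Python sketch that picks the Xid events out of kernel log text like the lines above. The regular expression is only inferred from the line format shown in this post, so it may not cover every driver version. The names `XID_RE`, `parse_xid`, and the file name `xid_scan.py` are my own placeholders and are not part of any NVIDIA tool.

```python
import re
import sys

# Pattern inferred from the log line in this post (an assumption, not an
# official format specification), for example:
# NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+), (.*)")


def parse_xid(line):
    """Return (pci_address, xid_code, detail) for an Xid line, else None."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3).strip()


if __name__ == "__main__":
    # Read kernel log text from standard input and print one line per Xid event.
    for log_line in sys.stdin:
        event = parse_xid(log_line)
        if event is not None:
            pci_address, xid_code, detail = event
            print(f"{pci_address} Xid {xid_code}: {detail}")
```

Saved under a hypothetical name such as `xid_scan.py`, it could be fed the kernel log after a reboot, for example with `journalctl -k -b -1 | python3 xid_scan.py`, assuming the journal is persistent and the messages reached the disk before the hard reset.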