Recently, my GPU suddenly stopped working. Along with the details below, I have attached the nvidia-bug-report.log. I should preface this by saying that this could very well be a hardware issue, but I thought I would ask here in case there is anything I can try.
As the log states, I have an EVGA NVIDIA GTX 1080, and it has worked very well for years. Recently, it started occasionally giving ‘green screens’, where the normal output to a monitor would become intensely saturated with green. This went away after unplugging and replugging the cable or restarting the monitor.

One day this happened again, but I was done using the computer, so I just shut it down for the night by tapping the PC’s power button. The next day, I turned on the computer and got no output on any monitor. Oddly enough, at around the point where I believe the computer finishes booting into an OS, the monitor wakes from sleep but shows only a black screen. I never see anything, whether in BIOS, Linux, or Windows.

As the log states, I have an AMD CPU without integrated graphics, so I have only been able to debug via ssh. I swapped in an old GPU (GTX 750 Ti), and with it everything works (BIOS, Windows, Linux). The 1080 does light up and spin its fans. After it broke, I also cleaned it and applied fresh thermal paste.
A couple notable snippets from the bug report:
Sep 03 19:40:49 cortana-man kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 511
Sep 03 19:40:49 cortana-man kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 510.68.02 Wed Apr 20 21:10:34 UTC 2022
Sep 03 19:40:49 cortana-man kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 510.68.02 Wed Apr 20 21:04:10 UTC 2022
Sep 03 19:40:49 cortana-man kernel: [drm] [nvidia-drm] [GPU ID 0x00000b00] Loading driver
Sep 03 19:40:49 cortana-man kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:0b:00.0 on minor 0
Sep 03 19:40:49 cortana-man kernel: nvidia-uvm: Loaded the UVM driver, major device number 509.
Sep 03 19:40:50 cortana-man kernel: NVRM: GPU 0000:0b:00.0: RmInitAdapter failed! (0x24:0x72:1417)
Sep 03 19:40:50 cortana-man kernel: NVRM: GPU 0000:0b:00.0: rm_init_adapter failed, device minor number 0
[ 9.384] (II) NVIDIA GLX Module 510.68.02 Wed Apr 20 21:06:55 UTC 2022
[ 9.384] (II) NVIDIA: The X server supports PRIME Render Offload.
[ 9.723] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA GPU at PCI:11:0:0. Please
[ 9.723] (EE) NVIDIA(GPU-0): check your system's kernel log for additional error
[ 9.723] (EE) NVIDIA(GPU-0): messages and refer to Chapter 8: Common Problems in the
[ 9.723] (EE) NVIDIA(GPU-0): README for additional information.
[ 9.723] (EE) NVIDIA(GPU-0): Failed to initialize the NVIDIA graphics device!
[ 9.723] (EE) NVIDIA(0): Failing initialization of X screen
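In case it helps anyone who wants to pull similar lines out of the full report, here is a rough Python sketch of how lines like the ones quoted above could be filtered. The function names (`find_failures`, `scan_report`) and the two regular expressions are my own assumptions, modeled only on the snippets in this post; they are not part of any NVIDIA tool and may miss other relevant messages.

```python
import gzip
import re
from pathlib import Path

# Assumed patterns, based only on the two snippets quoted in this post.
NVRM_FAILURE = re.compile(r"NVRM: .*failed", re.IGNORECASE)
XORG_ERROR = re.compile(r"\(EE\) NVIDIA")


def find_failures(lines):
    """Return lines that look like NVRM failures or NVIDIA X server errors."""
    return [
        line.rstrip("\n")
        for line in lines
        if NVRM_FAILURE.search(line) or XORG_ERROR.search(line)
    ]


def scan_report(path):
    """Read a plain or gzipped bug report and return its failure lines."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return find_failures(fh)


if __name__ == "__main__":
    # File name as attached to this post; adjust the path as needed.
    report = Path("nvidia-bug-report.log.gz")
    if report.exists():
        for line in scan_report(report):
            print(line)
```

This only matches by text pattern, so it is a quick way to find where to start reading rather than a diagnosis of the failure.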
Thank you very much for your time,
nvidia-bug-report.log.gz (114.0 KB)