GPU has fallen off the bus

Hi!
The NVIDIA customer support chat referred me here.

I’ve been having quite a bit of trouble since early last year, but lacked the time to really dig into it. At first I thought it was my inexperience with Linux, and I didn’t quite know how to log the errors well. But after going through several Linux distributions, carefully re-attaching the cables, and switching to another PCI slot, I’m starting to believe the cause might be faulty hardware. After not doing anything for many months, I’ve had more time in the past weeks to figure out what is going on.

Here are some errors that happened today, which caused a complete system crash:

kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
-- Reboot --

NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
-- Reboot --

NVRM: 8, channel 00000010
-- Reboot --

Please advise how to further troubleshoot and resolve this issue
Thank you

nvidia-bug-report.log.gz (570 KB)
temp.log (76.9 KB)

If this is a desktop system, Xid 79 is hardware related: an insufficient power supply or overheating.
Please run nvidia-bug-report.sh as root and attach the resulting .gz file to your post. Hovering the mouse over an existing post of yours will reveal a paperclip icon.
https://devtalk.nvidia.com/default/topic/1043347/announcements/attaching-files-to-forum-topics-posts/
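
On the Xid 79 point above: to see at a glance which Xid codes show up in the kernel log and how often, a rough sketch like the following may help. The helper names `xid_events` and `xid_summary` are made up for illustration (they are not NVIDIA tools), and the pattern assumes the `NVRM: Xid (PCI:...): <code>, ...` line format seen in this thread.

```shell
# Sketch only: xid_events and xid_summary are hypothetical helper names.
# Both read kernel log text on stdin, in the "NVRM: Xid (PCI:...): NN, ..." format.

# Print "<xid code> <PCI address>" for every NVRM Xid line.
xid_events() {
    sed -n 's/.*NVRM: Xid (PCI:\([0-9a-fA-F:.]*\)): \([0-9][0-9]*\),.*/\2 \1/p'
}

# Count how often each Xid code occurs, most frequent first.
xid_summary() {
    xid_events |
        awk '{ n[$1]++ } END { for (c in n) print n[c], "x Xid", c }' |
        sort -rn
}
```

With these defined in the current shell, something like `journalctl -k -b -1 | xid_summary` would summarize the previous boot, assuming the journal is persistent.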

Thank you, I ran the script and attached it to the post. In the meantime I’ve double-checked the cables and also moved the power plug from the power strip directly to the wall socket just in case, but unfortunately I just had two more crashes.

mei 06 20:09:16 bleep-desktop kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
mei 06 20:09:27 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x
mei 06 20:09:27 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x
mei 06 20:09:27 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x
mei 06 20:09:27 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x
-- Reboot --
mei 06 20:26:01 bleep-desktop kernel: aufs aufs_fill_super:916:mount[1182]: no arg
mei 06 20:26:01 bleep-desktop kernel: overlayfs: missing 'lowerdir'
mei 06 20:26:07 bleep-desktop gnome-session-binary[1330]: Unrecoverable failure in required component org.gnome.Shell.desktop
mei 06 20:26:07 bleep-desktop gnome-session-binary[1580]: Unrecoverable failure in required component org.gnome.Shell.desktop
mei 06 20:26:24 bleep-desktop spice-vdagent[1986]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
mei 06 20:26:47 bleep-desktop spice-vdagent[2626]: Cannot access vdagent virtio channel /dev/virtio-ports/com.redhat.spice.0
mei 06 20:30:31 bleep-desktop kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
-- Reboot --
mei 06 20:32:55 bleep-desktop kernel: aufs aufs_fill_super:916:mou

The power supply is a Corsair TX550M (550 W).

Idle temperature looks fine. To make sure, you can log temperatures using:
nvidia-smi -q -d TEMPERATURE -l 2 -f temp.log
My guess would be a failing PSU, though.
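
Once temp.log has collected some readings, a quick way to pull out the highest temperature is sketched below. This assumes the log contains lines of the form `GPU Current Temp : NN C`, which is what `nvidia-smi -q -d TEMPERATURE` typically prints; the label may differ between driver versions. `max_gpu_temp` is a hypothetical helper name.

```shell
# Sketch only: max_gpu_temp is a hypothetical helper name.
# Assumes readings appear as "GPU Current Temp : NN C" in the text on stdin.

# Print the highest "GPU Current Temp" value found, or nothing if there is none.
max_gpu_temp() {
    awk -F: '
        /GPU Current Temp/ {
            t = $2
            gsub(/[^0-9]/, "", t)            # keep only the digits of " 45 C"
            if (t == "") next                # skip readings such as "N/A"
            if (!seen || t + 0 > max) { max = t + 0; seen = 1 }
        }
        END { if (seen) print max }'
}
```

Usage would be `max_gpu_temp < temp.log`. The last reading before a crash is often the more interesting one; `grep 'GPU Current Temp' temp.log | tail -n 1` shows that.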

Also, a 550 W PSU is a bit small for a 1080 Ti.

Ok, thank you for your feedback. I attached a log with the temperatures. Today I tested the various voltages from the PSU to the motherboard with a multimeter, and they are all consistent and within the acceptable range. After putting it back together I had another complete crash, but this time it was not from the GPU. If a GPU crash happens again, I’ll try borrowing or upgrading the PSU.

Did another clean Ubuntu install after putting everything back together yesterday. After an initial crash, the logs didn’t show a GPU issue, so I was hoping the problem only had something to do with GNOME. After setting WaylandEnable to false in gdm3/custom.conf I was able to log in and use the system. Not much later it crashed again. Output from journalctl --since yesterday -p 0..4:

mei 11 21:47:59 bleep-desktop org.gnome.Shell.desktop[1947]: Window manager warning: last_focus_time (344931000) is greater than comparison timestamp (345675).  This most likely represen
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): connected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): Internal TMDS
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): 600.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-1: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-1: 1440.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-2: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-3: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-3: 1440.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-4: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-5: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-5: 1440.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-6: disconnected
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
mei 11 21:48:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (--) NVIDIA(GPU-0):
mei 11 21:48:05 bleep-desktop org.gnome.Shell.desktop[1947]: Window manager warning: last_focus_time (350859000) is greater than comparison timestamp (352087).  This most likely represen
mei 11 21:49:09 bleep-desktop kernel: NVRM: GPU at PCI:0000:02:00: GPU-3d67e6bf-cda0-31c6-0cd4-57f2da248bc1
mei 11 21:49:09 bleep-desktop kernel: NVRM: GPU Board Serial Number: 0324517092231
mei 11 21:49:09 bleep-desktop kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
mei 11 21:49:09 bleep-desktop kernel: NVRM: GPU at 00000000:02:00.0 has fallen off the bus.
mei 11 21:49:09 bleep-desktop kernel: NVRM: GPU is on Board 0324517092231.
mei 11 21:49:09 bleep-desktop kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                      NVRM: nvidia-bug-report.sh as root to collect this data before
                                      NVRM: the NVIDIA kernel module is unloaded.
mei 11 21:49:11 bleep-desktop /usr/lib/gdm3/gdm-x-session[1796]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x062b, 0x000080cc, 0x00008100)
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:13 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:14 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
mei 11 21:49:14 bleep-desktop kernel: nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f

Abnormal temperatures won’t be caused by a lack of power unless the GPU is dying. In your case, the likely fault is insufficient power reaching the GPU, which means you need a higher-capacity power supply. An upgrade from 550 W to 650 W should do it. Moreover, if your PSU is wearing out, it won’t deliver anything close to the advertised 550 W.
A quick remedy may be to artificially throttle the GPU clock, which lowers its power draw.
Perhaps something like this may help you.
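
One way to try this, as a sketch: rather than touching clocks directly, cap the board power with `nvidia-smi -pl`, which makes the driver reduce clocks to stay under the cap. This is a swapped-in technique, not necessarily what was meant above, and not every card allows changing the limit. The helper `clamp_power_limit` and the wattages in the example are purely illustrative; the real minimum and maximum come from what the card reports.

```shell
# Sketch only: clamp_power_limit is a hypothetical helper name.
# Clamp a requested power cap (whole watts) into the [min, max] range the card allows.
clamp_power_limit() {
    want=$1; min=$2; max=$3
    if [ "$want" -lt "$min" ]; then
        echo "$min"
    elif [ "$want" -gt "$max" ]; then
        echo "$max"
    else
        echo "$want"
    fi
}

# Example, not run here (needs root and a GPU that supports power capping);
# 180, 125 and 300 are made-up numbers, read the real limits from the first command:
#   nvidia-smi -q -d POWER | grep 'Power Limit'
#   sudo nvidia-smi -pl "$(clamp_power_limit 180 125 300)"
```

If the crashes stop or become rarer with a lower cap, that would point towards power delivery. The setting does not persist across reboots.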

For now, also monitor the output of:

nvidia-smi -q -d POWER | grep Draw

and try to catch the power draw reading at which the GPU crashes. This will give you a clue as to whether you actually need a 650 W PSU or just a brand new 550 W PSU. A 600 W PSU may be enough as well.
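
Since the machine goes down when the GPU drops off the bus, the reading has to be on disk already by then. A minimal sketch of that, assuming the `Power Draw : NN.NN W` line format that `nvidia-smi -q -d POWER` prints; `log_power_draw` and `peak_power_draw` are hypothetical helper names and `power.log` is just a default file name:

```shell
# Sketch only: log_power_draw and peak_power_draw are hypothetical helper names.

# Append one timestamped power reading per second to a file (default: power.log).
# sync is called after each write so the last readings are more likely to survive a hard crash.
log_power_draw() {
    while :; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
            "$(nvidia-smi -q -d POWER | grep 'Draw' | head -n 1)"
        sync
        sleep 1
    done >> "${1:-power.log}"
}

# Print the highest wattage found in "... Power Draw : NN.NN W" lines on stdin.
peak_power_draw() {
    awk -F: '
        /Power Draw/ {
            w = $NF
            gsub(/[^0-9.]/, "", w)           # " 212.5 W" -> "212.5", "N/A" -> ""
            if (w != "" && w + 0 > max) max = w + 0
        }
        END { if (max > 0) print max }'
}
```

Run `log_power_draw` in a terminal while using the system, then after the next crash check `tail power.log` for the last values and `peak_power_draw < power.log` for the peak.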

Sometimes upgrading the system BIOS fixes this issue.

Will a microcode update help as well? I could not flash an updated BIOS image till I installed an evaluation copy of Windows on my desktop.

CPU microcode update will not help.

The BIOS is up to date. Just ordered a Corsair 650 W PSU; hope that will solve the issue.

Installed the new Corsair 650 watt PSU last night and I did get another type of crash (nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000987d:0:0). I was initially happy it wasn’t falling off the bus; however, today I had another crash, shortly after the GPU fell off the bus.

mei 15 18:02:38 bleep-desktop systemd[1]: geoclue.service: Main process exited, code=killed, status=15/TERM
mei 15 18:02:42 bleep-desktop gnome-software-service.desktop[12593]: Unable to acquire bus name 'org.gnome.Software'
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): connected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): 600.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): connected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): 600.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-1: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-3: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-5: 1440.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: disconnected
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
mei 15 18:02:43 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (--) NVIDIA(GPU-0):
mei 15 18:02:47 bleep-desktop org.gnome.Shell.desktop[8420]: Window manager warning: Buggy client sent a _NET_ACTIVE_WINDOW message with a timestamp of 0 for 0x3c0001a
mei 15 18:03:00 bleep-desktop kernel: NVRM: GPU at PCI:0000:02:00: GPU-3d67e6bf-cda0-31c6-0cd4-57f2da248bc1
mei 15 18:03:00 bleep-desktop kernel: NVRM: GPU Board Serial Number: 0324517092231
mei 15 18:03:00 bleep-desktop kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
mei 15 18:03:00 bleep-desktop kernel: NVRM: GPU at 00000000:02:00.0 has fallen off the bus.
mei 15 18:03:00 bleep-desktop kernel: NVRM: GPU is on Board 0324517092231.
mei 15 18:03:00 bleep-desktop kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                      NVRM: nvidia-bug-report.sh as root to collect this data before
                                      NVRM: the NVIDIA kernel module is unloaded.
mei 15 18:03:05 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: nvLock: client timed out, taking the lock
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: _cogl_buffer_gl_map_range: assertion 'data != ((void *)0)' failed
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: g_error_free: assertion 'error != NULL' failed
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: _cogl_buffer_bind_no_create: assertion 'ctx->current_buffer[buffer->last_target] != buffer' failed
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: _cogl_buffer_gl_map_range: assertion 'data != ((void *)0)' failed
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: g_error_free: assertion 'error != NULL' failed
mei 15 18:03:10 bleep-desktop gnome-shell[8420]: _cogl_buffer_bind_no_create: assertion 'ctx->current_buffer[buffer->last_target] != buffer' failed
mei 15 18:03:13 bleep-desktop /usr/lib/gdm3/gdm-x-session[7267]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00000ee0, 0x00000ee8)
-- Reboot --

The only other thing I can imagine trying is replacing the cable from the PSU to the GPU. What else can I do to determine the root cause? If it could be the GPU, I’d like to figure that out while I’m still in warranty.

Another log from a crash:

mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): connected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): Internal TMDS
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): Medion MD 20430 (DFP-0): 600.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-1: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-1: Internal DisplayPort
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-1: 1440.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-2: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-2: Internal TMDS
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-2: 165.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-3: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-3: Internal DisplayPort
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-3: 1440.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-4: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-4: Internal TMDS
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-4: 165.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-5: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-5: Internal DisplayPort
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-5: 1440.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-6: disconnected
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-6: Internal TMDS
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0): DFP-6: 165.0 MHz maximum pixel clock
mei 15 18:22:26 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (--) NVIDIA(GPU-0):
mei 15 18:22:32 bleep-desktop gnome-shell[1982]: Some code accessed the property 'discreteGpuAvailable' on the module 'appDisplay'. That property was defined with 'let' 
mei 15 18:22:44 bleep-desktop kernel: NVRM: GPU at PCI:0000:02:00: GPU-3d67e6bf-cda0-31c6-0cd4-57f2da248bc1
mei 15 18:22:44 bleep-desktop kernel: NVRM: GPU Board Serial Number: 0324517092231
mei 15 18:22:44 bleep-desktop kernel: NVRM: Xid (PCI:0000:02:00): 79, GPU has fallen off the bus.
mei 15 18:22:44 bleep-desktop kernel: NVRM: GPU at 00000000:02:00.0 has fallen off the bus.
mei 15 18:22:44 bleep-desktop kernel: NVRM: GPU is on Board 0324517092231.
mei 15 18:22:44 bleep-desktop kernel: NVRM: A GPU crash dump has been created. If possible, please run
                                      NVRM: nvidia-bug-report.sh as root to collect this data before
                                      NVRM: the NVIDIA kernel module is unloaded.
mei 15 18:22:47 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x03c5, 0x0000d4e8, 0x0000d558)
mei 15 18:22:54 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (WW) NVIDIA(0): WAIT (1-S, 17, 0x03c5, 0x0000d4e8, 0x0000d558)
mei 15 18:22:57 bleep-desktop /usr/lib/gdm3/gdm-x-session[1826]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x03c6, 0x0000d4e8, 0x0000d558)
-- Reboot --
mei 15 18:23:40 bleep-desktop kernel: secureboot: Secure boot could not be determined (mode 0)
mei 15 18:23:40 bleep-desktop kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
mei 15 18:23:40 bleep-desktop kernel: ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
mei 15 18:23:41 bleep-desktop kernel: usb: port power management may be unreliable

nvidia-bug-report.log.gz (1.12 MB)

New PSU means new cables, or did you re-use the old ones?
The remaining sources I can think of are:
bad PCI slot
defective voltage regulators on the mainboard
defective GPU

Did you already change the slot, or at least reseat the card?
To rule out a faulty graphics card, you’d have to check it in a different system.
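
Before swapping hardware, it may also be worth looking at what PCIe link the card actually negotiated in each slot. A rough sketch, assuming the usual `LnkCap:` / `LnkSta:` lines from `lspci -vv`; `pcie_link_report` is a hypothetical helper name and 02:00.0 is the address from the logs in this thread:

```shell
# Sketch only: pcie_link_report is a hypothetical helper name.
# Reads `lspci -vv` text on stdin and prints the maximum (LnkCap) and
# currently negotiated (LnkSta) PCIe link speed and width.
pcie_link_report() {
    awk '/LnkCap:|LnkSta:/ {
        speed = ""; width = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "Speed") speed = $(i + 1)
            if ($i == "Width") width = $(i + 1)
        }
        gsub(/,/, "", speed); gsub(/,/, "", width)
        name = $1; sub(/:$/, "", name)
        print name, "speed=" speed, "width=" width
    }'
}

# Example, not run here (root is needed to see the link capability fields):
#   sudo lspci -vv -s 02:00.0 | pcie_link_report
```

A lower speed in LnkSta while the GPU is idle is normally just power saving, but a narrower width than LnkCap (say x8 or x1 instead of x16) would hint at a slot or contact problem.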

Thanks, I appreciate your help with this. I swapped the type 4 cable for the new one to be sure. I reseated the GPU before and also placed it in a different slot a while ago, and just now moved it back to the other slot to be sure. I’ll ask whether I can test it in a friend’s system during the weekend.

I get this problem too. Any more clues or progress?