Black screen appears after a while

Some time after the machine boots, the screen goes black, but the system is actually still up and running, so I am able to connect via ssh and collect info. This never happened before, but since upgrading the drivers I get it very often. Applying the firmware update that fixes the DP issue helps a little, and the system is able to work longer without the screen going black, but I still get it.

uname -a
Linux white-angel 5.3.18-59.30-default #1 SMP Tue Nov 2 07:01:22 UTC 2021 (ebb75ee) x86_64 x86_64 x86_64 GNU/Linux
cat /etc/os-release 
NAME="openSUSE Leap"
VERSION="15.3"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.3"
PRETTY_NAME="openSUSE Leap 15.3"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.3"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
inxi -Gx
Graphics:  Device-1: NVIDIA GA102 [GeForce RTX 3080 Ti] driver: nvidia v: 470.82.00 bus ID: 0b:00.0 
           Display: server: X.org 1.20.3 driver: nvidia unloaded: fbdev,modesetting,nouveau,vesa 
           resolution: <xdpyinfo missing> 
           OpenGL: renderer: llvmpipe (LLVM 11.0.1 256 bits) v: 4.5 Mesa 20.2.4 direct render: Yes 
sensors | grep temp
temp1:        +34.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
k10temp-pci-00c3
temp1:        +40.0°C  

The tail of the kernel log is:

[ 2391.593984] NVRM: GPU at PCI:0000:0b:00: GPU-73fd6ef1-d99c-be16-7eb3-b421160e1984
[ 2391.593986] NVRM: Xid (PCI:0000:0b:00): 79, pid=0, GPU has fallen off the bus.
[ 2391.593988] NVRM: GPU 0000:0b:00.0: GPU has fallen off the bus.
[ 2391.594059] NVRM: GPU 0000:0b:00.0: GPU serial number is \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff.
[ 2391.594070] NVRM: A GPU crash dump has been created. If possible, please run
               NVRM: nvidia-bug-report.sh as root to collect this data before
               NVRM: the NVIDIA kernel module is unloaded.
[ 2393.172536] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2398.185149] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2398.188461] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2398.190531] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2398.193372] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2403.686939] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2408.689417] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2408.689668] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2408.689778] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2408.690264] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
[ 2418.691411] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f
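
To check how often the Xid event occurs between reboots, the kernel log can be scanned for NVRM Xid lines. The following is only a minimal sketch in Python: the function name `find_xid_events` and the `SAMPLE` list are made up for illustration, and the regular expression assumes the line format shown in the log above.

```python
import re

# Matches NVRM Xid lines in the format seen in the kernel log above, e.g.
# [ 2391.593986] NVRM: Xid (PCI:0000:0b:00): 79, pid=0, GPU has fallen off the bus.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid "
    r"\((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+),\s*(?P<msg>.*)"
)


def find_xid_events(lines):
    """Return (seconds_since_boot, pci_address, xid_code, message) tuples.

    Hypothetical helper: lines that do not look like Xid reports are skipped.
    """
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append(
                (
                    float(match["ts"]),
                    match["pci"],
                    int(match["xid"]),
                    match["msg"].strip(),
                )
            )
    return events


# Two lines copied from the log above, used only as sample input.
SAMPLE = [
    "[ 2391.593986] NVRM: Xid (PCI:0000:0b:00): 79, pid=0, GPU has fallen off the bus.",
    "[ 2393.172536] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67e:0:0:0x0000000f",
]

for ts, pci, xid, msg in find_xid_events(SAMPLE):
    print(f"{ts:.1f}s after boot: Xid {xid} on {pci}: {msg}")
```

The same function accepts any iterable of log lines, for example the lines of a saved `dmesg` output file, so the time of each failure can be compared across boots.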

Bug report

nvidia-bug-report.log.gz (1.3 MB)

Hi Sergey

Did you ever get a fix for this issue, or did regular driver updates sort the problem out for you?

Thanks - J