This suddenly started happening, and it keeps recurring.
Drivers were installed two weeks ago. Crashes started happening today. No HW changes.
Happens with or without load.
[ 7014.746693] tun: Universal TUN/TAP device driver, 1.6
[ 7014.747016] br0: port 2(vnet0) entered blocking state
[ 7014.747017] br0: port 2(vnet0) entered disabled state
[ 7014.747046] device vnet0 entered promiscuous mode
[ 7014.747155] br0: port 2(vnet0) entered blocking state
[ 7014.747156] br0: port 2(vnet0) entered forwarding state
[ 7014.946451] L1TF CPU bug present and SMT on, data leak possible. See CVE-2018-3646 and https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html for details.
[ 7151.601927] NVRM: GPU at PCI:0000:01:00: GPU-e9ab817b-191c-2aec-03b4-4d1b3a7883b3
[ 7151.601932] NVRM: GPU Board Serial Number:
[ 7151.601934] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
[ 7151.601939] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 7151.601940] NVRM: GPU is on Board .
[ 7151.601950] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 7151.601977] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[ 7292.640054] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000987d:0:0:0x0000000f
[ 7292.640064] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000917e:0:0:0x0000000f
[ 7292.640072] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
Crashed again; it does not appear to be thermal related. The GPU was at 60+ C while I was playing a game. I left to do something else and came back to a crash.
==============NVSMI LOG==============
Timestamp : Mon Sep 17 09:10:46 2018
Driver Version : 390.87
Attached GPUs : 1
GPU 00000000:01:00.0
Temperature
GPU Current Temp : 53 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 96 C
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
==============NVSMI LOG==============
Timestamp : Mon Sep 17 09:10:49 2018
Driver Version : 390.87
Attached GPUs : 1
GPU 00000000:01:00.0
Temperature
GPU Current Temp : GPU is lost
GPU Shutdown Temp : GPU is lost
GPU Slowdown Temp : GPU is lost
GPU Max Operating Temp : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
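To make it easier to see what the temperature was right before the GPU dropped out, the readings above could be pulled out of the text programmatically. This is only a rough sketch: it assumes the logs were captured with something like `nvidia-smi -q -d TEMPERATURE` (the post does not say which command was used), and `parse_current_temp` is a hypothetical helper name, not part of any NVIDIA tool.

```python
import re

# Matches the "GPU Current Temp" field as it appears in the logs above.
# Real nvidia-smi output pads the label with spaces, so whitespace around
# the colon is treated as optional.
TEMP_RE = re.compile(r"GPU Current Temp\s*:\s*(.+)")


def parse_current_temp(smi_text):
    """Return the current GPU temperature in degrees C as an int,
    or None if the field holds something else (e.g. "GPU is lost").

    Hypothetical helper for illustration only.
    """
    m = TEMP_RE.search(smi_text)
    if m is None:
        raise ValueError("no 'GPU Current Temp' field found")
    value = m.group(1).strip()
    num = re.fullmatch(r"(\d+)\s*C", value)
    return int(num.group(1)) if num else None
```

Calling this on each captured snapshot and logging the result with a timestamp would show the last known temperature before the field turns into "GPU is lost".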
dmesg output:
[ 5136.575433] NVRM: GPU at PCI:0000:01:00: GPU-e9ab817b-191c-2aec-03b4-4d1b3a7883b3
[ 5136.575436] NVRM: GPU Board Serial Number:
[ 5136.575437] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
[ 5136.575440] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 5136.575440] NVRM: GPU is on Board .
[ 5136.575447] NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
[ 5137.519220] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
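Since this has happened several times, it may help to extract just the Xid lines from dmesg and compare the uptime at each crash. Below is a minimal sketch that assumes the kernel log lines look exactly like the ones pasted above; `find_xid_events` is a made-up name for illustration and is not part of the NVIDIA driver tooling.

```python
import re

# Matches kernel log lines such as:
# [ 5136.575437] NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.
XID_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NVRM: Xid \((?P<pci>[^)]+)\): "
    r"(?P<code>\d+),\s*(?P<msg>.*)"
)


def find_xid_events(dmesg_text):
    """Return a list of (seconds_since_boot, pci_id, xid_code, message)
    tuples, one per Xid line found in the given dmesg text.

    Hypothetical helper for illustration only.
    """
    events = []
    for line in dmesg_text.splitlines():
        m = XID_RE.search(line)
        if m:
            events.append(
                (
                    float(m.group("ts")),
                    m.group("pci"),
                    int(m.group("code")),
                    m.group("msg").strip(),
                )
            )
    return events
```

The text could come from a saved log file or from the output of `dmesg` (which may need root on some systems). Comparing the timestamps across boots would show whether the crashes follow any pattern in uptime.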
Hi dane.buson,
What game are you playing? How long do you need to play before you hit this issue? Are there any custom in-game settings? What are the game and desktop resolutions? Which desktop environment are you running: KDE, GNOME, Xfce, or something else? Are desktop effects enabled? Do you have another system to test with? See if you can reproduce it with other GPUs too. Also, is this issue hit on a specific map in the game or during a specific in-game action?
>> It has occurred about 8 times today. I can see if I still have a 970 I can swap in next time it crashes.
That would be a good way to check whether it is the GPU or another hardware issue. Also try a different NVIDIA driver version to check whether it is a driver issue. It would also be good to contact the GPU vendor to check for a GPU hardware fault.
>> This is not game related.
Can you please find out which activities trigger this issue?