Xid 79, 154: GPU has fallen off the bus - GTX 1080 + Driver 580.119.02 - KDE Wayland

System info:

Operating System: Fedora Linux 43 (Workstation Edition)

Kernel Version: 6.18.10-200.fc43.x86_64

Desktop Environment: KDE Plasma (Wayland session)

Window Manager: KWin (Wayland)

Motherboard: Z170 Gaming K6 (BIOS P7.50 10/18/2018)

CPU: Intel(R) Core™ i7-7700K CPU @ 4.20GHz

GPU: NVIDIA GeForce GTX 1080 (Device ID: 10de:1b80)

GPU Connection: PCIe x16 slot (connected via 0000:00:01.0 bridge)

Driver Version: 580.119.02 (Installed via akmod-nvidia or similar RPM Fusion package, as indicated by akmods.service in the logs)

Installation Method: RPM Fusion (Akmods)

Description of the Problem:

While the system was running normally, the displays suddenly went black and the GPU fans spun up to 100%. I had to hard-reset the machine.

Steps to Reproduce:

The issue is not easily reproducible, as it occurred during general desktop usage with multiple applications open.

Relevant Logs:

system.log.txt (293.5 KB)

plasma-kwin_wayland.log.txt (48.6 KB)

I am filing this bug report as requested by the kwin_wayland error message. The system is otherwise stable; this hard crash happens at random, roughly once a week. Are there any debugging steps I can take (e.g., capturing a full GPU core dump, trying different PCIe Gen settings in the BIOS) to help diagnose why the GPU is “falling off the bus”?

More crashes, more logs, more reporting:

log2.txt (125.0 KB)

In the words of an NV eng:

So check the mentioned probable causes.

Thank you,
I am running the latest BIOS (7.5 2018 🥲)

The PSU is 550 W, definitely enough; it has been working well for almost a decade.

I can’t completely rule out a thermal issue, but the crashes also happen when the GPU is under very little load, during the night, when the screens are off and only a few background services are running.

By the name of the error (GPU has fallen off the bus) I thought it might be a physical connection issue, but shaking the computer a bit does not reproduce the issue.
I am certain this type of crashes started happening after a driver update/change.

I see. Sorry I couldn’t help. Maybe some NVIDIA engineers will be able to do more.

It would probably be useful to state the last driver version that was stable for you.

I can confirm a very similar issue on nearly identical hardware. Adding my data point here for NVIDIA engineers.

System Info:

  • OS: Ubuntu 24.04.4 LTS (fresh install)
  • Kernel: 6.17.0-14-generic (HWE kernel from noble-updates/main)
  • GPU: NVIDIA GeForce GTX 1080 (GP104, Device ID: 10de:1b80) — same as OP
  • Driver: 580.126.09 (proprietary, installed via linux-generic-hwe-24.04)
  • CPU: Intel Core i7-6700 (Skylake)
  • Chipset: Intel Z170 (100 Series/C230 Series)
  • Motherboard PCIe topology: GPU on 01:00.0 via 00:01.0 bridge — same as OP
  • RAM: 64GB DDR4
  • PSU: 650W, system has been stable for years on older OS installs

Symptoms:
Complete hard system freeze — no display output, no SSH response, no Magic SysRq key response. The kernel is entirely locked up. Only a hard power-button hold recovers the system.

Unlike your case, no Xid errors appear in the journal — the log simply stops dead mid-operation with no warning. The last logged messages are normal application output (terminal emulator, cron jobs). There is no kernel panic, no GPU error, no OOM. The journalctl -b -1 -p 0..3 output shows only unrelated ntfs3 MFT warnings.

# Last lines before freeze — completely normal, then nothing:
<timestamp> systemd[1]: Finished sysstat-collect.service
<+11s>     ghostty[5252]: warning: failed to activate the on-screen keyboard
# === hard freeze here, no further log entries ===

Key observations:

  1. Ubuntu 24.04.4 was freshly installed with the linux-generic-hwe-24.04 metapackage, which pulled in kernel 6.17.0-14. Hard freezes began within a day of installation and have recurred roughly every 1-2 days since. The system has only ever booted kernel 6.17 on this install.
  2. The freeze happens during normal desktop usage, not under heavy GPU load.
  3. GPU temperatures at idle are ~45°C, CPU ~50°C — thermals are not a factor.
  4. No pcie_aspm or intel_idle.max_cstate kernel parameters are currently set.
  5. The 580 driver series is the last to support Pascal (GTX 1080) — the 590 branch drops Pascal entirely. Older branches (535, 570) still support this GPU and were stable on previous installs.

What I’m going to test:

  • Adding pcie_aspm=off as a kernel boot parameter to prevent PCIe power state transitions that could cause the GPU to fall off the bus.
  • Downgrading to kernel 6.14.0-37-generic or 6.8 (available in Ubuntu repos) while keeping driver 580.126.09.
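
After rebooting with a new boot parameter, it may be worth confirming that it actually reached the running kernel before drawing conclusions from the next freeze (or the lack of one). A minimal sketch, assuming the command line is read from /proc/cmdline; the function name missing_params is only illustrative:

```python
def missing_params(cmdline, wanted):
    """Return the entries of `wanted` that are absent from a kernel command line.

    `cmdline` is the raw text of /proc/cmdline; parameters are compared as
    whole whitespace-separated tokens, so "pcie_aspm=off" will not match
    "pcie_aspm=force".
    """
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]
```

On the running system, `missing_params(open("/proc/cmdline").read(), ["pcie_aspm=off"])` should return an empty list once the parameter is active.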

@CocolinoFan — our setups are strikingly similar (same GPU, same Z170 chipset, same PCIe bridge path, both on 580.x drivers). The main difference is your kernel is 6.18 and mine is 6.17 — both very recent. You mentioned “I am certain this type of crashes started happening after a driver update/change” — I can corroborate: the hardware itself has worked fine for years under previous OS installs; this issue appeared only with the 580 driver + recent kernel combination.

I would appreciate if an NVIDIA engineer could clarify whether the 580 driver series has been tested against kernels 6.17+ with Pascal (GP104) hardware specifically, and whether there are known PCIe ASPM interaction issues with this generation.

From the logs attached there is something interesting that hasn’t been raised yet.

The kwin_wayland log shows GL errors starting more than an hour before the crash:

  • 22:10:54 – libinput reports event processing lagging behind (135 ms)

  • immediately followed by GL_INVALID_VALUE and GL_FRAMEBUFFER_INCOMPLETE_ATTACHMENT

  • then hundreds of GL_INVALID_OPERATION errors (“Target doesn’t match the texture’s target”)

Later a second wave appears:

  • 22:26:05 – more GL errors ending with
    Could not delete texture because no context is current

The compositor’s GL state was already degraded long before the Xid 79 occurred.
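
To see how these errors build up in other crash logs, the GL error names can be tallied from a saved kwin_wayland log. This is only a rough sketch: it assumes the log is plain text and that the error names appear literally as GL_... tokens, as they do in the lines quoted above. It does not parse timestamps, since their exact format in the log is not assumed here.

```python
import re
from collections import Counter

# Matches symbolic GL names such as GL_INVALID_VALUE or GL_INVALID_OPERATION.
GL_NAME_RE = re.compile(r"\bGL_[A-Z_]+\b")

def count_gl_errors(lines):
    """Count how often each GL_* name occurs across the given log lines."""
    counts = Counter()
    for line in lines:
        counts.update(GL_NAME_RE.findall(line))
    return counts
```

Feeding it the lines of the log (for example `count_gl_errors(open(path))`, with path pointing at the saved log file) gives a per-error total that can be compared between crashes.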

At the moment of the crash the kernel log shows:

snd_hda_intel: Unstable LPIB

The snd_hda_intel device here is the GPU’s HDMI/DP audio controller (a PCIe function on the same device).

Seeing that error in the same second as the Xid suggests the GPU stopped responding at the PCIe level, not just inside the graphics stack.
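
To check whether the same pairing shows up in the other crashes, a small script can pull both kinds of message out of a saved kernel log. A minimal sketch, assuming the Xid lines follow the usual "NVRM: Xid (PCI:...): <code>," layout and that the audio message contains the literal text "Unstable LPIB"; the sample lines in it are made up for illustration and are not copied from the attached logs.

```python
import re

# Usual layout of an NVIDIA Xid kernel message:
#   NVRM: Xid (PCI:0000:01:00): 79, ...
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),")

def find_events(lines):
    """Return (line_number, kind, xid_code) for Xid and 'Unstable LPIB' lines.

    kind is "xid" or "lpib"; xid_code is None for "lpib" entries.
    """
    events = []
    for number, line in enumerate(lines, start=1):
        match = XID_RE.search(line)
        if match:
            events.append((number, "xid", int(match.group(2))))
        elif "Unstable LPIB" in line:
            events.append((number, "lpib", None))
    return events

if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "kernel: snd_hda_intel 0000:01:00.1: Unstable LPIB",
        "kernel: NVRM: Xid (PCI:0000:01:00): 79, GPU has fallen off the bus.",
    ]
    for event in find_events(sample):
        print(event)
```

If the "lpib" and "xid" entries sit next to each other in every crash log, that would support the PCIe-level reading; if the LPIB message also appears on its own during normal operation, it is less telling.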

There are also a couple firmware-level PCIe hints in the boot log:

ACPI FADT declares the system doesn't support PCIe ASPM

This means the BIOS retains control of PCIe power management.

Suggestions

  1. Disable PCIe power management from the kernel side to remove one variable

Add to the kernel parameters:

pcie_aspm=off pcie_port_pm=off

Older Z170 firmware sometimes reports incomplete ASPM capability tables.

  2. Check if the crash is Wayland-specific

From the SDDM login screen choose:

Plasma (X11)

If the issue disappears under X11 that would point toward a Wayland compositor + Pascal + 580 driver interaction.

  3. Verify PCIe link health independently of the compositor

You can test the PCIe transport path directly:

git clone https://github.com/parallelArchitect/gpu-pcie-diagnostic

This measures negotiated PCIe link generation/width and bidirectional DMA bandwidth under load.

If the link tests clean, the issue is more likely inside the compositor/driver path rather than a hardware link problem.
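
For a quick first look before running a full bandwidth test, the negotiated link speed and width can also be read straight from sysfs. A sketch under a few assumptions: the GPU sits at 0000:01:00.0 (the 01:00.0 address mentioned earlier in the thread), and the kernel exposes the standard current_link_speed, current_link_width, max_link_speed and max_link_width attributes for it. The helper names are illustrative.

```python
from pathlib import Path

ATTRS = ("current_link_speed", "current_link_width",
         "max_link_speed", "max_link_width")

def read_link_state(device_dir):
    """Read PCIe link attributes from a sysfs device directory.

    Attributes that do not exist are reported as None instead of raising.
    """
    state = {}
    for name in ATTRS:
        path = Path(device_dir) / name
        state[name] = path.read_text().strip() if path.exists() else None
    return state

def is_downgraded(state):
    """True if the current speed or width differs from the reported maximum."""
    return (state["current_link_speed"] != state["max_link_speed"]
            or state["current_link_width"] != state["max_link_width"])

if __name__ == "__main__":
    # Assumed PCI address; adjust to match the output of lspci.
    state = read_link_state("/sys/bus/pci/devices/0000:01:00.0")
    for name in ATTRS:
        print(f"{name}: {state[name]}")
    print("downgraded:", is_downgraded(state))
```

A lower current speed at idle is not by itself a fault, since the GPU can drop the link speed to save power; a reduced width, or a reduced speed while the GPU is under load, would be the more interesting result.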
