Freeze with GPU has fallen off the bus on RTX 3080 16GB Laptop (AORUS 15P YD)

Pyrestone · February 25, 2022, 1:04pm

Hi,

I have an RTX 3080 Laptop GPU (16GB) in an AORUS 15P YD laptop, on which I reliably get a “GPU has fallen off the bus” error some time after booting (sometimes on the login screen, sometimes a few minutes after login, usually within 0-20 minutes) whenever the NVIDIA GPU is used for anything related to the X server or DRM.

First of all, I’m (mostly) ruling out a hardware defect, as Windows with the most recent NVIDIA drivers runs stably and I have had no issues there so far.

Ideally, this should run on Ubuntu 20.04 (which uses the 5.13 kernel), although I have tried going up to Ubuntu 21.10 and Pop!_OS (which uses 5.15), as well as down to kernels 5.8 and 5.4.

The only change I need to make to a stock Ubuntu 20.04 install is to install the NVIDIA drivers.

I tried NVIDIA driver versions 460, 470, 495, 510.47 and 510.54.

The nouveau driver seems (mostly?) stable, although I’ve not tested it enough to say so with confidence.
Using nouveau is not really an option for me, as this is my work laptop and I need both OpenGL and CUDA to run on the NVIDIA GPU.

I also tried every BIOS version available for the laptop. All of them work fine in Windows, but the problem persists on Linux.

I also tried disabling a number of power management features, such as D3 power management, pci_port_pm and pcie_aspm, along with various ACPI-related options, because I suspected that the card is being powered down or off while in use, which might cause the driver to crash.

I also noticed that the GPU’s audio device is bound to an Intel driver (snd_hda_intel), which I found odd, but I don’t know whether that might cause problems:

#output section from lspci -vvv after freeze

01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [GeForce RTX 3080 Mobile / Max-Q 8GB/16GB] (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: nvidia
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

01:00.1 Audio device: NVIDIA Corporation Device 228b (rev ff) (prog-if ff)
	!!! Unknown header type 7f
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
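
A note on reading the output above: "rev ff", "prog-if ff" and "Unknown header type 7f" are what lspci shows when reads from a device's configuration space return all ones, which is consistent with the device no longer responding on the bus. The following is only a rough Python sketch, not part of the original report, for spotting such entries in saved lspci -vvv text. The function name unreachable_devices and the sample string are illustrative, and the pattern assumes the short bus:dev.fn address form without a PCI domain prefix.

```python
import re

# First line of an lspci device entry, e.g. "01:00.0 VGA compatible controller: ..."
# Assumption: short address form (no "0000:" domain prefix).
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) (.*)$")


def unreachable_devices(lspci_text):
    """Return addresses of devices whose entry looks like all-ones config space."""
    found = []
    current = None
    for line in lspci_text.splitlines():
        match = DEVICE_RE.match(line)
        if match:
            # New device entry: remember its address and check the revision field.
            current = match.group(1)
            if "(rev ff)" in match.group(2) and current not in found:
                found.append(current)
        elif current and "Unknown header type 7f" in line and current not in found:
            # Detail line belonging to the current device.
            found.append(current)
    return found


# Sample taken from the lspci output quoted in this post.
sample = """\
01:00.0 VGA compatible controller: NVIDIA Corporation GA104M [GeForce RTX 3080 Mobile / Max-Q 8GB/16GB] (rev ff) (prog-if ff)
\t!!! Unknown header type 7f
\tKernel driver in use: nvidia
01:00.1 Audio device: NVIDIA Corporation Device 228b (rev ff) (prog-if ff)
\t!!! Unknown header type 7f
\tKernel driver in use: snd_hda_intel
"""

print(unreachable_devices(sample))
```

Both functions of the card (VGA and audio) are flagged here, which suggests the whole device dropped off the bus rather than only one function misbehaving.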

The only sort of “workaround” I have found so far that keeps the NVIDIA GPU from falling off the bus is Compute mode from the system76-power library, which adds the following lines to /etc/modprobe.d/system76-power.conf:

# Automatically generated by system76-power
blacklist i2c_nvidia_gpu
blacklist nvidia-drm
blacklist nvidia-modeset
alias i2c_nvidia_gpu off
alias nvidia-drm off
alias nvidia-modeset off
options nvidia NVreg_DynamicPowerManagement=0x02
# Preserve video memory through suspend
options nvidia NVreg_PreserveVideoMemoryAllocations=1

Interestingly, in Hybrid mode, which adds the following to /etc/modprobe.d/system76-power.conf:

# Automatically generated by system76-power
blacklist i2c_nvidia_gpu
alias i2c_nvidia_gpu off
options nvidia NVreg_DynamicPowerManagement=0x02
options nvidia-drm modeset=1
# Preserve video memory through suspend
options nvidia NVreg_PreserveVideoMemoryAllocations=1

the GPU still falls off the bus.
I therefore suspect that either nvidia-drm or nvidia-modeset is the failing component here, although that is more of a guess than anything else.
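
To make the comparison behind that guess explicit, here is a small Python sketch that diffs the two generated configurations quoted above. It is only an illustration: the helper name directives is made up, and the two strings are copied from this post rather than read from /etc/modprobe.d.

```python
def directives(conf_text):
    """Return the set of non-blank, non-comment lines of a modprobe.d file."""
    return {
        line.strip()
        for line in conf_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }


# Compute mode configuration, as quoted in this post.
compute = """\
# Automatically generated by system76-power
blacklist i2c_nvidia_gpu
blacklist nvidia-drm
blacklist nvidia-modeset
alias i2c_nvidia_gpu off
alias nvidia-drm off
alias nvidia-modeset off
options nvidia NVreg_DynamicPowerManagement=0x02
# Preserve video memory through suspend
options nvidia NVreg_PreserveVideoMemoryAllocations=1
"""

# Hybrid mode configuration, as quoted in this post.
hybrid = """\
# Automatically generated by system76-power
blacklist i2c_nvidia_gpu
alias i2c_nvidia_gpu off
options nvidia NVreg_DynamicPowerManagement=0x02
options nvidia-drm modeset=1
# Preserve video memory through suspend
options nvidia NVreg_PreserveVideoMemoryAllocations=1
"""

only_compute = sorted(directives(compute) - directives(hybrid))
only_hybrid = sorted(directives(hybrid) - directives(compute))

print("Only in Compute mode:", only_compute)
print("Only in Hybrid mode:", only_hybrid)
```

The power management options are identical in both modes. The only differences are that Compute mode blacklists nvidia-drm and nvidia-modeset, while Hybrid mode loads nvidia-drm with modeset=1.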

I have attached one NVIDIA debug report log and can generate more if necessary. I don’t have a reliable method to trigger the freeze immediately, but I can make it happen within a reasonable timeframe.

Sorry for the wall of text, here’s a TLDR:
Error: GPU has fallen off the bus
Steps to reproduce:

1. Install a fresh Ubuntu 20.04, Ubuntu 21.10 or Pop!_OS 21.10 on the AORUS 15P YD.
2. Install the NVIDIA drivers (e.g. sudo apt install nvidia-driver-510, or 470, 460 or 495).
3. Reboot.
4. Wait for the crash.
5. The system freezes and is unresponsive, although ssh still works.
6. Check dmesg and find:

...
[   22.079296] audit: type=1400 audit(1645793344.680:44): apparmor="DENIED" operation="open" profile="snap.snap-store.ubuntu-software" name="/etc/PackageKit/Vendor.conf" pid=2247 comm="snap-store" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[   30.848138] NVRM: GPU at PCI:0000:01:00: GPU-e5a2c765-97ab-76de-eaf8-021ea4ed93bc
[   30.848143] NVRM: Xid (PCI:0000:01:00): 79, pid=2835, GPU has fallen off the bus.
[   30.848146] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[   30.848190] NVRM: GPU 0000:01:00.0: GPU serial number is \xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff.
[   40.884545] Asynchronous wait on fence NVIDIA:nvidia.prime:2e6 timed out (hint:intel_atomic_commit_ready [i915])
[  100.671825] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000c67d:0:0:0x0000000f
...
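
Since ssh still works after the freeze, the kernel log can be checked remotely for Xid events. The snippet below is a minimal Python sketch of that check, assuming the "NVRM: Xid (PCI:...): <number>, ..." line format seen in the log above. The names XID_RE and find_xid_events are illustrative, and the sample line is copied from this post.

```python
import re

# Matches lines such as:
# [   30.848143] NVRM: Xid (PCI:0000:01:00): 79, pid=2835, GPU has fallen off the bus.
XID_RE = re.compile(r"NVRM: Xid \(PCI:([0-9a-fA-F:.]+)\): (\d+),\s*(.*)$")


def find_xid_events(dmesg_text):
    """Return (pci_address, xid_number, message) tuples found in dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2)), match.group(3).strip()))
    return events


# Sample lines taken from the dmesg output quoted in this post.
sample = (
    "[   30.848138] NVRM: GPU at PCI:0000:01:00: GPU-e5a2c765-97ab-76de-eaf8-021ea4ed93bc\n"
    "[   30.848143] NVRM: Xid (PCI:0000:01:00): 79, pid=2835, GPU has fallen off the bus.\n"
    "[   30.848146] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.\n"
)

for address, xid, message in find_xid_events(sample):
    print(address, xid, message)
```

In practice the text would come from the output of dmesg captured over ssh instead of the sample string.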

I’d really appreciate any suggestions!
Thanks in advance!

nvidia-bug-report.log.gz (1.2 MB)

Pyrestone · February 25, 2022, 2:30pm

Based on recommendations in another post in this forum [Link], I have tried the Liquorix kernel (version 5.16.0-11.1 is currently the latest).
It has been stable for over 55 minutes now, which I think is a new record for me.

For anyone coming across this post, try that.

I will report back again here if it crashes.

system · March 11, 2022, 2:30pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
