Hi @kwenutechnology — posting diagnostic data from the same HP OMEN class (Ryzen AI + Blackwell mobile) that may help isolate the root cause. The freeze you’re seeing looks like the visible symptom of a deeper runtime power management failure on the dGPU itself.
My system
- Laptop: HP OMEN MAX Gaming Laptop 16-ak0xxx, BIOS F.06 (2025-12-09)
- CPU / iGPU: AMD Ryzen AI 7 350 with Radeon 860M (Krackan)
- dGPU: NVIDIA GeForce RTX 5070 Ti Laptop GPU (GB205, Blackwell), VBIOS 98.05.31.00.38, 12 227 MiB VRAM
- OS: Fedora Linux 44 Workstation
- Kernel: 7.0.9-205.fc44.x86_64 (stable, not -rc)
- Driver: NVIDIA Open Kernel Module 595.71.05 (RPMFusion akmod-nvidia). Note that on Blackwell the proprietary kernel module is unsupported by design, so the open module is the only branch that runs
- Display: GNOME 48 on Wayland, PRIME render-offload (internal panel driven by the AMD iGPU; nvidia-drm.modeset is at its default 0, not set on cmdline)
What I observe (a deeper hang than yours)
In the past 4 days I’ve had 4 hard freezes that did not recover with Ctrl+Alt+F3 → restart gdm — the whole system locks up (no input, no SysRq response, hard power-off required). Two distinct flavors:
- Cold freeze under sustained CUDA load: typically after a few hours of ollama server + a CUDA-using GUI app keeping a CUDA context. The journal stops mid-write, no kernel oops, no MCE, nothing.
- Freeze on s2idle suspend: systemd-logind triggers suspend, then gnome-shell starts timing out releasing input devices and never comes back.
If Ctrl+Alt+F3 still works for you, your hang may be limited to the display server level — mine is deeper (firmware lock).
Root cause I identified (verified by direct measurement)
The dGPU never enters runtime suspend after boot, even with power/control=auto and no CUDA process holding a context. Direct sysfs accounting on the current (stable) boot:
$ cat /sys/bus/pci/devices/0000:c2:00.0/power/runtime_active_time
17500659 # ms ≈ 4 h 51
$ cat /sys/bus/pci/devices/0000:c2:00.0/power/runtime_suspended_time
0 # ms — NEVER suspended since boot
$ cat /sys/bus/pci/devices/0000:c2:00.0/power/control
auto
$ cat /sys/bus/pci/devices/0000:c2:00.0/power/runtime_status
active
For comparison, the AMD iGPU on the same uptime: 86 413 ms active / 17 340 449 ms suspended (≈ 99.5 % asleep). So the kernel runtime PM stack works correctly on the same machine — the dGPU specifically is wedged active.
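For anyone who wants to script the same accounting instead of running four cat commands, here is a minimal Python sketch. It assumes only the standard runtime PM sysfs attributes shown above; the function names are made up for illustration, and 0000:c2:00.0 is the dGPU address on my machine, so adjust it for yours.

```python
from pathlib import Path


def read_runtime_pm(device_dir):
    """Read the runtime PM attributes of one PCI device from sysfs."""
    power = Path(device_dir) / "power"
    return {
        "control": (power / "control").read_text().strip(),
        "status": (power / "runtime_status").read_text().strip(),
        "active_ms": int((power / "runtime_active_time").read_text()),
        "suspended_ms": int((power / "runtime_suspended_time").read_text()),
    }


def suspended_percent(active_ms, suspended_ms):
    """Share of accounted time spent runtime-suspended, in percent."""
    total = active_ms + suspended_ms
    return 100.0 * suspended_ms / total if total else 0.0


if __name__ == "__main__":
    # Address taken from the output above; it will differ on other machines.
    dev = Path("/sys/bus/pci/devices/0000:c2:00.0")
    if dev.is_dir():
        pm = read_runtime_pm(dev)
        pct = suspended_percent(pm["active_ms"], pm["suspended_ms"])
        print(f"{dev.name}: control={pm['control']} status={pm['status']} "
              f"suspended={pct:.1f}%")
    else:
        print(f"{dev} not found, pass your own device path")
```

Plugging in the iGPU figures quoted above (86 413 ms active, 17 340 449 ms suspended) gives the ≈ 99.5 % I mentioned; for the dGPU the result is 0 regardless of uptime.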
The visible symptom: dmesg | grep 'Enabling HDA controller' returns 50 to 148 occurrences per boot during the sessions that ended with a freeze, all at a regular 24-36 s interval. Each entry is a blocked D3hot->D0 transition attempt; each blocked attempt re-enables the HDA child function of the GPU. Eventually the GSP firmware appears to deadlock (consistent with the assertion path reported in upstream Issue #1064, although my journal does not contain the corresponding log lines — see below).
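To check the count and the spacing on another machine, something like the following Python sketch should do. It assumes the kernel log text carries the usual bracketed seconds-since-boot timestamp at the start of each line (as dmesg prints by default); the function names are hypothetical and it only looks for the message text quoted above.

```python
import re

# dmesg lines normally start with "[  seconds.micros]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")
MARKER = "Enabling HDA controller"


def hda_event_times(log_text):
    """Return the timestamps (seconds since boot) of every marker line."""
    times = []
    for line in log_text.splitlines():
        if MARKER not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:  # lines without a timestamp are skipped, not guessed
            times.append(float(match.group(1)))
    return times


def intervals(times):
    """Gaps in seconds between consecutive events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def summarize(log_text):
    """Count the events and report the shortest and longest gap."""
    times = hda_event_times(log_text)
    gaps = intervals(times)
    return {
        "count": len(times),
        "min_gap_s": min(gaps) if gaps else None,
        "max_gap_s": max(gaps) if gaps else None,
    }
```

Feed summarize() the saved output of dmesg. If you see the same pattern as me, count should land in the tens to low hundreds and the gaps should sit in a narrow band around half a minute.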
How to reproduce (in my setup, ~2-6 h to freeze)
- Boot HP OMEN MAX 16 with Fedora 44 + RPMFusion akmod-nvidia-595.71.05.
- Log into a regular GNOME / Wayland session. Just by EGL/GBM device enumeration, gnome-shell, nautilus, ptyxis, gnome-text-editor, brave open persistent handles on /dev/nvidia* (visible via fuser -v /dev/nvidia*).
- Start any CUDA workload that keeps a context alive, e.g. systemctl start ollama (or any background LLM/inference process).
- Use the laptop normally. Periodically check:
  - cat /sys/bus/pci/devices/0000:c2:00.0/power/runtime_suspended_time stays at 0 from boot.
  - journalctl -k | grep -c 'Enabling HDA controller' grows linearly (≈ 1-3 hits/min).
- After a few hours under load, or on the next systemctl suspend/lid-close, the system hard-hangs with no kernel error message.
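If fuser is not installed, the open handles from the second step can also be listed by walking /proc. This is a rough Python sketch, not a replacement for fuser: it only sees file descriptors (not memory mappings), it only sees other users' processes when run as root, and the function names are my own.

```python
import os


def nvidia_handle_holders(proc_root="/proc", prefix="/dev/nvidia"):
    """Map PID -> sorted device nodes under `prefix` the process holds open."""
    holders = {}
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return holders
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        targets = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith(prefix):
                targets.add(target)
        if targets:
            holders[int(entry)] = sorted(targets)
    return holders


def process_name(pid, proc_root="/proc"):
    """Short command name of a process, or '?' if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return "?"


if __name__ == "__main__":
    for pid, nodes in sorted(nvidia_handle_holders().items()):
        print(pid, process_name(pid), ", ".join(nodes))
```

Run it once right after login, before starting any CUDA workload. Any process it lists at that point is holding the device open without doing GPU work.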
Contributing factors I found
- NVreg_EnableS0ixPowerManagement = 0 even though BIOS advertises "Low-power S0 idle used by default for system suspend" (so the NVIDIA driver doesn’t coordinate with platform S0ix).
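- EnableGpuFirmware = 18 (0x12) + OpenRmEnableUnsupportedGpus = 1: the GSP firmware path is active for this GB205 SKU. As far as I can tell from nv-reg.h in the open module sources, 0x12 decodes to mode 0x02 (default) plus policy bit 0x10 (allow fallback), which is the driver's default value rather than a forced setting.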
- DynamicPowerManagement = 3 (most aggressive) but blocked by the persistent user-space handles on /dev/nvidia* mentioned in step 2 of the reproduction, even without any rendering happening on the dGPU.
- The RPMFusion nvidia-suspend-nofreeze.conf drop-in forces SYSTEMD_SLEEP_FREEZE_USER_SESSIONS=false (a workaround inherited from the Xorg era); on Wayland with active CUDA workloads this allows the CUDA-holding apps to keep emitting commands during the s2idle transition.
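To compare these parameters between machines without eyeballing the whole file, a small Python sketch along these lines may help. It assumes /proc/driver/nvidia/params lists one "Name: value" pair per line with the names given without the NVreg_ prefix; the helper names and the WATCHED tuple are mine, not anything the driver provides.

```python
WATCHED = (
    "EnableS0ixPowerManagement",
    "EnableGpuFirmware",
    "OpenRmEnableUnsupportedGpus",
    "DynamicPowerManagement",
)


def parse_nvidia_params(text):
    """Parse 'Name: value' lines into a dict; values are kept as strings."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            params[name.strip()] = value.strip()
    return params


def as_hex(value):
    """Render a decimal value as hex, or return it unchanged if not a number."""
    try:
        return hex(int(value))
    except ValueError:
        return value


def report(text, names=WATCHED):
    """Pick out the watched parameters, marking absent ones explicitly."""
    params = parse_nvidia_params(text)
    return {name: params.get(name, "<missing>") for name in names}


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/params") as f:
            content = f.read()
    except OSError:
        content = ""  # driver not loaded, or not a Linux machine
    for name, value in report(content).items():
        print(f"{name} = {value} ({as_hex(value)})")
```

The hex rendering is only there because some of these values are bit fields, so 18 is easier to compare as 0x12.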
Cross-reference (complementary, not a duplicate)
The upstream GitHub project also has an open issue about the same firmware-level failure, but from the workstation Blackwell angle. I added a template-aligned diagnostic there with the same nvidia-bug-report.log.gz attached:
GSP heartbeat stuck at 0 since boot with S0ix power management on RTX PRO 1000 Blackwell laptop · Issue #1064 · NVIDIA/open-gpu-kernel-modules · GitHub
This thread and that issue cover different facets of the same root cause:
- GitHub #1064: workstation Blackwell (RTX PRO 1000 on Dell + Intel iGPU), structured against the 10_functional_bug.yml template, exhaustive parameter listing, addressed to upstream Open Kernel Module developers.
- This thread: consumer Blackwell (RTX 5070 Ti on HP + AMD iGPU), focused on the user-visible symptom and the Ctrl+Alt+F3 workaround you mentioned, addressed to fellow HP OMEN users and NVIDIA Linux forum staff.
Crucially, the combined evidence shows the bug isn’t tied to one OEM, one iGPU vendor, one driver minor version, or to NVreg_EnableS0ixPowerManagement = 0 vs 1 — it spans all of them.
To help triage:
- Does Ctrl+Alt+F3 → systemctl restart gdm actually let you keep using the machine afterwards, or does the freeze come back quickly?
- Can you check, between two freezes, what cat /sys/bus/pci/devices/<your dGPU BDF>/power/runtime_suspended_time reports? If it stays at 0 like mine, we have the same root cause and your gdm-restart workaround is patching the symptom (display server) but not the underlying firmware lock.
- Do you have any CUDA-using app running in the background (Ollama, a local LLM client, anything pinning a CUDA context)? That dramatically accelerates the freeze in my case.
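If you are unsure which BDF your dGPU has, this Python sketch looks it up from sysfs and prints the counter from the second question. It assumes the usual sysfs layout, where each PCI device directory has vendor and class files, and it relies on NVIDIA's PCI vendor ID 0x10de and the display class prefix 0x03; the function name is hypothetical.

```python
from pathlib import Path

NVIDIA_VENDOR = "0x10de"  # NVIDIA's PCI vendor ID
PCI_ROOT = "/sys/bus/pci/devices"


def find_nvidia_gpus(pci_root=PCI_ROOT):
    """Return PCI addresses of NVIDIA display-class devices (class 0x03xxxx)."""
    root = Path(pci_root)
    if not root.is_dir():
        return []
    found = []
    for dev in sorted(root.iterdir()):
        try:
            vendor = (dev / "vendor").read_text().strip().lower()
            pci_class = (dev / "class").read_text().strip().lower()
        except OSError:
            continue  # entry without the expected attribute files
        # The class filter drops the GPU's HDA audio function (class 0x04xxxx).
        if vendor == NVIDIA_VENDOR and pci_class.startswith("0x03"):
            found.append(dev.name)
    return found


if __name__ == "__main__":
    for bdf in find_nvidia_gpus():
        counter = Path(PCI_ROOT) / bdf / "power" / "runtime_suspended_time"
        if counter.exists():
            print(bdf, counter.read_text().strip(), "ms suspended")
        else:
            print(bdf, "has no runtime_suspended_time attribute")
```

On my machine I would expect it to print 0000:c2:00.0 followed by 0 ms suspended; a non-zero value on yours would point away from a shared root cause.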
Attached
nvidia-bug-report.log.gz captured on a stable boot — full driver state including /proc/driver/nvidia/params and dmesg.
Happy to run additional tests with toggled NVreg parameters (EnableS0ixPowerManagement=1, lower DynamicPowerManagement, EnableGpuFirmwareLogs cranked up, etc.) if NVIDIA staff drops a list here or on the GitHub issue.
nvidia-bug-report.log.gz (381.5 KB)