_raw_q_flush NULL deref in nvidia_uvm during suspend on GM108M / 580.159.03 (kernel 7.0.x)

Summary

Kernel NULL-pointer dereference inside nvidia_uvm triggered by the proprietary
driver’s own suspend path (nv_uvm_suspend → uvm_suspend_entry → uvm_suspend →
nv_kthread_q_flush → _raw_q_flush+0x87) when the system enters S3/s2idle.
After the oops, nvidia-sleep.sh exits with IRQs disabled and preempt_count=1,
leaving kernel state corrupted. Subsequent threads that touch nvidia_uvm get
stuck in kernel space mid-do_exit() (state R, ignore SIGKILL), which in turn
prevents wait() reaping and produces immortal zombies for any process that
opened /dev/nvidia-uvm before the suspend.

Environment

Distro:         CachyOS (Arch-based, rolling)
Kernel:         7.0.2-2-cachyos (PREEMPT, x86_64)
NVIDIA driver:  nvidia-580xx-dkms 580.159.03-2 (proprietary)
GPU:            NVIDIA GeForce 940MX (GM108M, Maxwell), PCI 01:00.0
iGPU:           Intel HD Graphics 520 (Skylake-U), i915 (primary display)
Hardware:       Xiaomi Mi Notebook (Timi TM1613)
Firmware:       Dell BIOS A05 2016-08-11

Setup is Optimus-style: the NVIDIA GPU has no display attached; the iGPU drives
all outputs. The dGPU is essentially unused in normal workloads — no CUDA,
no PRIME offload, no Vulkan apps targeting it. The crash happens purely because
the driver is loaded and registers a suspend hook; the GPU itself has done
no meaningful work prior to the oops.

Module / power-management config (already correct)

/usr/lib/modprobe.d/nvidia-sleep.conf:
options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/var/tmp

/etc/modprobe.d/nvidia.conf:
options nvidia-drm modeset=1
options nvidia NVreg_UsePageAttributeTable=1 NVreg_InitializeSystemMemoryAllocations=0

nvidia-suspend.service, nvidia-resume.service, nvidia-hibernate.service
are all enabled.
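
To confirm that the options above are what the loaded module actually sees, something like the following sketch can help. It assumes the driver exposes its registry keys as "Key: value" lines in /proc/driver/nvidia/params; the helper name nv_param is made up for illustration, and the optional second argument exists only so the function can be tried against a sample file.

```sh
#!/bin/sh
# Print the effective value of one NVIDIA registry key.
# Assumption: /proc/driver/nvidia/params holds "Key: value" lines.
# nv_param is a hypothetical helper name, not part of the driver tooling.
nv_param() {
    key=$1
    file=${2:-/proc/driver/nvidia/params}
    # Exit status is non-zero when the key is not present in the file.
    awk -F': *' -v k="$key" \
        '$1 == k { print $2; found = 1 } END { exit !found }' "$file"
}

# Only query the live driver when the params file is readable.
if [ -r /proc/driver/nvidia/params ]; then
    nv_param PreserveVideoMemoryAllocations
    nv_param TemporaryFilePath
fi
```

On a machine without the driver loaded the script prints nothing.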

Trace

Full extract attached as oops.log. Key frames:

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 322138 Comm: nvidia-sleep.sh
Tainted: P S OE 7.0.2-2-cachyos #1 PREEMPT
RIP: 0010:_raw_q_flush+0x87/0x110 [nvidia_uvm]
Code: ... 48 39 1e ...   <-- cmp %rbx,(%rsi) with RSI=0
RSI: 0000000000000000   RDI: ffffd393cde20318
Call Trace:
 _raw_q_flush+0x87/0x110          [nvidia_uvm]
 nv_kthread_q_flush+0x18/0x70     [nvidia_uvm]
 uvm_suspend+0x17b/0x1a0          [nvidia_uvm]
 uvm_suspend_entry+0xb6/0xf0      [nvidia_uvm]
 nv_uvm_suspend+0x32/0x50         [nvidia]
 nv_set_system_power_state+0x329/0x510 [nvidia]
 nv_procfs_write_suspend+0x129/0x160   [nvidia]
 proc_reg_write+0x68/0xb0
 __x64_sys_write+0x3f6/0x480
 do_syscall_64+0x6d/0xa90
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
note: nvidia-sleep.sh[322138] exited with irqs disabled
note: nvidia-sleep.sh[322138] exited with preempt_count 1

Right after the oops the kernel still attempts the suspend cycle and logs:

NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set.
System Power Management attempted without driver procfs suspend interface.
Please refer to the 'Configuring Power Management Support' section in the driver README.
nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
nvidia 0000:01:00.0: PM: failed to suspend async: error -5

(NB: nvidia-suspend.service had written to the procfs interface just moments
earlier. The message appears because the service was killed by the oops before
that write returned, so the subsequent retry proceeds without the procfs
handshake.)

Reproduction

100% reproducible on this machine when:

  1. NVIDIA driver 580.159.03 is loaded with NVreg_PreserveVideoMemoryAllocations=1.
  2. An application that links FFmpeg/libmpv (compiled with --enable-nvdec
     --enable-nvenc --enable-cuda-llvm) is launched; in my case Stremio,
     which uses libmpv.so.2 + libavcodec.so.62 from system FFmpeg.
  3. System is suspended (systemctl suspend).
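
After resuming (or after the failed suspend), the kernel log can be checked for the signature of this particular oops. The sketch below is only a convenience filter matching the RIP line shown in the trace; the function name has_uvm_flush_oops is hypothetical.

```sh
#!/bin/sh
# Succeeds if the kernel log on stdin contains the RIP line of this oops.
# has_uvm_flush_oops is a hypothetical helper name used for illustration.
has_uvm_flush_oops() {
    grep -q 'RIP: .*_raw_q_flush.*\[nvidia_uvm\]'
}

# Usage (reading the kernel journal may need root or journal group access):
#   journalctl -k -b | has_uvm_flush_oops && echo "oops reproduced"
```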

Triage details

I initially suspected QtWebEngine (Stremio embeds it). Triage rules it out:

  • Pre-launch baseline: nvidia_uvm blacklisted and not loaded
    (lsmod | grep nvidia_uvm → empty).
  • After running stremio (no video played, just opening the UI):
    nvidia_uvm is loaded with a “Used by” count of 2.
  • Same result with QTWEBENGINE_CHROMIUM_FLAGS="--disable-gpu" stremio.
  • Inspecting /proc/<stremio-pid>/maps after launch shows
    libcuda.so.580.159.03 mapped, and /proc/<stremio-pid>/fd/ lists open
    handles to /dev/nvidiactl, /dev/nvidia-uvm, /dev/nvidia0.
  • node (the Stremio server) does not map any NVIDIA libraries.
  • ldd /opt/stremio/stremio → libmpv.so.2 plus libavcodec.so.62,
    libavformat.so.62, libavutil.so.60, libavdevice.so.62 from the system
    FFmpeg, built with --enable-nvdec --enable-nvenc --enable-cuda-llvm.
    FFmpeg/libmpv lazy-loads libcuda.so at init for capability probing, which
    is enough to load nvidia_uvm and stage the kernel for the later suspend
    oops.

So the “Stremio-only” trigger generalizes to: any app that links libmpv or
libavcodec from a CUDA-enabled FFmpeg build. The kernel oops itself is
independent of how nvidia_uvm got loaded — it just needs nvidia_uvm to be
loaded when the suspend hook runs.
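
The /proc/<pid>/fd check from the triage can be generalized to every process on the system. This is a rough sketch, not a polished tool: pids_holding is a made-up name, the proc root is a parameter only so the function can be exercised on a fake tree, and without root it only sees file descriptors of processes owned by the calling user.

```sh
#!/bin/sh
# List PIDs that hold an open file descriptor on a given device node.
# pids_holding is a hypothetical helper; $2 overrides the proc root for testing.
pids_holding() {
    target=$1
    proc=${2:-/proc}
    for fd in "$proc"/[0-9]*/fd/*; do
        # readlink fails for descriptors that vanished or are not accessible.
        link=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$link" = "$target" ]; then
            pid=${fd#"$proc"/}      # strip the proc root prefix
            echo "${pid%%/*}"       # keep only the PID component
        fi
    done | sort -un
}

pids_holding /dev/nvidia-uvm
```

Running it before systemctl suspend shows which processes would keep nvidia_uvm in use at the time the suspend hook runs.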

Workaround that confirms the trigger

Blacklisting nvidia_uvm (plus overriding
/usr/lib/modules-load.d/nvidia-580xx-utils.conf with an empty file in
/etc/modules-load.d/) prevents the module from being loaded at boot or by
FFmpeg’s dlopen. With this in place, the same Stremio session followed by
systemctl suspend completes cleanly — no oops, no zombies, system resumes
normally.

When the bug does hit, the side effects are: oops in _raw_q_flush,
nvidia-suspend.service ends in failed (signal=KILL), nvidia-resume.service
then sticks in activating (start) indefinitely (in my case 10+ hours).

Aftermath

  • /sys/module/nvidia/parameters/ disappears (module in undefined state).
  • Any thread that dispatches into nvidia_uvm later can get stuck in kernel
    space during exit. Example: a ThreadPoolForeg worker thread of Stremio
    (PID 320117) is in state R with wchan: 0, has already released
    mm/fd/exe (so it’s mid-do_exit()), but never finishes — SigPnd: 0,
    nonvoluntary_ctxt_switches: 312717, ignores SIGKILL. Its parent (also a
    zombie) cannot be reaped by systemd --user, leaving immortal zombies until
    reboot.
  • System remains usable for non-GPU workloads but cannot suspend cleanly and
    any process touching the NVIDIA stack risks getting stuck.
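
For anyone checking whether their system is in the same state, the zombies can be listed from /proc directly. A minimal sketch, assuming the usual Name/State/Pid/PPid fields of /proc/<pid>/status; list_zombies is a hypothetical name, and the script only finds zombie processes, not the R-state thread stuck in do_exit() (that one has to be looked up under /proc/<pid>/task/).

```sh
#!/bin/sh
# Print "PID PPID NAME" for every process in state Z (zombie).
# list_zombies is a hypothetical helper; $1 overrides the proc root for testing.
list_zombies() {
    proc=${1:-/proc}
    for st in "$proc"/[0-9]*/status; do
        [ -r "$st" ] || continue
        # Names containing spaces are truncated to their first word.
        awk '
            $1 == "Name:"  { name = $2 }
            $1 == "State:" { state = $2 }
            $1 == "Pid:"   { pid = $2 }
            $1 == "PPid:"  { ppid = $2 }
            END { if (state == "Z") print pid, ppid, name }
        ' "$st" 2>/dev/null
    done
}

list_zombies
```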

Suspected cause

_raw_q_flush+0x87 decodes to cmp %rbx,(%rsi) with RSI=0, i.e. the queue
flush walks a list whose head/next pointer is NULL. Likely causes:

  • The internal nv_kthread_q was never initialized for this UVM context, or
  • It was already torn down (e.g. by a prior failed init / partial suspend), but
    uvm_suspend still calls nv_kthread_q_flush on it.
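
For reference, the decode of the faulting instruction can be reproduced by hand. In the bytes 48 39 1e, 48 is a REX.W prefix, 39 is the opcode for CMP r/m64, r64, and 1e is the ModRM byte. The sketch below splits that byte into its fields; the function names are made up, and it only handles the plain register-indirect case (no REX.R/REX.B extension, no SIB or RIP-relative forms).

```sh
#!/bin/sh
# Map a 3-bit register number to its 64-bit register name (no REX extension).
reg_name() {
    case $1 in
        0) echo rax ;; 1) echo rcx ;; 2) echo rdx ;; 3) echo rbx ;;
        4) echo rsp ;; 5) echo rbp ;; 6) echo rsi ;; 7) echo rdi ;;
    esac
}

# Split a ModRM byte into mod (bits 7-6), reg (bits 5-3) and rm (bits 2-0).
# Not handled: mod=0 with rm=4 (SIB byte) or rm=5 (RIP-relative).
decode_modrm() {
    byte=$(( $1 ))
    mod=$(( byte >> 6 ))
    reg=$(( (byte >> 3) & 7 ))
    rm=$(( byte & 7 ))
    echo "mod=$mod reg=$(reg_name "$reg") rm=$(reg_name "$rm")"
}

decode_modrm 0x1e   # prints: mod=0 reg=rbx rm=rsi
```

mod=0 means the r/m operand is a memory reference through the register with no displacement, so the instruction compares %rbx against the quadword at (%rsi), which matches the read fault at address 0 with RSI=0.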

The Maxwell (GM10x) family is in deprecated/legacy support in the 580 branch;
this codepath may not be exercised regularly upstream.

This is consistent with what the triage shows: an FFmpeg-based dlopen of
libcuda.so performs a CUDA capability probe that briefly opens
/dev/nvidia-uvm (causing nvidia_uvm to load) and then closes it. The
context teardown apparently leaves a partially initialised nv_kthread_q
behind; the later global uvm_suspend then walks it and dereferences a
NULL list pointer.

Workaround that does not work

The standard advice (enable nvidia-{suspend,resume,hibernate}.service and set
NVreg_PreserveVideoMemoryAllocations=1 + NVreg_TemporaryFilePath) is
already in place on this system — and it’s exactly the combination that
triggers the oops.

Workarounds that may help

  • Unsetting NVreg_PreserveVideoMemoryAllocations (drop video memory across
    suspend). Not yet tested; it might avoid the uvm_suspend path.
  • Blacklisting nvidia_uvm if CUDA is not needed.
  • For headless-NVIDIA Optimus laptops where the iGPU drives the display:
    blacklisting all NVIDIA modules (and relying on i915 only).

Note: blacklisting nvidia_uvm requires two changes

The nvidia-580xx-utils package ships
/usr/lib/modules-load.d/nvidia-580xx-utils.conf which contains
nvidia-uvm — i.e. it asks systemd-modules-load.service to load
nvidia_uvm at boot, explicitly. A blacklist nvidia_uvm line in
/etc/modprobe.d/ does not prevent this, because blacklist only blocks
implicit / autoload requests, not explicit ones from modules-load.d.

To actually keep nvidia_uvm unloaded, both of these are needed:

# /etc/modprobe.d/blacklist-nvidia-uvm.conf
blacklist nvidia_uvm
# /etc/modules-load.d/nvidia-580xx-utils.conf
# (empty — overrides /usr/lib/modules-load.d/nvidia-580xx-utils.conf)
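
Whether both changes took effect can be checked after a reboot, and again after launching an FFmpeg-based application. A small sketch, assuming the standard /proc/modules format in which each line starts with the module name followed by a space; module_loaded is a hypothetical helper name.

```sh
#!/bin/sh
# Succeeds if the named module appears in a /proc/modules-style listing.
# module_loaded is a hypothetical helper; $2 overrides the file for testing.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nvidia_uvm; then
    echo "nvidia_uvm is still loaded"
else
    echo "nvidia_uvm is not loaded"
fi
```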

It would be friendlier if nvidia-uvm were loaded lazily (on first
/dev/nvidia-uvm open) instead of being force-loaded at boot, especially on
Optimus systems where the dGPU is rarely used and the only effect of having
nvidia_uvm loaded is risk exposure to bugs like this one.

Attached

  • oops.log (13.2 KB): kernel ring buffer covering the oops + suspend cycle.
  • full-boot.log (298.0 KB): full journalctl -k -b of the affected boot.
  • nvidia-bug-report.log.gz (832.3 KB): generated with
    sudo nvidia-bug-report.sh on the same machine after a clean reboot.