RTX 5070 Ti Laptop GPU (ROG Strix G16, Linux Mint) – black screen + Nvidia-modeset error under CUDA load with 570-open/580-open

On my ASUS ROG Strix G16 laptop running Linux Mint, the system randomly hard-freezes and the screen goes black during deep learning training on the NVIDIA GPU.

When this happens, the display goes dark, I’m dropped to a black TTY-like screen (or nothing at all), and I see repeated NVIDIA-related errors. I cannot switch TTYs or recover; I have to force power off by holding the power button.

This only happens under heavy CUDA load (deep learning training). Normal desktop usage is fine.

Hardware:

  • Laptop: ASUS ROG Strix G16
  • dGPU: NVIDIA RTX 5070 Ti Laptop GPU
    • PCI ID: 10de:2f58
  • iGPU: AMD Raphael (amdgpu)
  • CPU: AMD Ryzen 9 8940HX
  • Hybrid graphics: AMD iGPU + NVIDIA dGPU (no external GPU)

Software:

  • OS: Linux Mint 22.2 (Zara)
  • Kernel: 6.14.0-29-generic

NVIDIA drivers tested:

  1. nvidia-driver-580-open (recommended by Driver Manager)
  2. nvidia-driver-570-open

Display stack:

  • amdgpu for iGPU

  • nvidia (open kernel modules) for dGPU

During a deep learning training run on the dGPU, the laptop suddenly goes black. When it drops to a text console, I see errors like:

Nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress: 0x0000c67d:0 2:0:4048:4044

snd_hda_intel 0000:01:00.1: unable to change power state from D3cold to D0, device is inaccessible

From dmesg I see repeated messages like:

nvidia 0000:01:00.0: Enabling HDA controller
NVRM: The NVIDIA GPU 0000:01:00.0 (PCI ID: 10de:2f58)
NVRM: installed in this system requires use of the NVIDIA open kernel modules.
[drm:nv_drm_dev_load [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NvKmsKapiDevice
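Since the machine has to be powered off hard, these messages can only be recovered from the previous boot's kernel log. Below is a minimal sketch for pulling them out, assuming the systemd journal is persistent and was flushed before the freeze (neither is guaranteed). The function names and the signature table are hypothetical; the patterns are copied from the messages quoted in this post, plus the generic "NVRM: Xid" prefix the driver uses for GPU fault reports.

```python
import re
import subprocess

# Patterns copied from the messages quoted in this post, plus the generic
# "NVRM: Xid" prefix used by the NVIDIA driver for GPU fault reports.
SIGNATURES = {
    "modeset_timeout": re.compile(r"Error while waiting for GPU progress"),
    "d3cold_wakeup": re.compile(r"unable to change power state from D3cold to D0"),
    "kapi_alloc": re.compile(r"Failed to allocate NvKmsKapiDevice"),
    "xid": re.compile(r"NVRM: Xid"),
}


def classify_kernel_lines(lines):
    """Return {signature_name: [matching lines]} for the given log lines."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
    return hits


def previous_boot_kernel_log():
    """Read the kernel log of the previous boot via journalctl.

    Assumes journalctl is installed and the journal is persistent.
    """
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.splitlines()
```

After rebooting from a crash, `classify_kernel_lines(previous_boot_kernel_log())` shows which of these messages appeared, and in what order within each group.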

The GPU temperature during training stays around 80°C (according to nvidia-smi) before the crash. There is no thermal throttling or obvious overheating.
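To back the temperature reading with data that survives the freeze, the values can be logged to disk once per second while training runs. This is a minimal sketch, assuming nvidia-smi is on the PATH and the listed query fields are reported for this GPU (some laptops return "[N/A]" for power draw). The function names, log path and interval are hypothetical.

```python
import os
import subprocess
import time

# Query fields passed to `nvidia-smi --query-gpu=...`.
FIELDS = ["timestamp", "temperature.gpu", "power.draw", "clocks.sm", "pstate"]


def parse_sample(csv_line):
    """Turn one CSV line from nvidia-smi into a {field: value} dict of strings."""
    values = [value.strip() for value in csv_line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def read_sample():
    """Ask nvidia-smi for one sample of the first GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sample(output.splitlines()[0])


def log_forever(path="gpu_telemetry.log", interval_s=1.0):
    """Append one sample per interval and force it to disk each time."""
    with open(path, "a") as handle:
        while True:
            sample = read_sample()
            handle.write(", ".join(sample[name] for name in FIELDS) + "\n")
            handle.flush()
            os.fsync(handle.fileno())  # keep the last sample across a hard freeze
            time.sleep(interval_s)
```

Running `log_forever()` in a second terminal before starting training leaves the last readings before the crash in the log file.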

After the crash, the system is completely unresponsive. The only way out is a hard power-off.

Current driver state:

$ lsmod | grep -E 'nvidia|nouveau'

nvidia_uvm           2076672  0
nvidia_drm            135168  0
nvidia_modeset       1638400  1 nvidia_drm
nvidia              104071168  2 nvidia_uvm,nvidia_modeset
nvidia_wmi_ec_backlight    12288  0
drm_ttm_helper         16384  2 amdgpu,nvidia_drm
video                  77824  5 nvidia_wmi_ec_backlight,asus_wmi,amdgpu,asus_nb_wmi,nvidia_modeset
wmi                    28672  5 video,nvidia_wmi_ec_backlight,asus_wmi,wmi_bmof,mfd_aaeon

$ lspci -k | grep -A 3 -E "VGA|3D"

01:00.0 VGA compatible controller: NVIDIA Corporation Device 2f58 (rev a1)
    Subsystem: ASUSTeK Computer Inc. Device 30f9
    Kernel driver in use: nvidia
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
--
69:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev d8)
    Subsystem: ASUSTeK Computer Inc. Raphael
    Kernel driver in use: amdgpu
    Kernel modules: amdgpu

$ ls -l /dev/nvidia*

crw-rw-rw- 1 root root 195,   0 Nov 28 11:08 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Nov 28 11:08 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Nov 28 11:08 /dev/nvidia-modeset
crw-rw-rw- 1 root root 507,   0 Nov 28 11:08 /dev/nvidia-uvm
crw-rw-rw- 1 root root 507,   1 Nov 28 11:08 /dev/nvidia-uvm-tools

Is this a known issue with RTX 50-series laptop GPUs (PCI ID 10de:2f58) and the open kernel modules on Linux?

Are there recommended kernel parameters, driver versions, or power-management settings for ROG Strix G16 + RTX 5070 Ti Laptop on Linux to avoid:

  • Nvidia-modeset: ERROR: GPU:0: Error while waiting for GPU progress

  • snd_hda_intel 0000:01:00.1: unable to change power state from D3cold to D0
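The D3cold message hints that runtime power management of the dGPU may be involved, though that is an assumption and not a diagnosis. A minimal sketch for recording the runtime PM state of the GPU (0000:01:00.0) and its HDA function (0000:01:00.1) from sysfs follows; the function name is hypothetical, and the PCI addresses are the ones shown in the logs above.

```python
from pathlib import Path

PCI_ROOT = Path("/sys/bus/pci/devices")

# Standard PCI runtime power-management attributes exposed in sysfs.
ATTRIBUTES = ("power/control", "power/runtime_status", "d3cold_allowed")


def read_pm_state(address, root=PCI_ROOT):
    """Return {attribute: value or None} for one PCI function.

    A value of None means the attribute could not be read, for example
    because the device is absent or inaccessible.
    """
    state = {}
    for attribute in ATTRIBUTES:
        try:
            state[attribute] = (Path(root) / address / attribute).read_text().strip()
        except OSError:
            state[attribute] = None
    return state
```

Calling `read_pm_state("0000:01:00.0")` before and during training shows whether the device is set to `auto` and ever reports `suspended`. If it does, the NVIDIA driver README documents the `NVreg_DynamicPowerManagement` module option (`0x00` disables runtime D3); whether changing it helps here is untested.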

I have attached the nvidia-bug-report.sh report.

nvidia-bug-report.log.gz (346.2 KB)

Any guidance on how to stabilize this GPU under heavy CUDA workloads on Linux would be very appreciated.

Hi @13801380p

Would you mind trying our latest released NVIDIA driver, 580.105.08, and sharing your feedback?

This is a follow-up after testing the 580.105.08 driver as requested.

System

  • Laptop: ASUS ROG Strix G16
  • dGPU: NVIDIA RTX 5070 Ti Laptop GPU
    • PCI ID: 10de:2f58
  • iGPU: AMD Raphael (amdgpu)
  • CPU: AMD Ryzen 9 8940HX
  • Hybrid graphics: AMD iGPU + NVIDIA dGPU
  • Secure Boot: disabled

What happens with 580.105.08 (.run installer)

I downloaded the official NVIDIA-Linux-x86_64-580.105.08.run file and installed it from a TTY with the display manager stopped.

After the install and reboot:

  • The system would not reach a normal graphical session.

  • I ended up with a broken graphics setup and had to:

    • remove all NVIDIA components,

    • reinstall the AMD userspace stack (xserver-xorg-video-amdgpu, libdrm-amdgpu1, mesa-vulkan-drivers),

    • boot temporarily with only the AMD iGPU.

At that point nvidia-smi no longer existed and Mint’s Driver Manager said that no proprietary drivers were needed. Basically the 580.105.08 runfile left my Mint installation in a non-functional hybrid GPU state, and I had to manually repair it.

I did not have this kind of system breakage before trying the 580.105.08 runfile.

Current state with distro packages

I am now back on the Ubuntu/Mint packaged driver:

nvidia-driver-580-open

installed via apt / ubuntu-drivers. With this setup:

  • nvidia-smi works and shows the RTX 5070 Ti.

  • CUDA workloads and desktop usage work again.

  • The original deep-learning crash I reported still happens on this driver.

Additional info about distro packages / Driver Manager

On Linux Mint’s Driver Manager, the only NVIDIA options I have are:

  • nvidia-driver-580-open (recommended) – version 580.95.05-0ubuntu0.24.04.2

  • nvidia-driver-570-open – version 570.195.03-0ubuntu0.24.04.1

  • xserver-xorg-video-nouveau (open-source)

There is no option for the non-open nvidia-driver-580 or any other proprietary driver in the Mint GUI; only the open kernel module drivers are offered. So on this system the “normal” way to install drivers via the distro is to use nvidia-driver-580-open (which I am currently using), not the closed driver.
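To confirm which kernel module flavor and version is actually loaded after switching between the runfile and the distro packages, the driver's version file can be parsed. This is a minimal sketch, assuming /proc/driver/nvidia/version contains an "NVRM version:" line that includes the words "Open Kernel Module" for the open flavor, which is how I understand the file to look. The function names are hypothetical.

```python
import re
from pathlib import Path

VERSION_FILE = Path("/proc/driver/nvidia/version")

# Matches "... Kernel Module  580.95.05" and "... Kernel Module for x86_64  580.95.05".
_NVRM = re.compile(r"Kernel Module(?: for \S+)?\s+(\d+\.\d+(?:\.\d+)?)")


def parse_nvrm_version(text):
    """Extract the driver version and module flavor from the version file text.

    Returns None when no "NVRM version:" line is present.
    """
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = _NVRM.search(line)
            return {
                "version": match.group(1) if match else None,
                "open_modules": "Open Kernel Module" in line,
            }
    return None


def loaded_driver(path=VERSION_FILE):
    """Parse the live version file; None if the nvidia module is not loaded."""
    try:
        return parse_nvrm_version(Path(path).read_text())
    except OSError:
        return None
```

`loaded_driver()` returning `None` corresponds to the state where no NVIDIA module is loaded at all, as happened after the runfile install.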

Because of that, the only way I could test 580.105.08 as you requested was by using the official .run installer from NVIDIA’s website, which is what caused the broken graphics state I described earlier.

Is the 580.105.08 runfile driver expected to work on Linux Mint (Ubuntu 24.04–based), or is Mint considered unsupported even though it’s very close to Ubuntu?