Nvlddmkm issue leading to BSOD during AI inference

On a server, I have two NVIDIA L4 GPUs that are used for AI model inference, performing anomaly detection on image data. Since August 22, 2025, a weird issue has been happening: the server reboots at random times in the middle of an inference job, with no noticeable increase in temperature or memory utilization. On another machine with a Quadro RTX 4000 GPU I have never seen this, and even on weaker laptop GPUs I have no such issue.
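
For context, GPU temperature and memory were being sampled during the runs with something along these lines (a minimal sketch; the polling interval and log file name are arbitrary examples, not my exact script):

```python
import csv
import subprocess
import time

# Poll nvidia-smi every few seconds and append one row per GPU to a CSV file:
# timestamp, GPU index, temperature (C), memory used/total (MiB), utilization (%).
QUERY = "timestamp,index,temperature.gpu,memory.used,memory.total,utilization.gpu"

with open("gpu_log.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for line in out.strip().splitlines():
            writer.writerow([field.strip() for field in line.split(",")])
        f.flush()
        time.sleep(5)
```

The last samples before each reboot show both L4s well within normal temperature and memory ranges.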

When I tried to find the root cause, every sudden reboot (BSOD) pointed to the nvlddmkm.sys module, which is part of the NVIDIA GPU driver. A fault originating in this module apparently leads to critical Event ID 41 on Windows Server 2025 (source: Microsoft-Windows-Kernel-Power).
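
This is how the crashes show up in the System log (a sketch of the kind of query I ran; wevtutil is the stock Windows tool, and the XPath filter just selects Kernel-Power Event ID 41 entries):

```python
import subprocess

# Pull the most recent Kernel-Power Event ID 41 entries (unexpected shutdowns)
# from the System log, newest first. Run from an elevated prompt on the server.
xpath = "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] and (EventID=41)]]"
result = subprocess.run(
    ["wevtutil", "qe", "System", f"/q:{xpath}", "/f:text", "/c:10", "/rd:true"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # each entry's timestamp matches one of the random reboots
```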

Following suggestions from online forums, I tried rolling back the driver and the CUDA toolkit, along with several other fixes, but nothing helped. I suspect this may have something to do with incompatibilities between Windows Server 2025, WSL 2, Docker Desktop, and the NVIDIA datacenter (Tesla) GPU drivers, but then I'm not sure why the problem only appeared on August 22.
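
For anyone who asks about versions: a quick way to dump what the workload actually sees (and to spot host vs. WSL 2 vs. container mismatches) is something like the sketch below, assuming the nvidia-ml-py package is installed in that environment:

```python
import pynvml  # pip install nvidia-ml-py

# Report the driver / CUDA driver versions and the GPUs visible to this runtime.
pynvml.nvmlInit()
try:
    print("Driver version:", pynvml.nvmlSystemGetDriverVersion())
    cuda = pynvml.nvmlSystemGetCudaDriverVersion()
    print("CUDA driver version:", f"{cuda // 1000}.{(cuda % 1000) // 10}")
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        print(f"GPU {i}:", pynvml.nvmlDeviceGetName(handle))
finally:
    pynvml.nvmlShutdown()
```

I can post the exact versions reported on the host, inside WSL 2, and inside the container if that helps narrow things down.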

Any thoughts, or similar scenarios that led to a solution, would be really appreciated.