Overview & System Environment
Hello everyone, I am an AI engineer running AI model inference on a custom Windows desktop PC with NVIDIA GPU acceleration through WSL2 and Docker Desktop. I am hitting a specific, persistent system lag that only appears after the PC has been idle for several hours, and I would appreciate troubleshooting help from the NVIDIA community.
My hardware and software configuration is as follows:
- Hardware Device: Custom desktop computer (not a laptop)
- CPU: Intel i9 high-performance processor
- System RAM: 64GB DDR4/DDR5
- GPU: NVIDIA RTX 5060 Ti 16GB (used for AI model inference workloads)
- Host OS: Windows 11 Pro 25H2 (latest official release)
- Virtualization Environment: Docker Desktop on Windows 11, using the WSL2 virtual machine backend
- Container Deployment: Three independent Linux containers: an Nginx + frontend web service container, a MySQL database container, and a backend service + AI inference core program container. All containers are allocated enough CPU, memory and GPU resources for long-running inference.
Detailed Problem Description
The fault is reproducible and follows a consistent pattern:
Right after restarting the Windows desktop PC, everything works normally. I can run AI inference tasks continuously and stably, the Windows GUI responds smoothly with no stuttering or delay, and all hardware resource usage looks normal.
However, if the PC is left idle for about 8 hours or longer with the Docker containers and the AI inference program still running in the background, the Windows host becomes extremely laggy: the desktop GUI barely responds to mouse and keyboard input, and basic system operations hang. Surprisingly, the AI inference task inside the WSL2 Docker container keeps running normally, with no interruption or error.
Only two things restore normal behavior: first, stopping all ongoing AI inference tasks (the system gradually becomes smooth again once inference ends); second, restarting the Windows PC (both the system and inference work normally after reboot). The problem reappears after another long idle period.
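To pin down exactly when the host starts lagging during the idle period, a small probe can be left running on the Windows side. The following is a minimal Python sketch, not a DPC measurement: it only records how late a user-mode sleep wakes up, which is a rough proxy for host responsiveness. The function names `measure_overshoot` and `summarize` are hypothetical helpers made up for this illustration.

```python
import time


def measure_overshoot(interval_s: float = 0.1, samples: int = 50) -> list[float]:
    """Sleep repeatedly and return how late each wake-up was, in milliseconds."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Clamp at zero so timer rounding never produces negative lateness.
        overshoots.append(max(0.0, (elapsed - interval_s) * 1000.0))
    return overshoots


def summarize(overshoots_ms: list[float]) -> dict[str, float]:
    """Reduce one batch of samples to its worst case and its median."""
    ordered = sorted(overshoots_ms)
    return {
        "max_ms": ordered[-1],
        "median_ms": ordered[len(ordered) // 2],
    }
```

Calling `summarize(measure_overshoot())` once a minute and appending the result with a timestamp to a log file should show whether the lag builds up gradually over the idle hours or starts abruptly at some point.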
All Completed Inspections & Optimization Settings
I have checked system resources, driver versions, power management and WSL2 configuration, and have ruled out the usual resource bottlenecks and misconfiguration. The completed steps are as follows:
- Full resource monitoring: Checked resource usage in real time on the Windows host, the WSL2 virtual machine and all Docker containers. GPU video memory and system physical memory usage stay within a normal range; CPU usage rises slightly during AI inference but never reaches full load. There is no resource exhaustion or resource contention.
- Full driver upgrade: Installed the latest official NVIDIA graphics drivers, motherboard chipset drivers and motherboard BIOS firmware, so that all low-level drivers are on the newest official stable versions.
- WSL2 resource limits: Limited the number of CPU cores and the maximum memory available to the WSL2 virtual machine in the .wslconfig configuration file, to keep WSL2 from taking excessive host resources.
- System power management: Adjusted all Windows and hardware power plans and turned off automatic sleep, hibernation and low-power standby, to rule out problems caused by sleep and wake-up transitions.
- DPC latency test: Tested system DPC latency with LatencyMon. The result shows extremely long DPC execution times for nvlddmkm.sys, which appears to be the main cause of the GUI lag and unresponsiveness.
- Both NVIDIA driver branches tested: Installed the latest NVIDIA Game Ready Driver and the latest Studio Driver in turn; the lag occurs with both, with no improvement.
- WSL2 sleep disabled: Turned off the automatic sleep and idle suspension mechanism of the WSL2 virtual machine, so that background VM suspension cannot affect host performance.
- WSL2 network mode: Added networkingMode=mirrored to the .wslconfig configuration file to improve WSL2 network compatibility and reduce interaction conflicts with the host.
- Windows memory compression disabled: Permanently disabled Windows memory compression with the PowerShell command Disable-MMAgent -MemoryCompression. The setting persists across reboots and is meant to reduce memory management overhead and potential performance conflicts.
- NVIDIA App power management tuning: Changed the NVIDIA App global power management profile from the default to Prefer Maximum Performance, turning off GPU power saving and dynamic clock reduction so the GPU stays in its full performance state.
- Latest WSL2 configuration applied: Deployed the newest optimized .wslconfig file placed in the Windows master root directory, with all WSL2 automatic sleep and idle suspend mechanisms explicitly disabled, so that background VM hibernation cannot cause host interaction problems.
- Windows power mode: Opened the built-in Windows power settings from the Start Menu and turned off all system-level power-saving, automatic low-power and idle energy-saving functions, so the system stays in high-performance mode at all times.
- Advanced power plan configuration via powercfg.cpl: Opened the advanced power settings panel through powercfg.cpl and disabled all hardware energy-saving options, including hard disk sleep, CPU power throttling and PCIe link power management, to eliminate performance fluctuations caused by hardware power state changes.
- Hibernation permanently disabled: Ran the command powercfg /hibernate off in the system terminal to turn off Windows hibernation completely. The setting persists across reboots, so leftover hibernation state cannot affect driver operation.
- Full driver update via official assistant: Installed the official Dell SupportAssist tool and used it to scan and update all low-level system drivers, including the latest motherboard BIOS firmware and chipset drivers, so that basic hardware driver compatibility is fully up to date.
- Multi-version NVIDIA driver iterative testing (key test record):
  1. Latest Studio Driver 596.36: severe GUI lag still occurs.
  2. Latest Game Ready Driver 596.36: same lag problem with no improvement.
  3. Studio Driver 595.79: lag persists after long running time.
  4. Driver 581.95 (pushed by Dell SupportAssist as a security vulnerability fix): installation failed, so it could not be used.
  5. Driver 581.57: same lag problem with no improvement.
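Since several of the steps above depend on .wslconfig, a quick way to double-check which settings are actually present is sketched below in Python. The file contents are a hypothetical example, not the real file from this system: the memory and processors values are placeholders, and vmIdleTimeout=-1 is only an assumption about how the idle shutdown was disabled. `read_wsl2_settings` is a made-up helper name.

```python
import configparser

# Hypothetical example contents; the actual file is not shown in this post.
EXAMPLE_WSLCONFIG = """
[wsl2]
memory=32GB
processors=8
networkingMode=mirrored
vmIdleTimeout=-1
"""


def read_wsl2_settings(text: str) -> dict[str, str]:
    """Return the key/value pairs of the [wsl2] section, or {} if it is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. networkingMode
    parser.read_string(text)
    if not parser.has_section("wsl2"):
        return {}
    return dict(parser["wsl2"])


settings = read_wsl2_settings(EXAMPLE_WSLCONFIG)
for key in ("memory", "processors", "networkingMode", "vmIdleTimeout"):
    print(f"{key} = {settings.get(key, '<not set>')}")
```

WSL reads this file from the Windows user profile folder (%UserProfile%\.wslconfig), and changes only take effect after wsl --shutdown, so both points are worth confirming before concluding that a setting had no effect.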
Preliminary Root Cause Assessment & Unsuccessful Fix Attempts
After all the inspections and configuration changes above, I have ruled out the AI inference application itself as the cause. My preliminary assessment is that the fault lies in a compatibility problem between the NVIDIA graphics driver and the WSL2 virtualization layer, rather than in application code or container resource allocation.
I also tried the following common fixes. None of them worked, and the problem still recurs after a long idle period:
- Closed all running Docker containers and Docker Desktop, waited for the system to release all related resources, then left the PC idle for observation. The lag still occurs as usual.
- Ran wsl --shutdown to shut down the WSL2 virtual machine completely, then restarted Docker Desktop and all containers to resume AI inference. This had no effect either: the lag still appears after several hours of idle time, which makes troubleshooting very confusing.
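To see whether the GPU is still active during the idle period, including in the test where the containers are stopped, its state can be logged over time with nvidia-smi. This is a minimal Python sketch under a few assumptions: nvidia-smi is on the PATH of the Windows host, only the first GPU is read, and `parse_sample` and `sample_gpu` are hypothetical helper names.

```python
import subprocess

# Query fields passed to `nvidia-smi --query-gpu`.
FIELDS = ["timestamp", "pstate", "clocks.sm", "utilization.gpu", "memory.used", "power.draw"]


def parse_sample(line: str) -> dict[str, str]:
    """Split one CSV line from nvidia-smi into a field-name -> value mapping."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(FIELDS, values))


def sample_gpu() -> dict[str, str]:
    """Run nvidia-smi once and return the reading for the first GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])
```

Calling `sample_gpu()` every minute and writing the result to a file would give a timeline of performance state, clocks and memory use that can be lined up with the moment the GUI starts to lag and with the LatencyMon readings for nvlddmkm.sys.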
Request for Help & Questions
I would appreciate guidance and concrete suggestions from NVIDIA technical experts and experienced users. Is this a known compatibility bug between recent NVIDIA drivers, Windows 11 25H2 and GPU inference in Docker on WSL2? Are there any recommended driver versions to roll back to, registry settings, or low-level WSL2 parameters that address the high DPC latency of nvlddmkm.sys and the Windows GUI lag that appears after a long idle period?
Thanks a lot for any troubleshooting suggestions and technical support!