🐞 Bug Report: RTX 4090 Inference Performance Regression on Windows 11 24H2
📌 Summary
When running GPT-SoVITS v2/v3 inference tasks using PyTorch with AMP (autocast, fp16) on an RTX 4090 under Windows 11 Version 24H2, GPU utilization is abnormally low (<30%) and inference speed is significantly reduced. In contrast:

- The same code and model run perfectly on Linux (Ubuntu).
- The same GPU performs normally after a downgrade to Windows 11 23H2.
- Another machine with an RTX 4070 + Windows 11 23H2 performs better than the 4090 on 24H2.

This suggests the issue is specifically tied to changes in Windows 11 24H2, possibly in how it interacts with NVIDIA drivers or cuDNN optimization logic.
📊 Test Configuration
| Component | Value |
| --- | --- |
| GPUs tested | RTX 4090 (main issue), RTX 4070 (normal) |
| OS versions tested | Windows 11 24H2 (problem), 23H2 (normal), Ubuntu 20.04/22.04 (normal) |
| PyTorch version | 2.0.0 + cu118 (official pip version) |
| CUDA runtime | 11.8 (also tested 12.1) |
| cuDNN version | 8.7.0 |
| Drivers tested | 537.58, 551.61, 572.70, 572.83 |
| Task type | Inference only (no training) |
| Models tested | GPT-SoVITS v2, v3 |
| AMP | Enabled (autocast + fp16) |
| Tools used | nvidia-smi, Python logs, WebUI timing display |
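For reference, a minimal, self-contained timing sketch of the inference path being measured (AMP autocast, fp16 on CUDA). This is an assumption-laden stand-in, not the report's actual benchmark: a toy linear model replaces GPT-SoVITS, and it falls back to CPU with bfloat16 when CUDA is unavailable.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy stand-in model; the report itself runs GPT-SoVITS v2/v3.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 2048),
    torch.nn.GELU(),
    torch.nn.Linear(2048, 512),
).to(device).eval()

x = torch.randn(64, 512, device=device)

def timed_inference(steps=50):
    # AMP as in the report: autocast with fp16 on CUDA
    # (bfloat16 fallback so the sketch also runs on CPU).
    amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16
    with torch.inference_mode(), torch.autocast(device_type=device, dtype=amp_dtype):
        if device == "cuda":
            torch.cuda.synchronize()  # exclude queued work from the timer
        t0 = time.perf_counter()
        for _ in range(steps):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()  # wait for kernels to finish
        return time.perf_counter() - t0

if __name__ == "__main__":
    print(f"{device}: {timed_inference():.3f}s for 50 steps")
```

Comparing this number across 23H2 and 24H2 on the same hardware isolates the OS variable from the model code.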
⚠️ Symptoms
On Windows 11 24H2 + RTX 4090:
- GPU utilization is consistently low (~30%)
- Inference is much slower than expected

On Windows 11 23H2:
- Performance is fully recovered
- GPU runs near 100% utilization

On Linux:
- Everything performs as expected

On another machine with RTX 4070 + Windows 11 23H2:
- Performs better than the RTX 4090 on 24H2
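The utilization figures above came from watching nvidia-smi. A small helper to sample them programmatically during a run, for anyone reproducing this; it assumes `nvidia-smi` is on PATH and returns `None` when no NVIDIA driver is present:

```python
import subprocess

def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None
    if nvidia-smi is unavailable (no NVIDIA driver on this machine)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return [int(v) for v in out.stdout.split()]

if __name__ == "__main__":
    print(gpu_utilization())
```

Polling this in a loop while inference runs makes the ~30% vs ~100% difference between 24H2 and 23H2 easy to log alongside timings.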
✅ Workaround Confirmed
After a clean reinstallation using Windows 11 23H2 ISO, inference speed and GPU utilization on RTX 4090 returned to normal levels.
