GPU utilization is only about 50% when running inference on Win11 (including WSL), and inference takes roughly 2x longer; on Linux the same model keeps the GPU fully utilized (above 90%).
The environment:
- OS: Win11 and WSL
- GPU: RTX 4060 Ti 16 GB as an eGPU
- CUDA version: 12.3
GPU utilization when running inference on Win11 or WSL:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.33.01              Driver Version: 546.29       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   37C    P8              4W /  60W  |   1413MiB /  4096MiB |     53%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4060 Ti     On  | 00000000:06:00.0 Off |                  N/A |
|  0%   47C    P2             57W / 165W  |   5329MiB / 16380MiB |     48%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
I tried this with the text-to-audio model Bark and with a LLaMA model, and saw the same behavior in both cases. Any ideas what might be causing this?
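To compare the two setups on equal terms, here is a small sketch (not from the original report; the helper names are hypothetical) that polls `nvidia-smi` while inference runs and averages per-GPU utilization, so a single glance at a fluctuating readout isn't the only data point:

```python
# Sketch: sample GPU utilization over a run via nvidia-smi and average it per GPU.
# Run this in a second terminal while the model is doing inference.
import subprocess
import time


def parse_utilization(csv_output):
    """Parse output of
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    (one integer percentage per line, one line per GPU) into a list of ints."""
    return [int(line.strip()) for line in csv_output.strip().splitlines() if line.strip()]


def sample_gpu_utilization(samples=30, interval=1.0):
    """Poll nvidia-smi `samples` times and return the average utilization per GPU."""
    history = []
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        history.append(parse_utilization(out))
        time.sleep(interval)
    per_gpu = list(zip(*history))  # regroup samples by GPU index
    return [sum(g) / len(g) for g in per_gpu]


# Parsing example with the two GPUs from the readout above:
# parse_utilization("53\n48\n") -> [53, 48]
```

Running the same sampler on the Linux box and on Win11/WSL would confirm whether the ~50% figure is sustained or just a momentary snapshot.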