GPU utilization is only about 50% when running inference on Win11 (including WSL), and inference takes roughly 2x longer; on Linux the same model keeps the GPU fully utilized (above 90%).
The environment:
- OS: Win11 and WSL
- GPU: RTX 4060 Ti 16 GB as an eGPU
- CUDA version: 12.3
GPU utilization when running inference on Win11 or WSL:
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.33.01              Driver Version: 546.29       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    On  | 00000000:01:00.0  On |                  N/A |
| N/A   37C    P8              4W /  60W  |   1413MiB /  4096MiB |     53%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 4060 Ti     On  | 00000000:06:00.0 Off |                  N/A |
|  0%   47C    P2             57W / 165W  |   5329MiB / 16380MiB |     48%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
I tried this with the text-to-audio model Bark and with a LLaMA model, and saw the same behavior in both cases. Any ideas what might be causing this?
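To compare the two setups on equal terms, here is a small sketch (not from the original report; the helper names are hypothetical) that polls `nvidia-smi` while inference runs and averages per-GPU utilization, so a single glance at a fluctuating readout isn't the only data point:

```python
# Sketch: sample GPU utilization over a run via nvidia-smi and average it per GPU.
# Run this in a second terminal while the model is doing inference.
import subprocess
import time


def parse_utilization(csv_output):
    """Parse output of
    `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`
    (one integer percentage per line, one line per GPU) into a list of ints."""
    return [int(line.strip()) for line in csv_output.strip().splitlines() if line.strip()]


def sample_gpu_utilization(samples=30, interval=1.0):
    """Poll nvidia-smi `samples` times and return the average utilization per GPU."""
    history = []
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        history.append(parse_utilization(out))
        time.sleep(interval)
    per_gpu = list(zip(*history))  # regroup samples by GPU index
    return [sum(g) / len(g) for g in per_gpu]


# Parsing example with the two GPUs from the readout above:
# parse_utilization("53\n48\n") -> [53, 48]
```

Running the same sampler on the Linux box and on Win11/WSL would confirm whether the ~50% figure is sustained or just a momentary snapshot.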