Hi,
I am using CUDA MPS (Multi-Process Service) on a machine with four GPUs. I noticed that all client applications are assigned to GPU 0, while the other GPUs sit idle.
Here is the nvidia-smi output:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1911      C   nvidia-cuda-mps-server                        29MiB |
|    0      2742    M+C   python                                       707MiB |
|    0      2806    M+C   python                                       707MiB |
|    0      2872    M+C   python                                       707MiB |
|    0      2946    M+C   python                                       707MiB |
|    0      3031    M+C   python                                       707MiB |
|    0      3124    M+C   python                                       707MiB |
|    1      1911      C   nvidia-cuda-mps-server                        29MiB |
|    2      1911      C   nvidia-cuda-mps-server                        29MiB |
|    3      1911      C   nvidia-cuda-mps-server                        29MiB |
+-----------------------------------------------------------------------------+
I have tried the set_active_thread_percentage command to set the percentage to 5. I expected that some of the workers would then be assigned to the other GPUs.
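For reference, here is a minimal sketch of how such a command can be sent to the MPS control daemon from Python. It is not necessarily the exact invocation used here. It assumes the server PID 1911 from the nvidia-smi output, that `nvidia-cuda-mps-control` is on the PATH, and that the daemon uses its default pipe directory. The helper names `mps_thread_percentage_command` and `send_to_mps_control` are hypothetical.

```python
import subprocess


def mps_thread_percentage_command(server_pid, percentage):
    """Build the text of a set_active_thread_percentage control command."""
    if not 0 < percentage <= 100:
        raise ValueError("percentage must be in the range (0, 100]")
    # Control command format: set_active_thread_percentage <server PID> <percentage>
    return f"set_active_thread_percentage {server_pid} {percentage:g}"


def send_to_mps_control(command):
    """Pipe one command into nvidia-cuda-mps-control and return its stdout.

    Assumes the control daemon is already running; when started without -d,
    the binary acts as a front end that reads commands from stdin.
    """
    result = subprocess.run(
        ["nvidia-cuda-mps-control"],
        input=command + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # 1911 is the nvidia-cuda-mps-server PID from the listing above.
    print(mps_thread_percentage_command(1911, 5))
```

Passing the resulting string to `send_to_mps_control` is the Python equivalent of piping it into `nvidia-cuda-mps-control` with `echo`.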
Any idea why this happens?
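One possible direction, offered as an assumption rather than a confirmed diagnosis: a CUDA application uses device 0 unless it selects another device, and the active thread percentage appears to limit how much of a GPU each client may use, not which GPU it runs on. Under that assumption, the sketch below pins each Python client to a GPU explicitly through the `CUDA_VISIBLE_DEVICES` environment variable, assigning the GPUs round-robin. The script name `worker.py` and the helper names are hypothetical, and the device indices are assumed to match the four devices exposed by the MPS server.

```python
import os
import subprocess
import sys


def env_for_worker(worker_index, num_gpus, base_env=None):
    """Return a copy of the environment with one GPU made visible, round-robin."""
    env = dict(os.environ if base_env is None else base_env)
    # The client then sees a single device, which it addresses as device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % num_gpus)
    return env


def launch_workers(script, num_workers, num_gpus):
    """Start one Python client per worker, each pinned to a single GPU."""
    return [
        subprocess.Popen([sys.executable, script], env=env_for_worker(i, num_gpus))
        for i in range(num_workers)
    ]


if __name__ == "__main__":
    # Show the assignment for six workers on four GPUs without launching anything.
    for i in range(6):
        print(i, env_for_worker(i, 4, base_env={})["CUDA_VISIBLE_DEVICES"])
```

Calling `launch_workers("worker.py", 6, 4)` would start six clients spread over GPUs 0 to 3. Whether this interacts correctly with a particular MPS setup is something to verify against nvidia-smi.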