I've run into the following issue. When the MPS (Multi-Process Service) server is enabled for a GPU (regardless of the GPU model) and I try to read SM utilization, every monitoring tool I tried reports a value only for the nvidia-cuda-mps process itself. For the client processes that run through the MPS server, SM utilization shows up as either dashes or 0%.
I have tried the following tools: nvidia-smi, dcgm-exporter, the DCGM CLI utility (from the datacenter-gpu-manager package), and nvitop. With these same tools I can successfully get the GPU memory usage of each process, but not its SM utilization.
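To make the symptom easier to reproduce, here is a minimal sketch of how I compare per-process SM values, assuming the column layout printed by `nvidia-smi pmon -c 1`. The helper names (`parse_pmon`, `sm_by_pid`, `read_pmon`) are my own, and the `SAMPLE` text is hand-written to illustrate the pattern I see, not captured output; the PIDs and numbers in it are made up.

```python
import subprocess

# Hand-written illustration of the pattern described above (NOT real output).
SAMPLE = """\
# gpu        pid  type    sm   mem   enc   dec   command
# Idx          #   C/G     %     %     %     %   name
    0       4242     C    87    12     -     -   nvidia-cuda-mps-server
    0       5001   M+C     -     -     -     -   python
    0       5002   M+C     0     0     -     -   python
"""


def parse_pmon(text):
    """Parse pmon-style text into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first header line names the columns; later ones are skipped.
            if columns is None:
                columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue
        # Let the last column (command) absorb any remaining text.
        fields = line.split(None, len(columns) - 1)
        if len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


def sm_by_pid(rows):
    """Map pid -> SM utilization in percent, or None where a dash is shown."""
    result = {}
    for row in rows:
        if not row.get("pid", "").isdigit():
            continue  # placeholder row printed when no process is running
        sm = row.get("sm", "-")
        result[int(row["pid"])] = int(sm) if sm.isdigit() else None
    return result


def read_pmon():
    """Take one pmon sample from the local machine (needs an NVIDIA driver)."""
    return subprocess.run(
        ["nvidia-smi", "pmon", "-c", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
```

On a machine with a GPU, `sm_by_pid(parse_pmon(read_pmon()))` gives the live values; with `SAMPLE`, only the server PID gets a nonzero number, which is exactly what I observe for MPS clients.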
Effectively, when multiple processes are using MPS, their impact on SM utilization becomes a black box to me.
Has anyone in the community faced this issue before, or does anyone know what the cause might be? Any help would be greatly appreciated!