Hi
I’ve configured a Kubernetes cluster on a MIG-enabled GPU node and partitioned the GPU into 4 MIG instances with a mix of slice configurations (2g.20gb, 1g.10gb, 3g.40gb, 1g.10gb).
System Details:
Driver Version: 555.42.06
CUDA Version: 12.5
GPU: NVIDIA H100 80GB HBM3
| MIG Device | Slice Type | SMs | Memory |
|---|---|---|---|
| GPU-I 3 | 3g.40gb | 60 | 40 GB |
| GPU-I 2 | 2g.20gb | 32 | 20 GB |
| GPU-I 0 | 1g.10gb | 16 | 10 GB |
| GPU-I 1 | 1g.10gb | 16 | 10 GB |
I used the following DCGM command to collect GPU utilization metrics (field 1001 = DCGM_FI_PROF_GR_ENGINE_ACTIVE, shown as GRACT; field 1004 = DCGM_FI_PROF_PIPE_TENSOR_ACTIVE, shown as TENSO):
```
dcgmi dmon -e 1001,1004 -g 2
```
This is the output (slice-type annotations added by us):
```
Entity     GRACT   TENSO
GPU 0      0.422   0.876
GPU-I 3    0.984   0.816   # 3g.40gb
GPU-I 2    0.989   0.913   # 2g.20gb
GPU-I 0    0.995   0.965   # 1g.10gb
GPU-I 1    0.995   0.964   # 1g.10gb
```
According to the DCGM documentation (Feature Overview, NVIDIA DCGM Documentation), applying the aggregation formula described there to our measured per-instance values yields a GPU 0 GRACT that does not match the value DCGM reports (0.422). The discrepancy appears when the MIG instances have different slice types; the examples in the documentation all use identical slices.
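To make the mismatch concrete, here is the value we expected. This is a minimal sketch under our own assumptions: that the GPU-level GRACT is the SM-weighted average of the per-instance values, and that the H100 SXM5 exposes 132 SMs in total.

```python
# Expected GPU-level GRACT under our assumption of SM-weighted averaging.
sms   = [60, 32, 16, 16]               # 3g.40gb, 2g.20gb, 1g.10gb, 1g.10gb
gract = [0.984, 0.989, 0.995, 0.995]   # per-instance GRACT from dcgmi dmon
total_sms = 132                        # assumed H100 SXM5 SM count
expected = sum(s * g for s, g in zip(sms, gract)) / total_sms
print(round(expected, 3))              # -> 0.928
```

Even if we weight by the 124 SMs actually allocated to the instances instead of 132, the result is about 0.988; neither comes close to the reported 0.422.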
Questions:
- What is the correct method to calculate the overall GPU utilization (GRACT, TENSO) when MIG instances are heterogeneous (e.g., 3g, 2g, 1g)? Please provide a formula or a working example for mixed-slice setups.
- Why does the GPU 0 GRACT not match the sum of weighted instance utilizations in our case? Are there any internal weights or normalization factors beyond SM count?
- Is dcgmi dmon the only officially supported way to monitor per-slice and full-GPU utilization? Do other tools such as the DCGM APIs, NVML, or NVIDIA Nsight offer more accurate or detailed slice-level telemetry? (A sketch of the NVML cross-check we have in mind follows this list.)
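For context on the last question, this is a minimal sketch of the NVML cross-check we have in mind, not an officially documented method: it enumerates the MIG devices of GPU 0 via the nvidia-ml-py (pynvml) bindings and prints their SM counts and memory sizes, so that any SM-fraction weighting can be verified against what DCGM reports.

```python
# Minimal sketch (our own cross-check, not an official DCGM workflow):
# list the MIG devices on GPU 0 and print their SM and memory attributes.
import pynvml

pynvml.nvmlInit()
try:
    gpu = pynvml.nvmlDeviceGetHandleByIndex(0)
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
        except pynvml.NVMLError:
            continue  # unused MIG slot
        attrs = pynvml.nvmlDeviceGetAttributes(mig)
        gi_id = pynvml.nvmlDeviceGetGpuInstanceId(mig)
        print(f"GPU-I {gi_id}: {attrs.multiprocessorCount} SMs, "
              f"{attrs.memorySizeMB} MB")
finally:
    pynvml.nvmlShutdown()
```

The profiling metrics themselves (GRACT, TENSO) would still come from DCGM; we only use NVML here to confirm the per-instance SM counts that any weighting formula would rely on.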
We would greatly appreciate any clarification or documentation references on these topics; it would help us monitor GPU workloads accurately in production Kubernetes environments using MIG.