MIG Instance Utilization Calculation

Hi

I’ve configured a Kubernetes cluster on a MIG-enabled GPU node and partitioned the GPU into 4 MIG instances with a mix of slice sizes (2g.20gb, 1g.10gb, 3g.40gb, 1g.10gb).

System Details:
Driver Version: 555.42.06
CUDA Version: 12.5
GPU: NVIDIA H100 80GB HBM3

MIG Device   Slice Type   SMs   Memory
GPU-I 3      3g.40gb      60    40 GB
GPU-I 2      2g.20gb      32    20 GB
GPU-I 0      1g.10gb      16    10 GB
GPU-I 1      1g.10gb      16    10 GB

I used the following DCGM command to collect GPU utilization metrics (field 1001 is DCGM_FI_PROF_GR_ENGINE_ACTIVE / GRACT, and field 1004 is DCGM_FI_PROF_PIPE_TENSOR_ACTIVE / TENSO):

dcgmi dmon -e 1001,1004 -g 2

This produced the following output:

Entity     GRACT   TENSO
GPU 0      0.422   0.876
GPU-I 3    0.984   0.816   # 3g.40gb
GPU-I 2    0.989   0.913   # 2g.20gb
GPU-I 0    0.995   0.965   # 1g.10gb
GPU-I 1    0.995   0.964   # 1g.10gb

According to the documentation (Feature Overview — NVIDIA DCGM Documentation), the GPU-level value should be derivable from the per-instance values. Applying that formula to our actual readings, weighting each instance by its SM count, gives a GPU 0 GRACT that does not match the 0.422 reported by DCGM. The discrepancy appears when the MIG instances are of different slice types, unlike the examples in the documentation, which use identical slices.
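For reference, here is the calculation we attempted, as a minimal Python sketch. The per-instance GRACT values and SM counts come from the table and dmon output above; the 132 total SM count for the full H100 is our own assumption, not something DCGM reported:

# SM-weighted average of the per-instance GRACT values from the dmon output above
gract = [0.984, 0.989, 0.995, 0.995]   # GPU-I 3, GPU-I 2, GPU-I 0, GPU-I 1
sms   = [60, 32, 16, 16]               # SMs per instance
weighted = sum(g * s for g, s in zip(gract, sms))   # ~122.5
print(weighted / sum(sms))   # divided by the 124 SMs covered by the slices -> ~0.988
print(weighted / 132)        # divided by 132 total H100 SMs (our assumption) -> ~0.928
# Neither value is anywhere near the 0.422 that DCGM reports for GPU 0.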

Questions:

  1. What is the correct method to calculate the overall GPU utilization (GRACT, TENSO) when MIG instances are heterogeneous (e.g., 3g, 2g, 1g)? Please provide a formula or working example for mixed-slice setups.
  2. Why does the GPU 0 GRACT not match the sum of weighted instance utilizations in our case?
    Are there any internal weights or normalization factors beyond SM count?
  3. Is dcgmi dmon the only officially supported way to monitor per-slice and full-GPU utilization?
    Do other tools, such as the DCGM APIs, NVML, or NVIDIA Nsight, offer more accurate or detailed slice-level telemetry? (A sketch of what we have tried with NVML follows this list.)
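
For completeness, here is roughly what we have tried with NVML so far: a minimal pynvml sketch (assuming the nvidia-ml-py package) that enumerates the MIG devices under GPU 0 and reads their memory usage. We did not find an obvious per-instance equivalent of GRACT/TENSO through this path, which is part of the motivation for question 3:

import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)

# Walk the possible MIG slots under the physical GPU; unused slots raise an error.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # no MIG device at this index
    name = pynvml.nvmlDeviceGetName(mig)
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(name, mem.used, mem.total)  # per-instance memory only, no engine activity

pynvml.nvmlShutdown()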

We would greatly appreciate any clarification or documentation references on these topics. This would help us accurately monitor GPU workloads in production Kubernetes environments that use MIG.