gpu__cycles_active vs. sm__cycles_active.max

Hi there,

I got the following readings using NCU (Nsight Compute) on an NVIDIA A100 GPU:

sm__cycles_active.avg 500,164,181.17
sm__cycles_active.max 502,440,554.00
sm__cycles_active.min 498,137,234.00
sm__cycles_active.sum 54,017,731,566.00

gpu__cycles_active.avg 488,910,637.00
gpu__cycles_active.max 488,910,637.00
gpu__cycles_active.min 488,910,637.00
gpu__cycles_active.sum 488,910,637.00

Shouldn’t gpu__cycles_active be >= sm__cycles_active.max, since an active SM means the GPU is active? Or is my understanding incorrect?

Thanks.

NVIDIA GPUs have numerous clock domains. Perfworks metric names are prefixed with a unit, <unit>__. Metrics with the same unit prefix are in the same clock domain and should obey the rule <unit>__cycles_active.max >= <unit>__cycles_active.avg >= <unit>__cycles_active.min.
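As a quick sanity check of this rule against the readings in the question, here is a small Python sketch (the helper names are hypothetical, not part of Nsight Compute). It verifies .max >= .avg >= .min for each unit. It also assumes .avg is .sum divided by the number of unit instances, so sum / avg estimates the instance count: about 108 for sm__, which matches the 108 SMs on an A100, and 1 for gpu__.

```python
# Rollup values copied from the ncu output in the question.
readings = {
    "sm__cycles_active": {
        "avg": 500_164_181.17,
        "max": 502_440_554.00,
        "min": 498_137_234.00,
        "sum": 54_017_731_566.00,
    },
    "gpu__cycles_active": {
        "avg": 488_910_637.00,
        "max": 488_910_637.00,
        "min": 488_910_637.00,
        "sum": 488_910_637.00,
    },
}


def unit_of(metric: str) -> str:
    """Return the unit prefix of a metric name, e.g. 'sm' for 'sm__cycles_active'."""
    return metric.split("__", 1)[0]


def check_rollups(metric: str, r: dict) -> int:
    """Assert max >= avg >= min, then estimate the instance count as sum / avg."""
    assert r["max"] >= r["avg"] >= r["min"], f"{metric}: max >= avg >= min violated"
    return round(r["sum"] / r["avg"])


for metric, r in readings.items():
    print(f"{unit_of(metric)}: ordering OK, ~{check_rollups(metric, r)} instance(s)")
```

Note that this only checks consistency within each unit; it says nothing about comparing sm__ counts with gpu__ counts.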

The gpu__ metrics are not in the same clock domain as the sm__ metrics, so their raw cycle counts cannot be compared directly. The gpc__, tpc__, sm__, smsp__, l1tex__, and gcc__ units are all in the same clock domain, which means their <unit>__cycles_elapsed.avg values will be the same or approximately the same (if the metrics are collected in multiple passes, there may be a small difference).
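A minimal sketch of this rule, using hypothetical helpers: the raw cycle counts of two metrics are treated as comparable only when they share a unit prefix or both belong to the SM-domain group listed above. Any other pair, including gpu__ vs. sm__, is conservatively assumed to be in different clock domains, since only that one group is named here.

```python
# Units listed above as sharing a single clock domain.
SM_DOMAIN_UNITS = {"gpc", "tpc", "sm", "smsp", "l1tex", "gcc"}


def unit_of(metric: str) -> str:
    """Return the unit prefix of a metric name, e.g. 'smsp' for 'smsp__cycles_active.avg'."""
    return metric.split("__", 1)[0]


def raw_cycles_comparable(a: str, b: str) -> bool:
    """Hypothetical helper: True only if both metrics are known to share a clock domain.

    Same unit prefix implies the same domain. Across units, only the SM-domain
    group above is treated as known; every other pair is assumed NOT comparable.
    """
    ua, ub = unit_of(a), unit_of(b)
    if ua == ub:
        return True
    return ua in SM_DOMAIN_UNITS and ub in SM_DOMAIN_UNITS


print(raw_cycles_comparable("sm__cycles_active.max", "gpu__cycles_active.avg"))  # False
print(raw_cycles_comparable("smsp__cycles_active.avg", "sm__cycles_active.max"))  # True
```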

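In most cases, the percentage of elapsed cycles during which the GPU is active will be >= the percentage of elapsed cycles during which any single SM is active, because gpu__cycles_active counts a cycle as active if any SM is active. Comparing the normalized metrics instead of the raw cycle counts gives:
gpu__cycles_active.avg.pct_of_peak_sustained_elapsed >= sm__cycles_active.max.pct_of_peak_sustained_elapsed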
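A toy model may make the normalized comparison concrete. It assumes a single shared timeline split into equal slots, three hypothetical SMs with made-up activity traces, and made-up tick rates for the two clock domains (the SM-domain rate is set higher purely for illustration; these are not A100 values). The GPU counts as active in a slot if any SM is active, so its active percentage is at least the maximum per-SM percentage. Even so, the raw SM active-cycle count can still exceed the GPU's because the domains tick at different rates, which is one way the pattern in the question can arise.

```python
# Toy model only: made-up traces and clock rates, not measured A100 data.

# Hypothetical ticks per time slot in each clock domain.
GPU_TICKS_PER_SLOT = 100
SM_TICKS_PER_SLOT = 130

# Activity of three hypothetical SMs over 10 equal time slots (1 = active).
sm_traces = [
    [1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
]


def any_active(traces):
    """GPU-level trace: a slot is active if any SM is active in it."""
    return [int(any(slot)) for slot in zip(*traces)]


def pct_active(trace):
    """Percentage of elapsed slots that are active."""
    return 100 * sum(trace) / len(trace)


gpu_trace = any_active(sm_traces)

# Normalized view (analogous to pct_of_peak_sustained_elapsed).
gpu_pct = pct_active(gpu_trace)
sm_max_pct = max(pct_active(t) for t in sm_traces)

# Raw view: active slots times ticks per slot in each domain.
gpu_active_cycles = sum(gpu_trace) * GPU_TICKS_PER_SLOT
sm_active_cycles_max = max(sum(t) for t in sm_traces) * SM_TICKS_PER_SLOT

print(f"gpu active: {gpu_pct:.0f}% ({gpu_active_cycles} cycles)")         # 80% (800 cycles)
print(f"sm max active: {sm_max_pct:.0f}% ({sm_active_cycles_max} cycles)")  # 70% (910 cycles)
```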

Thank you, Greg! This is really helpful for me :)