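NVIDIA GPUs have numerous clock domains. Perfworks metrics are prefixed with the name of a unit, in the form <unit>__. Metrics with the same unit prefix are in the same clock domain, and their rollups should obey the rule <unit>__cycles_active: .max >= .avg >= .min (the values are equal when every instance of the unit reports the same count).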
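As a rough illustration of the prefix convention and the rollup ordering, here is a minimal Python sketch. The helper names `unit_of` and `rollups_ordered` are hypothetical and not part of any NVIDIA tool, and the cycle counts are made up rather than taken from a real profile. The comparison is treated as non-strict, since all three rollups can be equal.

```python
def unit_of(metric_name: str) -> str:
    """Return the unit prefix: the text before the first double underscore."""
    unit, sep, _ = metric_name.partition("__")
    if not sep or not unit:
        raise ValueError(f"no unit prefix in {metric_name!r}")
    return unit


def rollups_ordered(values: dict) -> bool:
    """True when the .max, .avg and .min rollups satisfy max >= avg >= min."""
    return values["max"] >= values["avg"] >= values["min"]


# Hypothetical rollups for sm__cycles_active (illustrative numbers only).
sm_cycles_active = {"max": 1200.0, "avg": 1100.5, "min": 980.0}

print(unit_of("sm__cycles_active.avg"))    # sm
print(rollups_ordered(sm_cycles_active))   # True
```

If the check fails for values exported from a report, the rollups were most likely read from different metrics or different passes.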
The gpu__ metrics are not in the same clock domain as the sm__ metrics. The gpc__, tpc__, sm__, smsp__, l1tex__, and gcc__ units are all in the same clock domain, so their <unit>__cycles_elapsed.avg values will be the same or approximately the same (there may be a small difference if the metrics were collected in multiple passes).
In most cases the percentage of elapsed cycles during which the GPU is active will be >= the percentage of elapsed cycles during which an SM is active, because gpu__cycles_active counts a cycle as active if any SM is active. That is, gpu__cycles_active.avg.pct_of_peak_sustained_elapsed >= sm__cycles_active.max.pct_of_peak_sustained_elapsed.
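The comparison can be sketched in Python as follows. This assumes, as a simplification, that the percentage is just active cycles divided by elapsed cycles within each unit's own clock domain. The function `pct_of_elapsed` is a hypothetical helper and the counts are invented for illustration. Because gpu__ and sm__ cycles are counted in different clock domains, only the percentages are compared, never the raw cycle counts.

```python
def pct_of_elapsed(active_cycles: float, elapsed_cycles: float) -> float:
    """Percentage of elapsed cycles that were active, within one clock domain."""
    if elapsed_cycles <= 0:
        raise ValueError("elapsed_cycles must be positive")
    return 100.0 * active_cycles / elapsed_cycles


# Hypothetical counts. The gpu__ and sm__ elapsed cycles differ because the
# two units run in different clock domains.
gpu_pct = pct_of_elapsed(active_cycles=9500, elapsed_cycles=10000)
sm_pct = pct_of_elapsed(active_cycles=10800, elapsed_cycles=12000)

print(gpu_pct, sm_pct)      # 95.0 90.0
print(gpu_pct >= sm_pct)    # True
```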