The GPU is a collection of independent engines that include:
GR - graphics engine that manages the 3D, 2D, and compute pipeline. In the case of GA100 multi-instance GPU there would be multiple GR engines; in MIG mode the 3D pipeline is disabled.
CE - asynchronous copy engines
Display - display engine
NVDEC - video decoder engine
NVENC - video encoder engine
gpu__ is used for a metric when the metric is
independent of an individual engine
includes derived metrics that have an engine unit and a shared resources (e.g. gpu__compute_memory_access_throughput includes metrics from both SM, L1TEX, and LTS).
gr__ is used when the metric is specific to the graphics engine (3D pipe, compute pipe, of 2D pipe).
For metrics such as {unit}_cycles{active, elapsed, in_frame, in_region} the value for gr and gpu will likely be the same as the performance monitor used for these signals is in the same clock domain.