The NVIDIA GPU has logic in many different clock domains.
<unit>__cycles_elapsed
is the number of clock cycles between the start counting and stop counting triggers in the 's clock domain.
GPC is the Graphics Processing Cluster. GPCs are one of the building blocks of the NVIDIA GPU. Small GPUs may have only 1 GPC. Larger GPUs have 2, 4, 6, … GPCs. A GPC contains a set of SMs, L1 caches, Texture caches, and graphics pipe.
DRAM is the name for the GPU memory controllers for GDDR or HBM memory.
The GPC clock is the primary clock advertised for a NVIDIA GPU. This may be called the graphics clock, application clock, base clock or boost clock.
The DRAM clock is the base clock for the GDDR or HBM memory.
For performance tuning gpc__cycles_elapsed.avg
or .max are the primary measurement. The elapsed clocks in each GPC can vary slightly so NCU uses gpc__cycles_elapsed.max
.
.per_second
converts <units>__cycles_elapsed
into a clock frequency.
The __elapsed_cycles are used for throughput calculations such as sm__cycles_active.avg.pct_of_peak_sustained_elapsed
sm__cycles_active.avg.pct_of_peak_sustained_elapsed = sm__cycles_active.avg / sm__cycles_elapsed.avg.
This is the percentage of cycles that the SM was active during the counter collection period.