Are cycles derived from time, or the reverse?

nv-nsight-cu-cli -f -o sampe --target-processes all --set full ./a.out
nv-nsight-cu-cli -f -o sampe --target-processes all --clock-control none --set full ./a.out

I use these commands to profile a kernel (from "How to reach HBM peak bandwidth performance"), and they report different SM frequencies. First command: time 676.49 us, cycles 741,794, SM frequency 1.10 cycles/ns. Second command: time 678.18 us, cycles 956,869, SM frequency 1.41 cycles/ns. The kernel is a simple memory-copy kernel with no FLOPs.

The profiling results show that the two times are similar, but the cycle counts differ widely. In Nsight Compute, is the cycles value computed from time and SM frequency, or is time computed from cycles and SM frequency? And for a memory-bound kernel, should the DRAM frequency be used instead of the SM frequency?


gpu__time_duration.sum and sm__cycles_elapsed.max are separate counters. gpu__time_duration is calculated from timestamps collected at the start and end of performance counter collection; the timestamps have 32 ns accuracy. sm__cycles_elapsed.max is the highest cycle count among the SMs during the collection period.
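A quick sanity check on the numbers in the question (assuming the reported SM frequency is simply elapsed cycles divided by elapsed time — an assumption, not a statement about how Nsight Compute derives the metric internally):

```python
# Each case: (label, sm__cycles_elapsed.max, gpu__time_duration in seconds)
cases = [
    ("run 1 (default clock control)", 741_794, 676.49e-6),
    ("run 2 (--clock-control none)", 956_869, 678.18e-6),
]

for label, cycles, seconds in cases:
    # cycles / time gives the effective SM clock rate during collection
    ghz = cycles / seconds / 1e9
    print(f"{label}: {ghz:.2f} GHz")
```

Both ratios land on the reported 1.10 and 1.41 cycles/ns, so the time, cycle, and frequency values are mutually consistent; the cycle gap is explained entirely by the different clock rates, not by a difference in wall-clock duration.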

--clock-control none does not fix the GPU clock rate, so the GPU can change clock rates during profiling, including boosting up to the boost clock rate.

--clock-control base (the default) locks the GPU to its base clock. The base clock is a lower rate than the boost clock you are seeing with --clock-control none.
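If you want reproducible cycle counts at a clock rate of your own choosing, one option (a sketch, assuming a Volta-or-newer GPU, a recent driver, and root access; the 1110 MHz value is an example, not your device's actual base clock) is to lock the clocks with nvidia-smi and then profile with --clock-control none so Nsight Compute leaves them alone:

```shell
# List the clock rates the GPU supports
nvidia-smi -q -d SUPPORTED_CLOCKS

# Lock the GPU (SM) clock to a fixed min,max rate
# (1110 MHz is a placeholder; pick a value from the supported list)
sudo nvidia-smi -lgc 1110,1110

# Profile with Nsight Compute's clock control disabled
# so the locked clocks stay in effect
nv-nsight-cu-cli -f -o sampe --target-processes all --clock-control none --set full ./a.out

# Restore default clock behavior afterwards
sudo nvidia-smi -rgc
```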