I want to understand how the SM frequency is calculated in ncu.
What I found is that there is a mismatch between the SM frequency calculated in ncu and the one reported by nvidia-smi dmon.
I am aware of the clock-control feature in ncu. I disable clock control in ncu while maximizing the clock using nvidia-smi -ac/-lgc/-pm, as pointed out in another post here.
Please let me know why the SM frequency in ncu never reaches the boost clock that I set with nvidia-smi, and which reported frequency is more accurate: ncu or nvidia-smi dmon?
The SM frequency reported by ncu is the average of the frequency over the entire profile. It isn’t an instantaneous or peak measurement. It’s not guaranteed that the GPU can operate at any boosted clock frequency for the duration of the kernel. To interpret ncu results, it’s most accurate to use the SM frequency it is reporting.
Yes, but when I profile a kernel in ncu while monitoring the frequency with nvidia-smi, the SM frequency ncu reports differs from the values nvidia-smi reports over the lifetime of the profiling run. I am aware that the SM frequency reported by ncu is an average, but it also differs from the average of the nvidia-smi readings, and I am struggling to understand why.
Do not specify clock control and run a long kernel. While the kernel is running, query nvidia-smi to see if the clocks match.
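For the nvidia-smi side of that comparison, here is a minimal Python sketch that polls the SM clock while the kernel runs. It assumes nvidia-smi is on the PATH and uses its `--query-gpu=clocks.sm` query; the function names, the 0.1 s interval, and the 5 s duration are my own illustrative choices, not anything ncu or nvidia-smi prescribes.

```python
import subprocess
import time


def parse_sm_clock_mhz(line: str) -> int:
    """Parse one line of `--format=csv,noheader,nounits` output, e.g. '1410'."""
    return int(line.strip().split(",")[0])


def sample_sm_clock_mhz(gpu_index: int = 0) -> int:
    """Ask nvidia-smi for the current SM clock of one GPU, in MHz."""
    out = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index),
         "--query-gpu=clocks.sm", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_sm_clock_mhz(out.splitlines()[0])


def poll_sm_clock(duration_s: float = 5.0, interval_s: float = 0.1,
                  gpu_index: int = 0) -> list:
    """Collect SM clock samples for roughly duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample_sm_clock_mhz(gpu_index))
        time.sleep(interval_s)
    return samples


def summarize(samples: list) -> tuple:
    """Return (min, max, mean) of the samples, in MHz."""
    return min(samples), max(samples), sum(samples) / len(samples)
```

Running `summarize(poll_sm_clock())` in a second terminal while the long kernel executes gives a min/max/mean to set against the SM frequency ncu prints. Each nvidia-smi call takes a noticeable amount of time, so this is coarse sampling at best and will miss short clock changes.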
Run your application in Nsight Systems with GPU Metrics enabled at 100 kHz. This will graph the GPU clock (the application/graphics clock in nvidia-smi, or the GPC clock, gpc__cycles_elapsed.max.per_second, in NCU). You can graph this at a high rate to see if it varies. Nsight Systems does not lock the clocks to base by default. Compare this value to the value reported by nvidia-smi.
nvidia-smi reports the clock the GPU has been requested to run at, which can vary. NCU and NSYS measure the number of clock cycles accumulated between two triggers, so they report an accurate average frequency over the capture.
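To make the "cycles between two triggers" point concrete, here is a small Python sketch of the arithmetic. The numbers are made up for illustration (a 2 ms capture where the clock sits at 1200 MHz for the first 0.5 ms and at 1500 MHz for the rest), and the function name is my own; this is not how NCU is implemented, only the cycles-over-time calculation it implies.

```python
def average_frequency_mhz(cycles_elapsed: int, duration_ns: int) -> float:
    """Average clock over a capture: cycles counted divided by wall time.

    cycles per nanosecond is GHz, so scale by 1000 to get MHz.
    """
    return cycles_elapsed * 1000 / duration_ns


# Hypothetical capture: (clock in MHz, time spent at that clock in ns).
segments = [(1200, 500_000), (1500, 1_500_000)]

# Cycles accumulated in each segment: MHz * ns / 1000.
cycles = sum(mhz * ns // 1000 for mhz, ns in segments)
duration_ns = sum(ns for _, ns in segments)

avg_mhz = average_frequency_mhz(cycles, duration_ns)
peak_mhz = max(mhz for mhz, _ in segments)

print(f"average over capture: {avg_mhz:.1f} MHz, peak: {peak_mhz} MHz")
```

The average comes out below the 1500 MHz peak even though the GPU did reach that clock for most of the capture. A tool that samples the requested clock at a few moments could report either segment's value, so the two numbers need not agree.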