I’ve been profiling a kernel on an A100 GPU. Nsight Compute reports an SM Frequency of 1.09 cycles/ns, which appears to correspond to the nvidia-smi output below. However, when I run the same command while the application is running without NCU, it lists 1410 MHz for the Graphics and SM clocks, which is the boost clock I expect. Is there a reason why the clock is lower when profiling than in an actual run? This results in incorrect roofline charts in Nsight Compute due to the artificially low clock speed.
Thanks
Gaetan
nvidia-smi -q -d CLOCK -i 0
==============NVSMI LOG==============
Timestamp : Wed Mar 16 16:06:13 2022
Driver Version : 470.57.02
CUDA Version : 11.4
Attached GPUs : 8
GPU 00000000:07:00.0
    Clocks
        Graphics : 1095 MHz
        SM : 1095 MHz
        Memory : 1215 MHz
        Video : 585 MHz
    Applications Clocks
        Graphics : 1095 MHz
        Memory : 1215 MHz
    Default Applications Clocks
        Graphics : 1095 MHz
        Memory : 1215 MHz
    Max Clocks
        Graphics : 1410 MHz
        SM : 1410 MHz
        Memory : 1215 MHz
        Video : 1290 MHz
    Max Customer Boost Clocks
        Graphics : 1410 MHz
    SM Clock Samples
        Duration : Not Found
        Number of Samples : Not Found
        Max : Not Found
        Min : Not Found
        Avg : Not Found
    Memory Clock Samples
        Duration : Not Found
        Number of Samples : Not Found
        Max : Not Found
        Min : Not Found
        Avg : Not Found
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
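
To put a number on the roofline concern, here is a rough sketch of how the FP32 compute ceiling scales between the two clocks reported above (1095 MHz while profiling, 1410 MHz in a normal run). The SM count (108), FP32 lanes per SM (64), and 2 FLOPs per FMA are assumed A100 figures and do not come from the nvidia-smi log; the function name is only illustrative.

```python
# Rough estimate of the FP32 roofline ceiling at two SM clocks.
# Assumed A100 figures (not taken from the log above):
SM_COUNT = 108           # streaming multiprocessors
FP32_LANES_PER_SM = 64   # FP32 units per SM
FLOPS_PER_FMA = 2        # one fused multiply-add counts as 2 FLOPs


def peak_fp32_tflops(sm_clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOP/s at the given SM clock."""
    flops_per_cycle = SM_COUNT * FP32_LANES_PER_SM * FLOPS_PER_FMA
    return flops_per_cycle * sm_clock_mhz * 1e6 / 1e12


profiled = peak_fp32_tflops(1095)  # clock seen while profiling
boost = peak_fp32_tflops(1410)     # clock seen in a normal run

print(f"Peak at 1095 MHz: {profiled:.2f} TFLOP/s")
print(f"Peak at 1410 MHz: {boost:.2f} TFLOP/s")
print(f"Ceiling ratio:    {profiled / boost:.3f}")
```

Under these assumptions the compute ceiling drawn at the profiling clock sits roughly 22% below the one expected at the boost clock, which is the discrepancy in question.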