Why does the SM frequency change when profiling with ncu?

In JetPack 6.0, I set my Orin to performance mode, with the fan at its maximum speed.


However, when I use ncu to profile my program on an AGX Orin 64GB, the SM frequency seems to change during the run. What is the reason?
For an elementwise kernel, it is 1.29 GHz:
image
while for a GEMM kernel, it is 1.08 GHz:
image

Hi,

The SM Frequency is a profiling output, not the maximum clock rate.

Depending on the kernel, the SM Frequency will be different.
It is calculated directly as Cycles / Time.

For example:
14,003,219 cycles / 12.90 ms ≈ 1.08 cycles/ns, i.e. about 1.08 GHz.
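
The same arithmetic as a minimal Python sketch. The function name `sm_frequency_ghz` is made up for illustration; the inputs are the elapsed cycles and duration quoted above. Since the 12.90 ms duration is already rounded, the result only matches the value ncu reports to about two digits.

```python
def sm_frequency_ghz(elapsed_cycles: int, duration_ms: float) -> float:
    """Average SM clock over a kernel, in GHz (equivalently cycles per ns).

    Hypothetical helper: it just divides cycles by time, as described above.
    """
    duration_ns = duration_ms * 1e6  # 1 ms = 1,000,000 ns
    return elapsed_cycles / duration_ns


# Values from the GEMM example in this thread.
freq = sm_frequency_ghz(14_003_219, 12.90)
print(f"Average SM frequency: {freq:.4f} GHz")
```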

Thanks.

Thanks for your reply.
If I understand you correctly, the SM frequency is influenced by the kernel implementation? However, I tested the same GEMM kernel on the same device with the same Docker container before, and I got a different SM frequency:
image
As you can see, its performance is about 20% worse.
Almost the same number of cycles leads to a different elapsed time.

So,

  1. What properties of a kernel affect the SM frequency enough to cause such a significant decrease? In other words, how can I write a kernel that gets a higher SM frequency?
  2. Can I fix the SM frequency to a certain value to eliminate this instability? Maybe by writing some hardware configuration?

Hi,

Almost the same cycle count is expected, since you are using the same kernel.
But the elapsed time depends on the GPU clocks and available resources, so it can differ.

  1. You can check our CUDA documentation for guidance on accelerating a CUDA kernel:
    CUDA C++ Best Practices Guide

  2. If you run the same kernel under the same GPU clocks, the value should be similar.
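
To illustrate why the same cycle count can give a different elapsed time, here is a rough Python sketch. The helper `elapsed_ms` is hypothetical, and the two clock values are simply the 1.08 GHz and 1.29 GHz figures mentioned earlier in this thread, used as example clocks and not as measurements of the same kernel.

```python
def elapsed_ms(cycles: int, clock_ghz: float) -> float:
    """Elapsed time in ms for a given cycle count at a given average clock.

    Hypothetical helper: time = cycles / frequency, with GHz = cycles per ns.
    """
    elapsed_ns = cycles / clock_ghz
    return elapsed_ns / 1e6  # ns -> ms


cycles = 14_003_219  # cycle count from the GEMM example above

# Same cycle count, two illustrative average clocks.
for clock_ghz in (1.08, 1.29):
    print(f"{clock_ghz:.2f} GHz -> {elapsed_ms(cycles, clock_ghz):.3f} ms")
```

With the cycle count held fixed, the elapsed time scales inversely with the average clock, so two runs of the same kernel should only report similar times if the GPU clocks were the same in both runs.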

Thanks.