However, when I use ncu to profile my program on an AGX Orin 64GB, the SM frequency seems to change during the run. What is the reason?
For the elementwise kernel, it is 1.29 GHz,
while for the GEMM kernel, it is 1.08 GHz.
Thanks for your reply.
If I understand you correctly, the SM frequency is influenced by the kernel implementation? However, I tested the same GEMM kernel on the same device with the same Docker container before, and I got a different SM frequency:
As you see, its performance is about 20% worse.
Almost the same number of cycles leads to a different elapsed time.
So,
What properties of a kernel can affect the SM frequency enough to cause such a significant decrease? In other words, how can I write a kernel to get a higher SM frequency?
And can I fix the SM frequency at a certain value to eliminate this instability, maybe by writing some hardware config?
Almost the same cycle count is expected, since you are using the same kernel.
But the elapsed time depends on the GPU clocks and the available resources, so it can differ between runs.
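To make the relation between cycles, clock and time concrete, here is a rough sketch in Python. It only assumes that elapsed time is approximately cycles divided by SM frequency. The cycle count is a made-up number, and the two frequencies are the ones quoted earlier in this thread (1.29 GHz and 1.08 GHz), used purely as illustrative values and not as the measured clocks of the two GEMM runs.

```python
def elapsed_us(cycles: int, sm_freq_ghz: float) -> float:
    """Approximate elapsed time in microseconds for a cycle count at a given SM clock."""
    # 1 GHz = 1e3 cycles per microsecond, so time_us = cycles / (GHz * 1e3).
    return cycles / (sm_freq_ghz * 1e3)


cycles = 1_000_000  # hypothetical cycle count, identical for both runs

fast = elapsed_us(cycles, 1.29)  # illustrative higher clock
slow = elapsed_us(cycles, 1.08)  # illustrative lower clock

# Relative increase in elapsed time caused only by the lower clock.
slowdown_pct = (slow / fast - 1.0) * 100.0

print(f"{fast:.1f} us at 1.29 GHz, {slow:.1f} us at 1.08 GHz, "
      f"{slowdown_pct:.1f}% longer")
```

With the same cycle count, the lower clock alone gives roughly 19% longer elapsed time, which is in the same range as the "about 20% worse" result reported above.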