Unstable performance measured with CUDA events

I repeatedly measured the time cost of a batch of short kernels using CUDA events.
The output is very unstable:

Time cost:8.61389 ms.
Time cost:6.08563 ms.
Time cost:6.55667 ms.
Time cost:6.07232 ms.
Time cost:7.49773 ms.
Time cost:6.45325 ms.
Time cost:6.08666 ms.
Time cost:6.06003 ms.
Time cost:6.09587 ms.
Time cost:8.32205 ms.
Time cost:6.47475 ms.
Time cost:6.08666 ms.
Time cost:6.05491 ms.
Time cost:7.37382 ms.

But when I use the Nsight Systems UI to profile the same program, the output is much more stable:

Time cost:6.12899 ms.
Time cost:6.11926 ms.
Time cost:6.11315 ms.
Time cost:6.10931 ms.
Time cost:6.11888 ms.
Time cost:6.11216 ms.
Time cost:6.11264 ms.
Time cost:6.11046 ms.
Time cost:6.12355 ms.
Time cost:6.10992 ms.
Time cost:6.1209 ms.
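To put numbers on the difference between the two runs, here is a small host-only sketch (valid CUDA C++, compiles with nvcc or any C++ compiler) that summarizes repeated timings by min, median and max instead of reading individual lines. The timings are copied from the two outputs above; the names summarize, plain and nsys_ui are hypothetical, chosen for illustration.

```cuda
// Host-only sketch: summarize repeated timings by min, median and max.
// The sample values are the ones posted above; names are illustrative.
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <vector>

struct Summary {
  double min, median, max;
};

// Sort a copy of the samples and pick min, median and max.
Summary summarize(std::vector<double> v) {
  assert(!v.empty() && "summarize needs at least one sample");
  std::sort(v.begin(), v.end());
  const size_t n = v.size();
  // Odd count: middle element. Even count: mean of the two middle elements.
  const double median = (n % 2 == 1) ? v[n / 2]
                                      : 0.5 * (v[n / 2 - 1] + v[n / 2]);
  return {v.front(), median, v.back()};
}

int main() {
  // Timings in ms: plain run, then the run under the Nsight Systems UI.
  const std::vector<double> plain = {
      8.61389, 6.08563, 6.55667, 6.07232, 7.49773, 6.45325, 6.08666,
      6.06003, 6.09587, 8.32205, 6.47475, 6.08666, 6.05491, 7.37382};
  const std::vector<double> nsys_ui = {
      6.12899, 6.11926, 6.11315, 6.10931, 6.11888, 6.11216,
      6.11264, 6.11046, 6.12355, 6.10992, 6.1209};

  const Summary a = summarize(plain);
  const Summary b = summarize(nsys_ui);
  std::printf("plain run: min %.5f  median %.5f  max %.5f ms\n",
              a.min, a.median, a.max);
  std::printf("Nsight UI: min %.5f  median %.5f  max %.5f ms\n",
              b.min, b.median, b.max);
  return 0;
}
```

With the posted numbers this reports 6.05491 / 6.27456 / 8.61389 ms for the plain run and 6.10931 / 6.11315 / 6.12899 ms under the Nsight Systems UI, so the fastest plain samples are not slower than the profiled ones; what differs is the spread.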

What does the Nsight Systems UI do before running the profiled program that makes the performance so stable?
Or what should I do before measuring kernel performance with CUDA events?
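For reference, a common pattern (general practice, not a claim about what Nsight Systems does, and not guaranteed to remove the outliers above) is to run untimed warm-up batches first, then record events around each timed batch on the same stream. This is a minimal sketch: short_kernel, blocks_for, kNumKernels, kWarmup and kTrials are hypothetical placeholders, not taken from the original program.

```cuda
// Minimal sketch: time a batch of short kernels with CUDA events after
// a warm-up phase. All kernel/parameter names are hypothetical.
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Print the failing call and exit main with an error code.
#define CHECK(call)                                                 \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      std::fprintf(stderr, "%s failed: %s\n", #call,                \
                   cudaGetErrorString(err_));                       \
      return 1;                                                     \
    }                                                               \
  } while (0)

// Hypothetical short kernel: scales n floats in place.
__global__ void short_kernel(float* data, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= 1.0001f;
}

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int threads) {
  assert(threads > 0);
  return (n + threads - 1) / threads;
}

// Enqueue one batch of short kernels on the given stream.
void launch_batch(float* d, int n, int num_kernels, cudaStream_t s) {
  const int threads = 256;
  for (int k = 0; k < num_kernels; ++k)
    short_kernel<<<blocks_for(n, threads), threads, 0, s>>>(d, n);
}

int main() {
  const int n = 1 << 16;
  const int kNumKernels = 1000;  // kernels per timed batch
  const int kWarmup = 10;        // untimed batches before measuring
  const int kTrials = 10;        // timed batches

  float* d = nullptr;
  CHECK(cudaMalloc(&d, n * sizeof(float)));
  CHECK(cudaMemset(d, 0, n * sizeof(float)));
  cudaStream_t stream;
  CHECK(cudaStreamCreate(&stream));
  cudaEvent_t start, stop;
  CHECK(cudaEventCreate(&start));
  CHECK(cudaEventCreate(&stop));

  // Warm-up: absorbs one-time initialization costs before timing starts.
  for (int w = 0; w < kWarmup; ++w) launch_batch(d, n, kNumKernels, stream);
  CHECK(cudaStreamSynchronize(stream));

  for (int t = 0; t < kTrials; ++t) {
    CHECK(cudaEventRecord(start, stream));
    launch_batch(d, n, kNumKernels, stream);
    CHECK(cudaEventRecord(stop, stream));
    CHECK(cudaEventSynchronize(stop));  // block until the batch is done
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    std::printf("Time cost:%g ms.\n", ms);
  }
  CHECK(cudaGetLastError());  // surface any kernel launch error

  CHECK(cudaEventDestroy(start));
  CHECK(cudaEventDestroy(stop));
  CHECK(cudaStreamDestroy(stream));
  CHECK(cudaFree(d));
  return 0;
}
```

Recording both events on the same stream as the kernels keeps the measurement to the GPU-side span of the batch; collecting many trials makes it possible to look at the median and spread rather than single values.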

I tried to run nvidia-smi -lgc XX -lmc XX, but it didn’t help.

You do not say which card you are using, but locking graphics/memory clocks via nvidia-smi is, I believe, only available on Tesla and Quadro cards.

However, GTX/RTX cards can have their clocks locked by Nsight Compute. Possibly the same happens with Nsight Systems; I have little experience with it, but this might explain what you are seeing.

I am testing on an RTX 3080. The OS is Ubuntu 18.04 LTS and the CUDA toolkit is 11.7.
I checked the memory and graphics clock speeds reported by nvidia-smi -q -d clock, and both are locked to the speeds I specified. So I don’t think clock speed alone explains what I saw.

Besides, I also ran the Nsight Systems CLI on my program, and the measured performance still varied a lot.
Only the Nsight Systems UI gives stable performance.

So the Nsight Systems UI must be doing something before it starts profiling.

The question is: what is it?

This seems like an excellent question for the Nsight Systems forum: https://forums.developer.nvidia.com/c/development-tools/nsight-systems , or the Nsight Compute forum: https://forums.developer.nvidia.com/c/development-tools/nsight-compute/