Loop inside kernel

I have a base kernel and when I place a loop around that base code inside the kernel, I get different timings every time I run the kernel.
But I don’t get different timings when running the base kernel (without the loop inside). Is there any issue with loops in CUDA? The number of iterations is constant.

Thanks.

Which timer did you use?

I usually use loops inside kernel functions, but I haven’t had any problem like yours.

I always check the CUDA profiler. I think that is the easiest and most precise way to evaluate your program.
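If you would rather time from inside the program, here is a minimal sketch using CUDA events. I can't see your code, so this is only a guess at the cause: kernel launches are asynchronous, so a host-side timer that stops before the kernel has finished can report a different number on every run, and the first launch also carries one-time setup cost. The kernel body and the names `loopedKernel` and `timeLaunchMs` are placeholders I made up for illustration, not your code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for a base kernel with a constant-length loop around it.
__global__ void loopedKernel(float *data, int n, int iterations)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float v = data[i];
    for (int k = 0; k < iterations; ++k) {
        v = v * 0.5f + 1.0f;  // placeholder work
    }
    data[i] = v;
}

// Times a single launch with CUDA events and returns the elapsed milliseconds.
float timeLaunchMs(float *d_data, int n, int iterations)
{
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    cudaEventRecord(start, 0);
    loopedKernel<<<blocks, threads>>>(d_data, n, iterations);
    cudaEventRecord(stop, 0);

    // Block the host until the kernel and the stop event have completed.
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;
    const int iterations = 100;

    float *d_data = NULL;
    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    // Warm-up launch: its time is discarded so one-time setup cost is excluded.
    timeLaunchMs(d_data, n, iterations);

    for (int run = 0; run < 5; ++run) {
        printf("run %d: %.3f ms\n", run, timeLaunchMs(d_data, n, iterations));
    }

    cudaFree(d_data);
    return 0;
}
```

If the numbers still jump around after a warm-up launch and with the synchronize in place, it is probably worth comparing them against what the profiler reports for the same kernel.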

Good luck.

GPUTime and CPUTime in the CUDA profiler are reported in microseconds.

I use the CUDA timer …
Any more comments about this?