Profile CUDA kernel


I am trying to find out whether modifications I made to a kernel give better or worse performance. I run the kernel several hundred times and use the average time to compare the different kernel versions. I guess that these timings depend on the core and memory clock values. Since the core clock drops as the GPU temperature increases, I get better results when I start with a cool GPU than with a hot one. So this way of comparing kernel performance is not very objective.

Is there a better way to find out which kernel performs better, independent of core/memory clocks and GPU temperature?


Have you had a chance to read through NVIDIA’s recommendations regarding this topic?

Hi - great. Thanks.

Not all of the nvidia-smi commands mentioned in the document are supported on some GeForce cards. Also, although the document only mentions the “Lock clocks to base” setting in Nsight Graphics, there is a similar setting in Nsight Compute.

Hi, can you point me to the location of this lock setting in Nsight Compute? I couldn’t find it.

see here

That entire section on “reproducibility” may be of interest. Also see here

Questions about Nsight Compute may get better support on the Nsight Compute forum.

It’s the last setting in this window here - “Clock Control”
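
For the measurement loop described in the original question, here is a minimal sketch of timing a kernel with CUDA events. It uses a few warm-up launches and reports the median and minimum instead of the mean, since those are less affected by occasional slow runs. It is not taken from the documents linked above. The kernel `my_kernel`, the helper `median_ms`, the problem size, and the warm-up and run counts are all hypothetical placeholders. This only reduces measurement noise; the timings still depend on the clocks, so locking them as discussed above is still needed for comparable numbers.

```cuda
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel under test; replace it with the kernel being compared.
__global__ void my_kernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 0.5f + 1.0f;
}

// Host-only helper: median of the timings. Takes a copy so it can sort.
float median_ms(std::vector<float> t) {
    std::sort(t.begin(), t.end());
    size_t m = t.size() / 2;
    return (t.size() % 2) ? t[m] : 0.5f * (t[m - 1] + t[m]);
}

int main() {
    // Assumed values, chosen only for illustration.
    const int n = 1 << 20, warmup = 20, runs = 200;

    // Error checking of the CUDA calls is omitted to keep the sketch short.
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    dim3 block(256), grid((n + block.x - 1) / block.x);

    // Warm-up launches that are not timed.
    for (int i = 0; i < warmup; ++i) my_kernel<<<grid, block>>>(d, n);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Time each launch separately so outliers can be seen and discarded.
    std::vector<float> times(runs);
    for (int i = 0; i < runs; ++i) {
        cudaEventRecord(start);
        my_kernel<<<grid, block>>>(d, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        cudaEventElapsedTime(&times[i], start, stop);  // milliseconds
    }

    printf("median %.4f ms, min %.4f ms over %d runs\n",
           median_ms(times), *std::min_element(times.begin(), times.end()), runs);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);
    return 0;
}
```

Running both kernel versions back to back in the same process, with the clocks locked, and comparing the medians should give a fairer comparison than averaging runs that start from a cool GPU.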
