Performance is about 10% better under the Nsight Visual Studio 2015 profiler than when executing the .exe

Hi sirs,

I have a CUDA program that uses the OpenCV 3.2 library, built with Visual Studio 2015 and running on a GTX 1080.
The program runs a single operation one thousand times to get its average time.
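
For reference, a measurement loop of this kind can be sketched roughly as follows. This is a minimal C++ sketch under assumptions, not the actual program: `average_ms`, the `work` callable, and the warm-up count are hypothetical. For real GPU work, `work` would have to block until the device has finished (for example by calling `cudaDeviceSynchronize()` before returning), otherwise only the launch overhead gets timed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Average wall-clock time per call of `work`, in milliseconds.
// `work` stands in for one operation of the real program (hypothetical).
double average_ms(const std::function<void()>& work,
                  std::size_t iterations,
                  std::size_t warmup = 10) {
    assert(static_cast<bool>(work));  // an empty std::function is a caller error

    // Untimed warm-up calls, so one-time setup costs do not skew the average.
    for (std::size_t i = 0; i < warmup; ++i) work();

    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) work();
    const auto stop = clock::now();

    const std::chrono::duration<double, std::milli> total = stop - start;
    return iterations == 0 ? 0.0
                           : total.count() / static_cast<double>(iterations);
}
```

It would be called as `average_ms(run_once, 1000)`, where `run_once` is a hypothetical function that wraps one operation and waits for the GPU to finish.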

But I encountered a strange problem across a great many tests:

  1. When I use “Visual Studio 2015 Nsight Performance Analysis” to profile this program, the average time is about 650 ms.
  2. But when I run the program’s .exe directly, the average time is about 720 ms.

That is to say, the “.exe” is about 10% slower than running under “Visual Studio 2015 Nsight Performance Analysis”.

Following the link https://devtalk.nvidia.com/default/topic/766013/performance-is-much-better-when-profling-with-nsight-than-when-running-production-code/, I checked the environment variables on my computer. None of the following environment variables are set:

  • NSIGHT_CUDA_DEBUGGER=1
  • CUDA_INJECTION32_PATH
  • CUDA_INJECTION64_PATH
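
One way to double-check this from inside the process itself (a process inherits its environment from whatever launched it, which can differ from the system settings) is a small helper like the one below. This is only a sketch: it uses nothing beyond `std::getenv`, and the names `set_variables` and `kNsightVariables` are made up for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// The three variable names listed above.
const std::vector<std::string> kNsightVariables = {
    "NSIGHT_CUDA_DEBUGGER",
    "CUDA_INJECTION32_PATH",
    "CUDA_INJECTION64_PATH",
};

// Returns those entries of `names` that are set in this process's environment.
std::vector<std::string> set_variables(const std::vector<std::string>& names) {
    std::vector<std::string> found;
    for (const auto& name : names) {
        assert(!name.empty());  // an empty name is a caller error
        if (std::getenv(name.c_str()) != nullptr) {
            found.push_back(name);
        }
    }
    return found;
}
```

Calling `set_variables(kNsightVariables)` at program start and printing the result shows which of the three, if any, the running “.exe” actually sees.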

So I want to know whether “Visual Studio 2015 Nsight Performance Analysis” sets some flags on the GPU that improve performance, or, if not, what causes the difference described above.

Because my program will run on the customer’s computer, I want my “.exe” to run as fast as it does under “Visual Studio 2015 Nsight Performance Analysis”.

Thank you very much!

Could somebody help me?
This problem has already troubled me for two months :( :(

Have you monitored the GPU clock rates with tools like GPU-Z while running with and without the profiler?

Maybe in the profiling case the GPU runs in a higher power state and clock rate permanently.

Christian

More likely the same power state (P0), but different boost clocks. I haven’t used a GTX 1080, but some GPUs offer a very wide array of different boost clocks, and temperature effects alone can make the clocks differ by more than 10%. Very irritating if one wants to benchmark code. In the best case the GPU supports application clocks, which allow one to pick a fixed clock, but even that may not be true anymore these days.

Without sitting in front of the system and running the actual code, we are limited to wild speculation, I am afraid.

Dear Christian,

Thank you very much for the comments.
I have used the GPU-Z tool every time I measured the performance. There are two clock frequencies for the GTX 1080, 1.6 GHz and 1.9 GHz. From my observations, the GTX 1080 switches the clock frequency automatically according to the current load or conditions.
But according to GPU-Z, when the GPU frequency is the same:

  1. In most cases, the execution time of the “.exe” is about 10% slower than under the Nsight Visual Studio performance profiler.
  2. In a few cases, the execution time of the “.exe” is about the same as under the Nsight VS profiler.
    Sometimes, when the “.exe” is running very slowly, I open the Nsight profiler immediately and the performance improvement under the profiler is very evident, yet according to GPU-Z the GPU clock rate is normally the same.
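
An average alone cannot show whether every iteration is about 10% slower or only some iterations are much slower, so it may help to record each iteration’s time and compare the distribution between the two kinds of run. A minimal sketch, assuming the per-iteration times (in milliseconds) have already been collected into a vector; `TimingSummary` and `summarize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Minimum, median and maximum of a set of per-iteration timings.
struct TimingSummary {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Takes the samples by value because sorting reorders them.
TimingSummary summarize(std::vector<double> samples_ms) {
    assert(!samples_ms.empty());  // a summary of no samples is undefined here
    std::sort(samples_ms.begin(), samples_ms.end());

    const std::size_t n = samples_ms.size();
    // Middle element for odd n, mean of the two middle elements for even n.
    const double median =
        (n % 2 == 1) ? samples_ms[n / 2]
                     : 0.5 * (samples_ms[n / 2 - 1] + samples_ms[n / 2]);

    return {samples_ms.front(), median, samples_ms.back()};
}
```

If the minimum matches between the standalone run and the profiled run but the median or maximum does not, the slowdown is intermittent rather than uniform, which would point away from a constant clock difference.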

I reached the above conclusions from about two months of observations, but I cannot find the cause.
Because my program will run for a long time on the customer’s computer, I want my “.exe” to run as fast as it does under “Visual Studio 2015 Nsight Performance Analysis”; this would save the users a lot of time.

Hi,
I’m having the same kind of problem, although it’s not 10% but much more, and without Nsight the framerate cycles between low and high values.

I have been trying to find the source of the issue for months.

This is my framerate over time when running standalone; the mean value is 16 fps:
http://hpics.li/bb97329
And when running with Nsight, the mean value is 40 fps and the framerate is stable:
http://hpics.li/5c4cd9a

The GPU is a GeForce 560 Ti and the driver version is 385.41.
I also checked with GPU-Z, and the GPU is running at the same clock in both cases.
The renderer uses OpenGL 4.

Thanks,
Jean-Baptiste.