I have a CUDA program that uses the OpenCV 3.2 library. It is built with Visual Studio 2015 and runs on a GTX 1080.
The program runs the operation one thousand times to get the average time of a single operation.
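Here is a minimal sketch of such a timing loop in C++, for illustration only. `run_once` in the usage line is a hypothetical stand-in for the real CUDA/OpenCV work, and the warm-up count of 10 is an arbitrary choice. The first calls typically include one-time costs such as CUDA context creation, so they are excluded. The median and minimum are reported next to the mean because a few slow outliers can noticeably shift a 1000-run average.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

struct Stats {
    double mean_ms;
    double median_ms;
    double min_ms;
};

// Summarize per-iteration timings given in milliseconds.
Stats summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return {sum / n, median, samples.front()};
}

// Run `work` `warmup` times untimed, then time `iterations` further calls.
// In the real program `work` must block until the GPU has finished
// (e.g. end with cudaDeviceSynchronize()), otherwise only the launch is timed.
std::vector<double> time_iterations(const std::function<void()>& work,
                                    int warmup, int iterations) {
    for (int i = 0; i < warmup; ++i) work();
    std::vector<double> samples;
    samples.reserve(iterations);
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return samples;
}
```

Usage: `Stats s = summarize(time_iterations(run_once, 10, 1000));`. Because a host timer is used, `run_once` has to synchronize with the device before returning. Otherwise the asynchronous kernel launches return early and the measured time is meaningless.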
However, I have run into a strange problem across a large number of tests:
When I profile the program with Visual Studio 2015 Nsight performance analysis, the average time is about 650 ms.
But when I run the program's .exe directly, the average time is about 720 ms.
In other words, the .exe run directly is roughly 10% slower (720 ms vs. 650 ms) than the same program under Nsight performance analysis.
So I would like to know whether Nsight performance analysis sets some flags on the GPU to improve performance, or what else could cause this.
Since my program will run on customers' computers, I want the .exe to run as fast as it does under Nsight performance analysis.
More likely the GPU is in the same power state (P0) in both cases, but at different boost clocks. I haven't used a GTX 1080, but some GPUs offer a very wide range of boost clocks, and temperature effects alone can make the clocks differ by more than 10%. That is very irritating if one wants to benchmark code. In the best case the GPU supports application clocks, which let one pick a fixed clock, but that may no longer be true these days.
Without sitting in front of the system and running the actual code, we are limited to wild speculation, I am afraid.
Thank you very much for the comments.
I used the GPU-Z tool every time I measured performance. It shows two clock frequencies for the GTX 1080, 1.6 GHz and 1.9 GHz. From my observations, the GTX 1080 switches between them automatically depending on the current load or other conditions.
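To put the two clock values in perspective: if a kernel were limited purely by the GPU core clock, its runtime would scale inversely with the clock. This is an assumption that holds at best for compute-bound code; memory-bound code scales much less. A sketch of that estimate, with a hypothetical helper name:

```cpp
// Estimated runtime at clock f_to_mhz, given a runtime measured at f_from_mhz,
// assuming runtime is inversely proportional to the GPU core clock
// (a simplification that ignores memory-bound phases and host overhead).
double scaled_time_ms(double t_ms, double f_from_mhz, double f_to_mhz) {
    return t_ms * f_from_mhz / f_to_mhz;
}
```

Under this assumption, 650 ms measured at 1900 MHz would become `scaled_time_ms(650.0, 1900.0, 1600.0)`, or about 772 ms at 1600 MHz. The 1.9/1.6 ratio (about 1.19) is larger than the observed 10% gap. Clock switching alone could therefore account for it, which is why comparisons should only be made at equal clocks.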
But according to GPU-Z, even when the GPU frequency is the same:
In most cases, the .exe is about 10% slower than under the Nsight Visual Studio performance profiler.
In a few cases, the .exe takes about the same time as under the Nsight profiler.
Sometimes, while the .exe is running very slowly, I immediately start the Nsight profiler, and the speedup under the profiler is very obvious, even though GPU-Z usually shows the same GPU clock rate.
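One way to make "at the same GPU frequency" more rigorous is to record the graphics clock alongside each timed iteration and then compare mean times only within the same clock value. The clock could come from NVML's `nvmlDeviceGetClockInfo` with `NVML_CLOCK_GRAPHICS`, or from a GPU-Z sensor log. How it is read is left out here, and the names below are hypothetical. A sketch of the bookkeeping, written without C++17 features so it also builds with Visual Studio 2015:

```cpp
#include <map>
#include <vector>

// One timed iteration together with the graphics clock observed for it.
struct Sample {
    unsigned clock_mhz;  // hypothetical: read from NVML or a GPU-Z log
    double time_ms;
};

// Mean iteration time per observed clock value, so that the plain .exe
// and the profiled run are only compared at equal clocks.
std::map<unsigned, double> mean_time_by_clock(const std::vector<Sample>& samples) {
    std::map<unsigned, double> sum;
    std::map<unsigned, int> count;
    for (const Sample& s : samples) {
        sum[s.clock_mhz] += s.time_ms;
        ++count[s.clock_mhz];
    }
    std::map<unsigned, double> mean;
    for (const auto& kv : sum) {
        mean[kv.first] = kv.second / count[kv.first];
    }
    return mean;
}
```

Collecting samples this way for both the standalone .exe and the Nsight run, then comparing, for example, the 1900 MHz entries, separates clock effects from whatever else differs between the two runs. A GPU-Z log sampled once per second is coarse relative to a 650 ms operation, so a clock read per iteration is preferable.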
I reached these conclusions after about two months of observation, but I cannot find the cause.
Because my program will run for a long time on customers' computers, I want the .exe to run as fast as it does under Visual Studio 2015 Nsight performance analysis. This would save users a lot of time.