Odd behavior with computeprof

Hi everyone,
I’m using version 3.2 of both the driver and computeprof on a Linux system.

When profiling with computeprof, my program takes about 4.2 seconds to run, and I get similar timings when launching it from the terminal while computeprof is still running.
As soon as I close the computeprof window and launch my program from the terminal, the run time goes up to about 7 seconds.
I could easily understand computeprof adding a little overhead, but it seems that having computeprof open actually makes my program much faster.

Does anyone have an idea why this is happening?




I have the same problem. Did anyone solve it?


My guess is that this is due to the card/context being pre-initialised by the CUDA profiler. Depending on the platform, creating the context can take more than 3 seconds (I have measured up to that much myself). While computeprof is open, it keeps the device initialised, so your program skips most of that cost; once you close it, your program has to do the initialisation itself, and that is the timing difference you see. If your card is a Tesla (I'm not sure whether Quadro supports it as well), I encourage you to set the persistence mode bit with nvidia-smi. This should hide most, if not all, of this time.
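If you want to check whether context creation really accounts for the difference, a minimal sketch along these lines might help. It assumes that the first CUDA runtime call in a process triggers context creation, and it uses the common `cudaFree(0)` idiom to force that; the helper name `timedCudaFree` is just something made up for this example, not part of any API. Run it once with computeprof open and once with it closed and compare the first number.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the wall-clock time (in milliseconds)
// taken by a single cudaFree(0) call.
static double timedCudaFree()
{
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);  // no-op free; the first call forces context creation
    auto stop = std::chrono::steady_clock::now();

    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFree(0) failed: %s\n", cudaGetErrorString(err));
    }
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

int main()
{
    // First call: includes driver/context initialisation (the suspected ~3 s).
    double first = timedCudaFree();
    // Second call: the context already exists, so this should be near zero.
    double second = timedCudaFree();

    std::printf("first  CUDA call: %.1f ms\n", first);
    std::printf("second CUDA call: %.1f ms\n", second);
    return 0;
}
```

Something like `nvcc -o ctxtime ctxtime.cu` should build it (the file name is arbitrary). If the first call is the slow part, persistence mode is worth trying; on the drivers I know of it is enabled with `nvidia-smi -pm 1` as root, but check `nvidia-smi -h` for your driver version.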

I’m using a GTX 580 card, but I have two Teslas too.

OK, problem solved. You’re right.

Thank you!