CUDA EmuRelease: performance when running in emulation


When the program is built in the EmuRelease configuration, what is the targeted device? When I call the cutGetTimerValue function to get the execution time, is it the execution time on the targeted device or on the host processor? I tried to run the same program several times, but every time it gives me different results, and the result also seems to depend on the host processor’s load.


Everything runs on the CPU in emulation mode. I wouldn’t worry about the timings in emulation, since the mode emulates execution of threads, blocks, etc., all sequentially.


Also, cutGetTimerValue() only returns the time elapsed between the cutStartTimer and cutStopTimer calls. It doesn’t care what you did in between, so you can’t say it is the execution time on either the target or the host device.

Is there any way to get the execution time (or the number of cycles) that a segment of a program will take on a real GPU? I don’t have a CUDA-compatible graphics card at hand.

No; it is an emulator, not a cycle-accurate simulator.


Can you provide me with some information regarding the cutGetTimerValue function, and sample code showing how to apply it?

I want to estimate the time taken by the CUBLAS and CUFFT APIs, but I don’t know how to do this.

Thanks in advance :)

You can read the source code for cutGetTimerValue in the CUDA SDK yourself, or look at any of the SDK projects for an example.
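As a rough sketch of the pattern those SDK projects use, timing a CUFFT call with the cutil timer looks something like the following. This is an untested sketch: cutil.h is an SDK helper (not part of the CUDA runtime), error checking is omitted, and the exact calls follow the conventions of the SDK samples of that era. Remember to synchronize before stopping the timer, since GPU work is launched asynchronously.

```c
#include <stdio.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <cutil.h>   /* SDK helper library, not part of the CUDA runtime */

int main(void)
{
    const int N = 1 << 20;
    cufftComplex *d_data;
    cudaMalloc((void **)&d_data, N * sizeof(cufftComplex));

    cufftHandle plan;
    cufftPlan1d(&plan, N, CUFFT_C2C, 1);

    unsigned int timer = 0;
    cutCreateTimer(&timer);
    cutStartTimer(timer);

    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cudaThreadSynchronize();  /* wait for the GPU before stopping the timer */

    cutStopTimer(timer);
    printf("FFT time: %f ms\n", cutGetTimerValue(timer));

    cutDeleteTimer(timer);
    cufftDestroy(plan);
    cudaFree(d_data);
    return 0;
}
```

The same start/exec/synchronize/stop pattern applies to timing CUBLAS calls. Note that in emulation mode this still measures host wall-clock time, not real GPU time.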