Debug version is 10x faster than Release version?

I am new to CUDA, so please bear with me. I played around with some example projects from the CUDA SDK, and I modified the template project to do a simple 2D memory allocation and copy. I have attached the code to this post. The puzzling thing is that cutil’s timer shows the code running faster when it is compiled in Debug mode!

On my 2.6 GHz Xeon CPU with a Quadro FX 4600, the attached code runs in less than 1 ms in Debug mode, but in Release mode it takes more than 10 ms! I have run the code multiple times in both configurations. Could anyone tell me why this is so? Thanks.
Test.zip (35.7 KB)

  1. You are only timing one iteration. To get accurate timing results, you should take many measurements (hundreds to thousands) and average them.

  2. You are timing memory allocations. I don’t know of any specific reason why they would differ between Release and Debug modes, but are you truly trying to benchmark allocations? Memory allocations are likely to take widely different times to complete depending on the state of the currently allocated memory pool. Memory allocations are also slow, so you wouldn’t want to do them in the inner loop of your program.

  3. By debug mode, do you mean “Debug” or “EmuDebug”? That could make the difference.
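
For anyone reading later, points 1 and 2 can be sketched roughly as follows. This is only an illustration, not the code from the attachment: the function name `average_copy_ms`, the `CHECK` macro, and the 512x512 size are made up for the example, and it uses CUDA events rather than the cutil timer. It allocates once outside the timed region, does one untimed warm-up copy (the first CUDA call in a process usually also pays for one-time setup), and then averages over many copies.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "CUDA error: %s (line %d)\n",        \
                         cudaGetErrorString(err_), __LINE__);         \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Returns the mean time in milliseconds of one host-to-device 2D copy,
// averaged over `iterations` copies. Allocation is NOT included in the timing.
float average_copy_ms(int width, int height, int iterations)
{
    std::vector<float> host(static_cast<size_t>(width) * height, 1.0f);
    const size_t row_bytes = width * sizeof(float);

    // Allocate once, outside the timed region.
    float *dev = nullptr;
    size_t pitch = 0;
    CHECK(cudaMallocPitch(reinterpret_cast<void **>(&dev), &pitch,
                          row_bytes, height));

    // Untimed warm-up copy so one-time startup costs are not measured.
    CHECK(cudaMemcpy2D(dev, pitch, host.data(), row_bytes,
                       row_bytes, height, cudaMemcpyHostToDevice));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    // Time the whole loop, then divide by the iteration count.
    CHECK(cudaEventRecord(start, 0));
    for (int i = 0; i < iterations; ++i) {
        CHECK(cudaMemcpy2D(dev, pitch, host.data(), row_bytes,
                           row_bytes, height, cudaMemcpyHostToDevice));
    }
    CHECK(cudaEventRecord(stop, 0));
    CHECK(cudaEventSynchronize(stop));

    float total_ms = 0.0f;
    CHECK(cudaEventElapsedTime(&total_ms, start, stop));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(dev));

    return total_ms / iterations;
}

int main()
{
    const float ms = average_copy_ms(512, 512, 1000);
    std::printf("average 2D copy time: %f ms\n", ms);
    return 0;
}
```

Running this in both Debug and Release and comparing the averaged figures should give a fairer picture than a single timed run that includes the allocation.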

You’re right, MrAnderson! After performing thousands of iterations and averaging the timings, I get saner results. Previously, I observed that whenever I switched to Release mode and ran the program (once), the time always increased, and it decreased again after I switched back to Debug mode, hence my erroneous conclusion. Thanks.