I am new to CUDA, so please bear with me. I played around with some of the example projects from the CUDA SDK and modified the template project to do a simple 2D memory allocation and copy. The code is attached to this post. The puzzling thing is that cutil’s timer shows the code running faster when it is compiled in Debug mode!
On my 2.6 GHz Xeon CPU with a Quadro FX 4600, the attached code runs in less than 1 ms in Debug mode, but in Release mode it takes more than 10 ms! I have run the code multiple times in both configurations. Could anyone tell me why this is so? Thanks.
Test.zip (35.7 KB)