Visual C++ CUDA program runs perfectly in emulation but not on the GPU

I have a C++ program in Visual Studio 2005.
In this program I use the CUDA build rule I found on these forums.
When I select Debug as my active configuration, the program runs very slowly (0.5 FPS), but at the end I can see that the calculations done by the CUDA kernel are correct.
When I run the program in the Release configuration, it runs much faster, but I do not get any result, as if the kernel didn't run at all.
I also call CUT_CHECK_ERROR("Kernel execution failed");, but it seems to have no effect, whether or not my kernel runs.
I have a CUDA-compatible NVIDIA graphics card, and I have run some of NVIDIA's CUDA samples in the Release configuration.
I use this launch configuration for the kernel:
dim3 grid(32, 24);
dim3 threads(20, 20);

Is there something I am doing wrong?
Why does it work in Debug and not in Release?

Thank you.

CUT_CHECK_ERROR is only enabled in debug builds, not in release builds; this is why you don't see any errors reported in Release mode.
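If you want kernel errors reported in Release as well, you can write your own check that is compiled into every configuration. A minimal sketch, assuming the CUDA runtime API: `kernelStatus`, `CHECK_KERNEL` and the `fill` kernel are hypothetical names, not part of the SDK. `cudaGetLastError` catches launch failures (for example, an invalid block size), and `cudaDeviceSynchronize` waits for the kernel and returns errors raised while it ran. Older toolkits called the latter `cudaThreadSynchronize`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: reports launch errors first, then errors raised while
// the kernel ran (cudaDeviceSynchronize blocks until the kernel finishes).
inline cudaError_t kernelStatus()
{
    cudaError_t e = cudaGetLastError();
    if (e == cudaSuccess)
        e = cudaDeviceSynchronize();
    return e;
}

// Hypothetical macro: unlike CUT_CHECK_ERROR, it is active in every build
// configuration, Release included.
#define CHECK_KERNEL(msg)                                                  \
    do {                                                                   \
        cudaError_t err_ = kernelStatus();                                 \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s: %s (%s:%d)\n", (msg),                     \
                    cudaGetErrorString(err_), __FILE__, __LINE__);         \
            exit(EXIT_FAILURE);                                            \
        }                                                                  \
    } while (0)

// Tiny kernel used only to exercise the check.
__global__ void fill(int* out, int value) { out[threadIdx.x] = value; }

int main()
{
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, 32 * sizeof(int)) != cudaSuccess) return EXIT_FAILURE;
    fill<<<1, 32>>>(d_out, 7);
    CHECK_KERNEL("Kernel execution failed");
    int h_out[32] = {0};
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("h_out[0] = %d\n", h_out[0]);
    cudaFree(d_out);
    return 0;
}
```

Synchronizing after every launch stalls the CPU until the GPU finishes, which is likely why the SDK macro is limited to debug builds; you may want to keep this check only while tracking down the problem.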

I don't know how the build rule works, but usually "Debug" mode still runs on the GPU, so your kernels should be fine. More likely, another debug/release difference is causing your problem (such as using uninitialized memory).
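As an illustration of the uninitialized-memory point: `cudaMalloc` does not clear the memory it returns, and in a Debug build the Visual C++ debug heap fills new host allocations with a fixed pattern (0xCD) while Release leaves whatever was there. Code that reads a buffer before every element has been written can therefore behave differently between the two configurations. The sketch below initializes both buffers explicitly; `writeEvens` and `runWriteEvens` are hypothetical names, and the kernel deliberately writes only some elements.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel that writes only even indices; odd elements keep
// whatever the buffer held before, so initialization matters.
__global__ void writeEvens(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && i % 2 == 0) out[i] = i;
}

// Returns the device results in a host vector in which every value is defined.
std::vector<int> runWriteEvens(int n)
{
    std::vector<int> host(n, 0);             // value-initialized, not garbage
    int* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(int)) != cudaSuccess) return {};
    cudaMemset(d, 0, n * sizeof(int));       // cudaMalloc does not zero memory
    writeEvens<<<(n + 255) / 256, 256>>>(d, n);
    if (cudaGetLastError() == cudaSuccess)   // copy back only if the launch worked
        cudaMemcpy(host.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return host;
}

int main()
{
    std::vector<int> r = runWriteEvens(8);
    for (int v : r) printf("%d ", v);        // on success: 0 0 2 0 4 0 6 0
    printf("\n");
    return 0;
}
```

Without the `cudaMemset`, the odd elements would contain leftover device memory, and the result could differ from run to run.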