Release and Debug modes on CUDA 5.0

Hi everybody and happy new year!

I am working on a particle simulation code based on the discrete element method. I have finished the CUDA implementation and tested the program in debug mode. I then switched to release mode and ran the same calculation, but there it goes wrong: the kernel fails to detect the walls and particles, so memory accesses go out of range and execution fails.

Now my main questions are:

  1. Is there any difference in the computational accuracy of the GPU between debug and release mode? I did not have this problem on the CPU, where I did my previous discrete element method work.
  2. Do I have to take any special precautions when using release mode, e.g. avoiding nested ifs?

I use CUDA 5.0 integrated into Visual Studio 2010. The GPU is a GeForce GTX 660 Ti.

The first thing to check is whether the release build simply exposes an existing problem in the code, such as a race condition:

(1) Make sure the return status of all CUDA API calls and all kernel launches is being checked
(2) Run the code under control of cuda-memcheck with both the bounds checker and the race checker
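
As a minimal sketch of point (1), the following shows one common way to check every runtime API call and every kernel launch. The macro name CUDA_CHECK, the kernel scale, and the wrapper scale_on_device are hypothetical placeholders, not taken from the original program.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with file/line information if a call fails.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Placeholder kernel standing in for the real simulation kernel.
__global__ void scale(float *x, int n, float s)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

// Hypothetical wrapper: scales n host values on the device, checking each step.
void scale_on_device(float *h_x, int n, float s)
{
    float *d_x = NULL;
    size_t bytes = n * sizeof(float);
    CUDA_CHECK(cudaMalloc((void **)&d_x, bytes));
    CUDA_CHECK(cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice));

    scale<<<(n + 255) / 256, 256>>>(d_x, n, s);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(h_x, d_x, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_x));
}

int main()
{
    float v[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    scale_on_device(v, 4, 2.0f);
    printf("v[3] = %g\n", v[3]);
    return 0;
}
```

For point (2), assuming the executable is called app, the bounds checker is the default tool (cuda-memcheck app) and the race checker is selected with cuda-memcheck --tool racecheck app.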

In terms of numerical differences, the biggest difference between debug and release builds is typically that release builds aggressively merge a floating-point multiply followed by an add into an FMA (fused multiply-add). In general this improves both performance and accuracy, but it can occasionally lead to unexpected results. You can inhibit the merging by compiling with -fmad=false.
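
To illustrate the effect, here is a small sketch that compares a plain a * b + c with an explicitly fused and an explicitly unfused version, using the intrinsics __fmaf_rn, __fmul_rn and __fadd_rn (which the compiler does not merge). The kernel and wrapper names are hypothetical, the input values are chosen only to make the rounding difference visible, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// out[0]: plain expression, which the compiler may or may not contract into an FMA
// out[1]: explicitly fused multiply-add (one rounding)
// out[2]: explicitly separate multiply and add (two roundings)
__global__ void fma_demo(float a, float b, float c, float *out)
{
    out[0] = a * b + c;
    out[1] = __fmaf_rn(a, b, c);
    out[2] = __fadd_rn(__fmul_rn(a, b), c);
}

// Hypothetical host wrapper; copies the three results back to h_out.
void run_fma_demo(float a, float b, float c, float *h_out)
{
    float *d_out = NULL;
    cudaMalloc((void **)&d_out, 3 * sizeof(float));
    fma_demo<<<1, 1>>>(a, b, c, d_out);
    cudaMemcpy(h_out, d_out, 3 * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
}

int main()
{
    // a * a = 1 + 2^-11 + 2^-24 exactly; the 2^-24 term does not fit in a
    // float, so it is lost when the product is rounded before the add.
    const float a = 1.0f + 1.0f / 4096.0f;
    const float c = -(1.0f + 1.0f / 2048.0f);
    float r[3];
    run_fma_demo(a, a, c, r);
    printf("plain    : %g\n", r[0]);
    printf("fused    : %g\n", r[1]);
    printf("separate : %g\n", r[2]);
    return 0;
}
```

With these inputs the fused result should be 2^-24 (about 5.96e-08) and the separate result exactly 0. Which of the two the plain expression matches depends on whether the compiler contracted it, so comparing a default build against one compiled with -fmad=false shows whether contraction is involved.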

Thank you very much. Your suggestion worked and I resolved the problem. I also used printf to check intermediate results in release mode; device-side printf is available on devices with compute capability 2.0 or higher.
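
For anyone finding this thread later, a minimal sketch of that kind of device-side printf check might look like the following. The kernel report_bad_indices and the wrapper check_indices are hypothetical and not the actual simulation code; the file has to be compiled for compute capability 2.0 or higher (for example with -arch=sm_20 on CUDA 5.0).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: report every index that would fall outside [0, limit).
__global__ void report_bad_indices(const int *idx, int count, int limit)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < count && (idx[i] < 0 || idx[i] >= limit)) {
        printf("thread %d: index %d is outside [0, %d)\n", i, idx[i], limit);
    }
}

// Hypothetical host wrapper; returns the status of the kernel execution.
cudaError_t check_indices(const int *h_idx, int count, int limit)
{
    int *d_idx = NULL;
    cudaError_t err = cudaMalloc((void **)&d_idx, count * sizeof(int));
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_idx, h_idx, count * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        report_bad_indices<<<(count + 127) / 128, 128>>>(d_idx, count, limit);
        // Synchronizing also flushes the device-side printf buffer.
        err = cudaDeviceSynchronize();
    }
    cudaFree(d_idx);
    return err;
}

int main()
{
    int idx[4] = {0, 3, 7, -1};  // the last two entries are out of range for limit 4
    cudaError_t err = check_indices(idx, 4, 4);
    printf("status: %s\n", cudaGetErrorString(err));
    return 0;
}
```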