CUDA in release and debug mode (faster or not)

Hi CUDA community.
My question is the following. I ported some of my code to CUDA: the CPU code runs in 6 minutes and the CUDA version in 1:30 min. When I switch to release mode, the CUDA code does not get any faster; both builds take the same time. So, does CUDA code run in the same time in debug and release mode?

I have noticed a significant speedup in the Release build.
E.g. Debug kernel execution ~600 ms vs. Release kernel execution ~0.083 ms


Find an explanation for this behavior here:

I used cudaThreadSynchronize() after the kernel call, if that is what you meant. Time was measured with timers.
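
For anyone who wants to repeat the debug vs. release comparison, here is a minimal sketch of timing a kernel with CUDA events instead of host timers. Kernel launches are asynchronous with respect to the host, so a host timer without synchronization can end up measuring only the launch overhead. Note that cudaThreadSynchronize() is deprecated in favor of cudaDeviceSynchronize(). The kernel scaleKernel and the helper timeScaleKernelMs are hypothetical names made up for this example, not anything from the code discussed in this thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel used only for illustration: scales each element in place.
__global__ void scaleKernel(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns the average time per launch in milliseconds,
// measured with CUDA events, or -1.0f if a CUDA call fails.
float timeScaleKernelMs(int n, int launches) {
    float *d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1.0f;
    cudaMemset(d, 0, n * sizeof(float));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time initialization is not included in the timing.
    scaleKernel<<<blocks, threads>>>(d, n, 2.0f);
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    for (int k = 0; k < launches; ++k) {
        scaleKernel<<<blocks, threads>>>(d, n, 2.0f);
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // block until the kernels and the stop event are done

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    const cudaError_t err = cudaGetLastError();

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d);

    return (err == cudaSuccess) ? ms / launches : -1.0f;
}

int main() {
    const float ms = timeScaleKernelMs(1 << 20, 100);
    if (ms < 0.0f) {
        std::printf("CUDA error while timing the kernel\n");
        return 1;
    }
    std::printf("average kernel time: %f ms\n", ms);
    return 0;
}
```

Building this once with nvcc -G (device debug, which turns off device-code optimizations) and once without it should show whether the kernel itself gets slower in a debug build. Host-side optimization flags alone would not be expected to change the kernel time much.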

More about this question can be found here