My code works great in Debug. In Release, I get anomalous data in some bytes of the output. I have a separately verified CPU-based algorithm to test against, and in Debug builds the CUDA code matches the CPU code. The Release build does not match, and I’ve tried it on an 8600 and an 8800.
Where should I start looking? Or jump straight to a bug report?
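Since a verified CPU reference is already available, a useful first step is to find exactly which bytes differ and whether they follow a pattern. Below is a minimal host-side C++ sketch. It assumes the GPU result has already been copied back (e.g. with cudaMemcpy) into a byte vector. `mismatched_bytes` and `report_mismatches` are hypothetical helper names, not part of any CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Returns the offset of every byte where the GPU output differs from the
// CPU reference, rather than a single pass/fail result.
std::vector<std::size_t> mismatched_bytes(const std::vector<std::uint8_t>& gpu,
                                          const std::vector<std::uint8_t>& cpu) {
    assert(gpu.size() == cpu.size());  // both buffers must cover the same output
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < gpu.size(); ++i) {
        if (gpu[i] != cpu[i]) bad.push_back(i);
    }
    return bad;
}

// Prints a summary plus at most max_report mismatches as "offset: gpu vs cpu".
void report_mismatches(const std::vector<std::uint8_t>& gpu,
                       const std::vector<std::uint8_t>& cpu,
                       std::size_t max_report = 16) {
    std::vector<std::size_t> bad = mismatched_bytes(gpu, cpu);
    std::printf("%zu of %zu bytes differ\n", bad.size(), gpu.size());
    for (std::size_t k = 0; k < bad.size() && k < max_report; ++k) {
        std::size_t i = bad[k];
        std::printf("  offset %zu: gpu=0x%02x cpu=0x%02x\n", i,
                    static_cast<unsigned>(gpu[i]), static_cast<unsigned>(cpu[i]));
    }
}
```

If the bad offsets sit at the tail of the buffer or repeat at a fixed stride, that may point to an indexing or launch-size problem in your own code rather than a compiler bug.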
Floating-point precision. On the device you get only single precision (float); cards such as the 8600 and 8800 have no double-precision support. In emu mode the code runs on the host, where you get whatever precision your compiler defaults to (usually double).
Host/device memory issues. Emu mode usually won’t trigger errors when you mistakenly pass a pointer to device memory where a pointer to host memory is expected, or vice versa, because everything is executed on the host. On the device, the same mistake can cause hard-to-find problems, so examine your code closely.
Accessing unallocated memory (overflows) in device code.
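To illustrate the floating-point precision point, here is a host-side C++ sketch that accumulates the same value in float and in double. The float loop stands in for what single-precision-only hardware such as the 8600/8800 effectively computes. `sum_in_float`, `sum_in_double` and `nearly_equal` are hypothetical helper names, and the relative tolerance you pass is an assumption you would tune for your own data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Accumulates `value` n times in single precision, as device code on a
// card without double-precision support would.
float sum_in_float(float value, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i) acc += value;
    return acc;
}

// Same accumulation in double precision, closer to what a host-side
// reference may compute.
double sum_in_double(float value, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) acc += value;
    return acc;
}

// Relative-tolerance comparison: checks float results against a double
// reference without demanding bit-exact equality.
bool nearly_equal(double a, double b, double rel_tol) {
    assert(rel_tol >= 0.0);
    double scale = std::fmax(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}
```

With many additions the float sum drifts far from the double sum, so a test that expects identical bits from host and device may fail even when the kernel is correct. Note, though, that a tolerance comparison only makes sense for floating-point output; wrong values in integer or byte output usually point to one of the other causes.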
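For the unallocated-memory point, one frequent source is a launch whose thread count is rounded up to a whole number of blocks. The host-side C++ sketch below counts how many threads of a 1D launch would fall past the end of an n-element buffer. `grid_size` and `out_of_range_threads` are hypothetical helper names, and the loop only emulates the usual `blockIdx.x * blockDim.x + threadIdx.x` indexing on the CPU.

```cpp
#include <cassert>
#include <cstddef>

// Number of blocks needed to cover n elements with block_size threads each
// (round-up division, as commonly used to size a kernel launch).
std::size_t grid_size(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

// Emulates every thread index of a 1D launch and counts those past the end
// of an n-element buffer. In a real kernel such threads must be skipped with
// a guard like `if (idx < n)`, or they touch memory that was never allocated.
std::size_t out_of_range_threads(std::size_t n, std::size_t block_size) {
    std::size_t blocks = grid_size(n, block_size);
    std::size_t count = 0;
    for (std::size_t block = 0; block < blocks; ++block) {
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            // Same formula as blockIdx.x * blockDim.x + threadIdx.x
            std::size_t idx = block * block_size + thread;
            if (idx >= n) ++count;  // this thread has no element to process
        }
    }
    return count;
}
```

For example, 1000 elements with 256-thread blocks need 4 blocks, i.e. 1024 threads, so 24 threads have no element and must be masked off. In emu mode such stray accesses may land in valid host memory and go unnoticed.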
There are probably more common coding mistakes, but that’s enough for now =) If you still believe your code should run on the device and it doesn’t, please post the source here so that we can reproduce your problem.
From gperks’ question, I don’t think gperks is using emu mode at all. Are you comparing Debug with Release, or EmuDebug with EmuRelease? I have a similar issue, but in my case it’s because OpenCV won’t compile in Release mode, not CUDA.