I have a kernel that performs a Gaussian blur on an image. My problem is that the result I get in EmuDebug (or EmuRelease) differs from the result I get when running on the GPU (in either Debug or Release mode).
To be more precise, the result in emulation is exactly what I expect. On the GPU the result is also mostly as expected, but with some added noise: a few faint gray (almost transparent) vertical lines. The noise is not very distracting, but the image is not perfect. The kernel code is identical in both cases.
I can upload some example images if that would make it easier to see what I mean.
Image data is stored in global memory and I’m pretty sure that there is no problem with image boundaries.
Finally, I think the cause is multiple threads accessing the same memory address. Because of how the algorithm is designed, the areas handled by neighbouring threads overlap, so each thread accumulates a value into several different pixels, and several threads can end up accumulating into the same pixel.
And there is no synchronised access to global memory in CUDA, right?
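To make the access pattern concrete, here is a minimal sketch of what I mean. It is not my real kernel: the 1-D row, the 5-tap weights and all names (`scatterBlurRacy`, `scatterBlurAtomic`, `gatherBlur`, `maxErrorVsCpu`) are made up for illustration. The first kernel accumulates with a plain `+=`, which I suspect is where the lines come from. The other two are the alternatives I am aware of: `atomicAdd` (which, as far as I know, needs compute capability 2.0 or later for `float`) and a gather formulation in which each thread writes only its own pixel.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

#define RADIUS 2
__constant__ float d_weights[2 * RADIUS + 1];  // hypothetical 5-tap kernel

// Scatter with plain +=: neighbouring threads read-modify-write the same
// out[] element without any ordering, so some updates can be lost.
__global__ void scatterBlurRacy(const float* in, float* out, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= n) return;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int t = x + k;
        if (t >= 0 && t < n) out[t] += d_weights[k + RADIUS] * in[x];
    }
}

// Same scatter, but each accumulation is an atomic read-modify-write.
// Assumes hardware with float atomicAdd (compute capability >= 2.0).
__global__ void scatterBlurAtomic(const float* in, float* out, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= n) return;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int t = x + k;
        if (t >= 0 && t < n) atomicAdd(&out[t], d_weights[k + RADIUS] * in[x]);
    }
}

// Gather: each thread reads its neighbours and writes only out[x],
// so no two threads ever write to the same address.
__global__ void gatherBlur(const float* in, float* out, int n) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= n) return;
    float sum = 0.0f;
    for (int k = -RADIUS; k <= RADIUS; ++k) {
        int s = x + k;
        if (s >= 0 && s < n) sum += d_weights[RADIUS - k] * in[s];
    }
    out[x] = sum;
}

// Runs one variant (0 = racy, 1 = atomic, 2 = gather) on a synthetic row
// and returns the maximum absolute error against a sequential CPU scatter.
float maxErrorVsCpu(int variant, int n) {
    const float w[2 * RADIUS + 1] = {0.0625f, 0.25f, 0.375f, 0.25f, 0.0625f};
    std::vector<float> in(n), ref(n, 0.0f), got(n);
    for (int i = 0; i < n; ++i) in[i] = (float)(i % 7) / 7.0f;
    for (int x = 0; x < n; ++x)
        for (int k = -RADIUS; k <= RADIUS; ++k)
            if (x + k >= 0 && x + k < n) ref[x + k] += w[k + RADIUS] * in[x];

    float *dIn = 0, *dOut = 0;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, bytes);  // the scatter variants accumulate into zeros
    cudaMemcpyToSymbol(d_weights, w, sizeof(w));

    int threads = 256, blocks = (n + threads - 1) / threads;
    if (variant == 0)      scatterBlurRacy<<<blocks, threads>>>(dIn, dOut, n);
    else if (variant == 1) scatterBlurAtomic<<<blocks, threads>>>(dIn, dOut, n);
    else                   gatherBlur<<<blocks, threads>>>(dIn, dOut, n);

    cudaMemcpy(got.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float err = 0.0f;
    for (int i = 0; i < n; ++i) err = fmaxf(err, fabsf(got[i] - ref[i]));
    return err;
}
```

If my guess is right, calling `maxErrorVsCpu(0, n)` on the GPU should sometimes return a visibly non-zero error, while variants 1 and 2 should stay within floating-point rounding. In emulation the threads presumably run one after another, which would explain why the plain `+=` looks correct there.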