Different kernel behaviour in EmuDebug vs Debug

Hi,

I have a kernel that performs a Gaussian blur on an image. My problem is that the result I get in EmuDebug (or EmuRelease) differs from what I get when running on the GPU (in either Debug or Release mode).
To be more precise, the result is as expected under emulation. On the GPU the result is also mostly correct, but with some added noise: small gray (and fairly transparent) vertical lines. The noise is not severe, but the image is not perfect! I don't change the kernel in any way between the two builds.
I can upload some examples if that makes it easier to understand what I mean.

Image data is stored in global memory, and I'm pretty sure there is no problem with the image boundaries.

I have an 8800 GT under Windows Server 2008.

Did you use the separable Gaussian from an example? I have the same problem…

No.

I created the algorithm myself. Each thread calculates a small area of the image. As simple as that!

A debugging suggestion: make sure you can trace each pixel's data flow by hand on a 2x2 image, then a 4x4 one; the problem may emerge there.

Finally, I think it's due to multiple threads accessing the same memory address. Because of the algorithm's design the threads' areas overlap, so each thread accumulates values into pixels that other threads also write.
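That kind of overlapping accumulation is a classic data race: `+=` on global memory is a read-modify-write, and when two threads hit the same pixel at once, one update is silently lost, which would explain noise that appears only on real hardware (emulation runs threads one at a time). A minimal sketch of the racy scatter pattern, with illustrative names rather than the poster's actual code:

```cuda
// Racy scatter blur: each thread takes ONE input pixel and adds its
// weighted contribution to a (2r+1)x(2r+1) neighbourhood of output
// pixels. Neighbouring threads write overlapping regions, so the
// unsynchronised "+=" on global memory loses updates.
__global__ void blurScatterRacy(const float* in, float* out,
                                const float* weights,
                                int width, int height, int r)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float v = in[y * width + x];
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            float c = v * weights[(dy + r) * (2 * r + 1) + (dx + r)];
            out[ny * width + nx] += c;  // RACE: non-atomic read-modify-write
        }
}
```

Replacing the `+=` with `atomicAdd(&out[ny * width + nx], c)` would make the accumulation safe, but note that `atomicAdd` on `float` requires compute capability 2.0 or later; an 8800 GT (compute 1.1) only has integer atomics, so on that hardware restructuring the algorithm is the more practical fix.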

And there is no synchronised access to global memory in CUDA, right?

How was this solved? I'm having the same problem.
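The usual fix for this class of bug is to invert the scatter into a gather: instead of each thread adding its input pixel into many output pixels, each thread owns exactly one output pixel and reads every input pixel that contributes to it. Then no output location has more than one writer, so no atomics or synchronisation are needed. A sketch under the same assumed layout as above (single-channel float image, row-major, square kernel of radius `r`):

```cuda
// Gather blur: one thread per OUTPUT pixel. Each thread reads all
// contributing input pixels and accumulates in a register, then does
// a single write. Every output address has exactly one writer, so
// there is no race regardless of launch configuration.
__global__ void blurGather(const float* in, float* out,
                           const float* weights,
                           int width, int height, int r)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float acc = 0.0f;
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx) {
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= width || ny < 0 || ny >= height) continue;
            acc += in[ny * width + nx]
                 * weights[(dy + r) * (2 * r + 1) + (dx + r)];
        }
    out[y * width + x] = acc;  // single writer per output pixel
}
```

For a symmetric Gaussian the gather and scatter formulations compute the same sums, so this change should not alter correct results, only remove the race.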