I have a question. I implemented my image processing in CUDA, but the result differs from the CPU result by one pixel. When I run it in emulation on the CPU (using the CUDA EmuDebug configuration), the result is the same as the CPU result, so I don't understand why the error appears on the GPU, or what causes the difference between running on the GPU and running in emulation on the CPU. (I have added __syncthreads() too.)
I once passed host pointers to kernel code. In emulation mode the program worked fine, because the emulated kernel runs on the CPU, where host pointers are valid.
Thanks. My result differs from the CPU result at just one point. When I use another data type (float) everything is all right, but when I use uchar4 there is an error at just one point.
Well, I don’t know what kind of calculations you’re doing, but if you’re doing floating point math, there will often be small discrepancies between CPU and GPU code due to numerical instability in the algorithms you’re using. This is normal – actually, neither one is perfectly “correct” – so you just need to decide (for your own application) if that error is acceptable or not.
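To make that concrete, here is a small host-only C++ sketch of how a tiny floating-point discrepancy can become a full off-by-one once the value is stored as an 8-bit integer. It is only an illustration, not a diagnosis of the code in this thread: the helper names are hypothetical, and `oneUlpBelow` merely simulates "the same sum computed in a slightly different order" by taking the next representable float below the exact value.

```cpp
#include <cassert>
#include <cmath>

// Truncating conversion: a plain cast drops the fractional part.
unsigned char toByteTruncate(float v) {
    assert(v >= 0.0f && v < 256.0f);  // caller is expected to clamp first
    return static_cast<unsigned char>(v);
}

// Rounding conversion: picks the nearest integer instead of truncating.
unsigned char toByteRound(float v) {
    assert(v >= 0.0f && v < 255.5f);  // caller is expected to clamp first
    return static_cast<unsigned char>(std::lround(v));
}

// Hypothetical stand-in for a result that differs by one unit in the last
// place: returns the largest float strictly below v (for positive v).
float oneUlpBelow(float v) {
    return std::nextafter(v, 0.0f);
}
```

With these helpers, `toByteTruncate(128.0f)` gives 128 while `toByteTruncate(oneUlpBelow(128.0f))` gives 127, whereas `toByteRound` returns 128 for both. Rounding does not remove the sensitivity, it only moves it to values near x.5, but it may be worth checking that the CPU and GPU paths convert back to unsigned char in the same way.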
Now, if you’re just doing integer operations…maybe one of your memory addresses is wrong (due to offsets, etc.) for the thread that processes that single pixel?
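One way to check the addressing idea is to find out exactly where the differing pixel is: a mismatch on the image border or at a thread-block boundary would point toward an offset problem. Below is a minimal host-side C++ sketch for that. It assumes both results have already been copied back into row-major buffers with one byte per element; the `Mismatch` struct and `findMismatches` function are hypothetical names, and for uchar4 data you would pass the width in bytes (4 times the pixel width) or adapt the indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One differing element: its coordinates and the two values found there.
struct Mismatch {
    int x;
    int y;
    unsigned char cpu;
    unsigned char gpu;
};

// Compares two row-major images of identical size and returns every
// position where the CPU and GPU results disagree.
std::vector<Mismatch> findMismatches(const std::vector<unsigned char>& cpu,
                                     const std::vector<unsigned char>& gpu,
                                     int width, int height) {
    assert(width > 0 && height > 0);
    const std::size_t count = static_cast<std::size_t>(width) * height;
    assert(cpu.size() == count && gpu.size() == count);

    std::vector<Mismatch> out;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            if (cpu[i] != gpu[i]) {
                out.push_back({x, y, cpu[i], gpu[i]});
            }
        }
    }
    return out;
}
```

If the single mismatch differs by exactly 1, that hints at rounding in the conversion; a large difference hints more at a wrong read or write location.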
I am doing an image convolution. The data type is unsigned char and I read the input through texture memory, so I think a wrong address is unlikely, but I am afraid that a race condition or some other error may be causing it. Do you have any other advice? Thanks very much.