I have a piece of CUDA code and an ‘equivalent’ C host code (that is, code meant to perform the same task).
Let me define:
Device emu result: output obtained when the CUDA code is compiled and run in device emulation mode.
CPU result: output obtained from the equivalent C code.
GPU result: output obtained by compiling and running the CUDA code on the GPU.
Now I find that the device emu result and the GPU result match, but the device emu result and the CPU result do not match perfectly. My understanding is that in device emulation mode the computations are performed on the CPU, so I expected those two to agree. How should I understand this difference? Is the only possible conclusion that there is a problem in my code?
Thanks in advance!!