Result differences between emulation mode and normal (GPU) mode


I would like to get some advice from experienced CUDA programmers :)

I parallelized a portion of my application, and it produces exactly the same outcome in emulation mode. However, when I run that portion of the code on the GPU, some results differ, and those differences lead to a crash of the whole application…

What are the main areas that could lead to different results? Is it synchronization? Is it the number of threads? Something else?

Thank you!

It is almost impossible to tell anything from such a poor description of the problem. You should at least clarify what you mean by “some result differences”.

Page 47 (of 125) of the CUDA Programming Guide 1.0 might help you here.