Hello,
I would like to get some advice from experienced CUDA programmers :)
Well, I parallelized a portion of my application, and in emulation mode it produces exactly the same results as the original code. However, when I run that portion on the GPU, some results differ, and those differences eventually crash the whole application…
What are the main areas that could cause the results to differ? Is it synchronization? Is it the number of threads? Something else?
Thank you!