Different values in EmuDebug and Debug modes

Hi, I am calculating the sum of all the elements in a row from shared memory and storing the updated value in device memory. My code is as follows:

Sum_Row[0] += AS(ty, tx);
Sum_Row[1] += BS(ty, tx);


AS and BS are in shared memory.

Sum_Row is an array in device memory; Sum_Row[0] and Sum_Row[1] are its elements.

I copied Sum_Row back to host memory to verify the results and found that my program gives correct results in EmuDebug mode but not in Debug mode. I also noticed that the value from Debug mode is the sum of the last two elements of the last row. This computation is performed on an (N,1) grid.

Is this way of adding the elements not valid in GPU programming? If not, could you please post the correct procedure or suggest an alternative?

I appreciate your response.

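Your code has a concurrent execution bug (a race condition): += is a read-modify-write, so each thread reads the last written value of Sum_Row[0], adds its element and writes the result back. Because all threads execute this concurrently, many of them read the same old value and overwrite each other's updates, so the result is undefined.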
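A minimal sketch of the race and one possible fix, assuming the row is read directly from global memory rather than from shared memory as in the question, and that the row fits in a single block. The kernel names and the host helper sumRowOnDevice are hypothetical, and error checking is omitted for brevity. The fix swaps in atomicAdd, which is not mentioned in the reply above; atomicAdd on float in global memory requires a device of compute capability 2.0 or later.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Buggy pattern from the question: every thread does an unsynchronized
// read-modify-write on the same global-memory location, so updates are lost.
__global__ void rowSumRacy(const float* row, int n, float* sum)
{
    int tx = threadIdx.x;
    if (tx < n)
        *sum += row[tx];   // read, add, write: not atomic across threads
}

// Fixed pattern: atomicAdd performs the read-modify-write as one
// indivisible operation, so no update is lost.
// Assumption: float atomicAdd needs compute capability 2.0 or later.
__global__ void rowSumAtomic(const float* row, int n, float* sum)
{
    int tx = threadIdx.x;
    if (tx < n)
        atomicAdd(sum, row[tx]);
}

// Hypothetical host helper: sums one row of n floats with a single block
// and returns the result. Assumes the device allows n threads per block.
float sumRowOnDevice(const float* hostRow, int n, bool useAtomic)
{
    assert(n > 0 && n <= 1024);   // one block, at most 1024 threads

    float *dRow = nullptr, *dSum = nullptr, result = 0.0f;
    cudaMalloc(&dRow, n * sizeof(float));
    cudaMalloc(&dSum, sizeof(float));
    cudaMemcpy(dRow, hostRow, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(dSum, 0, sizeof(float));   // all-zero bytes == 0.0f

    if (useAtomic)
        rowSumAtomic<<<1, n>>>(dRow, n, dSum);
    else
        rowSumRacy<<<1, n>>>(dRow, n, dSum);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&result, dSum, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dRow);
    cudaFree(dSum);
    return result;
}
```

With a row of 256 ones, the atomic version should return 256, while the racy version typically returns a much smaller, run-dependent value. Atomics serialize contended updates, so they are simple but can be slow when many threads hit the same address.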

In EmuDebug, the code runs on the CPU and therefore executes the threads serially, so this bug does not show up if the thread scheduling happens to be in ascending tx/ty order.
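Another common way to get a correct row sum without any concurrent updates to one location is a tree reduction in shared memory. This is a sketch of that standard technique, not the method from the reply above. It assumes one block per row on an (N,1) grid, a row width equal to the block size and a power of two (the hypothetical ROW_WIDTH), and a row-major matrix in global memory; rowSumReduce and sumRowsOnDevice are illustrative names, and error checking is omitted.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical row width: one thread per element, must be a power of two.
#define ROW_WIDTH 256

// One block per row. Each block copies its row into shared memory, then
// halves the number of active threads each step, so no two threads ever
// update the same shared-memory element at the same time.
__global__ void rowSumReduce(const float* matrix, float* rowSums)
{
    __shared__ float s[ROW_WIDTH];
    int tx = threadIdx.x;
    int row = blockIdx.x;

    s[tx] = matrix[row * ROW_WIDTH + tx];
    __syncthreads();   // the whole row must be loaded before reducing

    for (int stride = ROW_WIDTH / 2; stride > 0; stride /= 2) {
        if (tx < stride)
            s[tx] += s[tx + stride];   // each thread owns a distinct s[tx]
        __syncthreads();   // finish this level before the next reads it
    }

    if (tx == 0)
        rowSums[row] = s[0];   // exactly one writer per output element
}

// Hypothetical host helper: sums each row of a row-major
// numRows x ROW_WIDTH matrix into hostRowSums[0 .. numRows-1].
void sumRowsOnDevice(const float* hostMatrix, int numRows, float* hostRowSums)
{
    assert(numRows > 0);

    float *dMatrix = nullptr, *dRowSums = nullptr;
    size_t matrixBytes = (size_t)numRows * ROW_WIDTH * sizeof(float);
    cudaMalloc(&dMatrix, matrixBytes);
    cudaMalloc(&dRowSums, numRows * sizeof(float));
    cudaMemcpy(dMatrix, hostMatrix, matrixBytes, cudaMemcpyHostToDevice);

    rowSumReduce<<<dim3(numRows, 1), ROW_WIDTH>>>(dMatrix, dRowSums);

    cudaMemcpy(hostRowSums, dRowSums, numRows * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dMatrix);
    cudaFree(dRowSums);
}
```

The __syncthreads() inside the loop sits outside the if, so every thread in the block reaches it; placing it inside the branch would be undefined behavior. If several blocks must contribute to a single total, each block's partial sum can be written out and combined afterwards.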