I’m having trouble getting correct output when running my code on the GPU, even though the output is correct in emulation mode. This leads me to believe that the problem is caused by data dependencies between threads, although the dependencies are not obvious.
My code performs calculations in an array that is both initialised and used only in device code. My intention is for each thread to have its own copy of this array. If different threads are instead accumulating results into the same array in global memory, that could explain the incorrect output. Is there a way of ensuring the array is local to each thread?
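
To make the question concrete, here is a minimal sketch of the pattern I am aiming for. It is not my actual code: the kernel name `perThreadScratch`, the helper `countMismatches`, the size `SCRATCH_LEN` and the arithmetic are all made up for illustration. It assumes that an array declared inside the kernel, without a `__shared__` qualifier and with a compile-time size, is private to each thread.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define SCRATCH_LEN 8  // hypothetical size; must be known at compile time

// Each thread fills its own scratch array, then writes a single sum
// to its own slot of the output array, so no two threads share a location.
__global__ void perThreadScratch(float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // Declared inside the kernel with no __shared__ qualifier:
    // every thread gets a private copy (held in registers or local memory).
    float scratch[SCRATCH_LEN];

    for (int i = 0; i < SCRATCH_LEN; ++i)
        scratch[i] = (float)(tid + i);

    float sum = 0.0f;
    for (int i = 0; i < SCRATCH_LEN; ++i)
        sum += scratch[i];

    out[tid] = sum;  // one writer per element
}

// Hypothetical host helper: launches the kernel for n threads and returns
// the number of wrong results, or -1 if a CUDA call fails.
int countMismatches(int n)
{
    float *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(float)) != cudaSuccess)
        return -1;

    int threads = 64;
    int blocks = (n + threads - 1) / threads;
    perThreadScratch<<<blocks, threads>>>(d_out, n);

    std::vector<float> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess)
        return -1;

    // Sum of (tid + i) for i = 0..7 is 8 * tid + 28.
    int bad = 0;
    for (int t = 0; t < n; ++t)
        if (h_out[t] != 8.0f * t + 28.0f)
            ++bad;
    return bad;
}
```

If the assumption holds, `countMismatches(256)` should return 0 on the device as well as in emulation mode. In my real code the array might instead be reached through a pointer into global memory that every thread shares, which is the case I am worried about.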