I am debugging a very annoying problem, and I suspect the cause is an index that is supposed to be unique to each thread but is not as unique as I thought.
I need 2^19 (524288) threads, each with its own unique ID in the range 0 to 2^19 - 1 (524287).
My kernel launch is as follows:
launch<<<2048, 256, 0>>>(...);
such that 2048*256 = 524288 = 2^19
My unique thread index is as follows:
typedef unsigned __int32 word;
const word x = blockIdx.x * blockDim.x + threadIdx.x;
const word y = blockIdx.y * blockDim.y + threadIdx.y;
const word unique = ((word)x + (word)y);
Does each thread really get a unique value, and do the values really cover 0 to 2^19 - 1?