I am debugging a very annoying problem, and I have traced it to the possibility that an index that is supposed to be unique to each thread is not as unique as I thought.
I need 2^19 (524288) threads, each with its own unique ID, ranging from 0 to 2^19 - 1 (524287).
My kernel launch is as follows:
launch<<<2048, 256, 0>>>(...);
such that 2048*256 = 524288 = 2^19
My unique thread index is as follows:
typedef unsigned __int32 word;
const word x = blockIdx.x * blockDim.x + threadIdx.x;
const word y = blockIdx.y * blockDim.y + threadIdx.y;
const word unique = ((word)x+(word)y);
Does each thread really have a unique value, and does the count really run from 0 to 2^19 - 1?
You are correct, but I don't want 2D indexes. You raised a valid point: in this case I don't think the value of y ever changes (it stays 1 all the time), so I think the unique variable is actually unique?
It looks to me like y should always be zero, and the values will be unique in your 1-dimensional case.
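If you want to confirm this rather than take my word for it, something like the following should do it. This is only a sketch: the kernel name countHits, the helper countBadIndices and the hits buffer are names I made up, and error checking on the CUDA calls is left out for brevity. Every thread increments a counter at its own index, and the host then checks that each counter in 0 to 2^19 - 1 is exactly 1.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical checker kernel: each thread bumps the counter at its index.
__global__ void countHits(unsigned int *hits, unsigned int n)
{
    const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y; // 0 in a 1D launch
    const unsigned int unique = x + y;                            // formula from the question
    if (unique < n)
        atomicAdd(&hits[unique], 1u);
}

// Returns how many indices in [0, blocks * threads) were NOT hit exactly once.
int countBadIndices(unsigned int blocks, unsigned int threads)
{
    const unsigned int n = blocks * threads;
    unsigned int *dHits = nullptr;
    cudaMalloc(&dHits, n * sizeof(unsigned int));
    cudaMemset(dHits, 0, n * sizeof(unsigned int));

    countHits<<<blocks, threads>>>(dHits, n);
    cudaDeviceSynchronize();

    std::vector<unsigned int> hits(n);
    cudaMemcpy(hits.data(), dHits, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dHits);

    int bad = 0;
    for (unsigned int h : hits)
        if (h != 1u)
            ++bad; // index was skipped or shared by several threads
    return bad;
}

int main()
{
    // Same launch shape as in the question: 2048 blocks of 256 threads.
    printf("indices not hit exactly once: %d\n", countBadIndices(2048, 256));
    return 0;
}
```

A result of 0 means every index was produced by exactly one thread. Note that a failed launch would also leave all counters at zero, so in real code you would want to check the return values of the CUDA calls as well.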
To correctly calculate a unique index for the 2-D case, I think this would work:
const word x = blockIdx.x * blockDim.x + threadIdx.x;
const word y = blockIdx.y * blockDim.y + threadIdx.y;
const word unique = x + y * blockDim.x * gridDim.x;
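The same kind of check can be applied to the 2D formula. Again this is just a sketch under my own assumptions: countHits2D and countBadIndices2D are made-up names, the 64 x 32 grid of 16 x 16 blocks is only an example shape that happens to give 2^19 threads, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical checker kernel for a 2D launch, using the row-major index.
__global__ void countHits2D(unsigned int *hits, unsigned int n)
{
    const unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    const unsigned int width = blockDim.x * gridDim.x; // total threads along x
    const unsigned int unique = x + y * width;
    if (unique < n)
        atomicAdd(&hits[unique], 1u);
}

// Returns how many indices in [0, total threads) were NOT hit exactly once.
int countBadIndices2D(dim3 grid, dim3 block)
{
    const unsigned int n = grid.x * block.x * grid.y * block.y;
    unsigned int *dHits = nullptr;
    cudaMalloc(&dHits, n * sizeof(unsigned int));
    cudaMemset(dHits, 0, n * sizeof(unsigned int));

    countHits2D<<<grid, block>>>(dHits, n);
    cudaDeviceSynchronize();

    std::vector<unsigned int> hits(n);
    cudaMemcpy(hits.data(), dHits, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dHits);

    int bad = 0;
    for (unsigned int h : hits)
        if (h != 1u)
            ++bad;
    return bad;
}

int main()
{
    // 64 * 16 = 1024 threads wide, 32 * 16 = 512 threads tall: 1024 * 512 = 524288.
    const dim3 grid(64, 32);
    const dim3 block(16, 16);
    printf("indices not hit exactly once: %d\n", countBadIndices2D(grid, block));
    return 0;
}
```

If you swap the index line back to x + y in this 2D launch, different (x, y) pairs with the same sum land on the same index, so the count should come out non-zero. That is the reason for multiplying y by the total width.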