I am new to CUDA and trying to wrap my head around calculating the ‘global thread id’.
What I mean by this is the following:
Say we have a grid of (2,2,1) and blocks of (16,16,1). This will generate 1024 threads with the kernel invocation (2 x 2 blocks of 16 x 16 threads each).
By ‘global thread id’ I mean the unique index of each thread within the kernel launch, in this case threads 0 - 1023.
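To make the numbers concrete, here is a small host-side sketch of the arithmetic behind that launch configuration. It is plain C++ and needs no GPU. `Dim3` and `totalThreads` are names I made up for illustration; `Dim3` only mimics CUDA's `dim3`.

```cpp
// Hypothetical stand-in for CUDA's dim3, so this compiles without the CUDA toolkit.
struct Dim3 { unsigned x, y, z; };

// Total threads in a launch = (blocks in the grid) * (threads per block).
unsigned totalThreads(Dim3 grid, Dim3 block) {
    unsigned blocks = grid.x * grid.y * grid.z;
    unsigned threadsPerBlock = block.x * block.y * block.z;
    return blocks * threadsPerBlock;
}
```

Calling `totalThreads(Dim3{2, 2, 1}, Dim3{16, 16, 1})` gives 4 * 256 = 1024, so I would expect the global thread IDs to run from 0 to 1023.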
I am having trouble figuring out how to calculate this ID, though. Here is what I’ve tried:
- Per CUDA Programming Guide:
int global_index = threadIdx.x + blockDim.x * threadIdx.y;
but this seems to be the thread ID within a single block, not within the whole kernel launch.
- Per other documentation I have read:
int xindex = threadIdx.x + blockIdx.x * blockDim.x;
int yindex = threadIdx.y + blockIdx.y * blockDim.y;
int global_index = xindex + (gridDim.x * gridDim.y * yindex);
This just doesn’t get close.
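To show what I mean, here is a rough host-side check of both attempts. It is plain C++ that loops over every block and thread of a 2D launch and collects the distinct indices each formula produces. `Idx`, `attempt1`, `attempt2`, and `distinctIndices` are hypothetical names for this sketch. The parameters only imitate CUDA's built-in `threadIdx`, `blockIdx`, `blockDim`, and `gridDim`, and the z dimension is left out because it is 1 in my case.

```cpp
#include <set>

// Hypothetical stand-in for the x/y parts of CUDA's built-in index variables.
struct Idx { int x, y; };

// Attempt 1: the formula I took from the Programming Guide.
int attempt1(Idx threadIdx, Idx blockIdx, Idx blockDim, Idx gridDim) {
    (void)blockIdx; (void)gridDim;  // unused by this formula
    return threadIdx.x + blockDim.x * threadIdx.y;
}

// Attempt 2: the formula I took from the other documentation.
int attempt2(Idx threadIdx, Idx blockIdx, Idx blockDim, Idx gridDim) {
    int xindex = threadIdx.x + blockIdx.x * blockDim.x;
    int yindex = threadIdx.y + blockIdx.y * blockDim.y;
    return xindex + (gridDim.x * gridDim.y * yindex);
}

using Formula = int (*)(Idx, Idx, Idx, Idx);

// Visits every thread of the launch and returns the set of distinct indices.
std::set<int> distinctIndices(Formula f, Idx gridDim, Idx blockDim) {
    std::set<int> seen;
    for (int by = 0; by < gridDim.y; ++by)
        for (int bx = 0; bx < gridDim.x; ++bx)
            for (int ty = 0; ty < blockDim.y; ++ty)
                for (int tx = 0; tx < blockDim.x; ++tx)
                    seen.insert(f(Idx{tx, ty}, Idx{bx, by}, blockDim, gridDim));
    return seen;
}
```

With `gridDim = Idx{2, 2}` and `blockDim = Idx{16, 16}`, attempt 1 yields only 256 distinct values (0 to 255) and attempt 2 only 156 (0 to 155), where I would expect 1024 distinct values (0 to 1023).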
It is clear that I am missing something very fundamental here… I just can’t wrap my head around it.
Any help is much appreciated.