Calculate GLOBAL thread Id


I am new to CUDA and trying to wrap my head around calculating the ‘global thread id’.

What I mean by this is the following:

Say we have a grid of (2,2,1) and blocks of (16,16,1). This will generate 1024 threads with the kernel invocation.

By ‘global thread id’ I mean a unique index for each thread instance within the kernel launch, in this case threads 0 - 1023.

I am having trouble figuring out how to calculate it though. Here is what I’ve tried:

  1. Per the CUDA Programming Guide:
int global_index = threadIdx.x + blockDim.x * threadIdx.y;

But this seems to be the thread ID within the block, not within the whole kernel launch.

  2. Per other documentation I have read:
int xindex = threadIdx.x + blockIdx.x * blockDim.x;

int yindex = threadIdx.y + blockIdx.y * blockDim.y;

int global_index = xindex + (gridDim.x * gridDim.y * yindex);

This just doesn’t get close.

It is clear that I am missing something very fundamental here… I just can’t wrap my head around it.

Any help is much appreciated.

Try tackling it with a top-down approach. At the top level, we have

globalThreadNum = blockNumInGrid * threadsPerBlock + threadNumInBlock

For now, assume we have a 2D grid and 2D blocks. Then

threadsPerBlock  = blockDim.x * blockDim.y

threadNumInBlock = threadIdx.x + blockDim.x * threadIdx.y (alternatively: threadIdx.y + blockDim.y * threadIdx.x)

blockNumInGrid   = blockIdx.x  + gridDim.x  * blockIdx.y  (alternatively: blockIdx.y  + gridDim.y  * blockIdx.x)
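To sanity-check the formulas above, here is a minimal host-side sketch that evaluates them for a 2D grid of 2D blocks. Vec3 is a hypothetical stand-in for CUDA's dim3/uint3, and globalThreadNum2D and coversAllIds2D are names made up for this illustration. Inside a real kernel you would feed in the built-ins threadIdx, blockIdx, blockDim and gridDim instead.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3/uint3 (an assumption of this sketch),
// so the arithmetic can be checked on the host without a GPU.
struct Vec3 { unsigned x, y, z; };

// Global thread number for a 2D grid of 2D blocks, x varying fastest.
// tIdx, bIdx, bDim, gDim play the roles of threadIdx, blockIdx, blockDim, gridDim.
inline unsigned globalThreadNum2D(Vec3 tIdx, Vec3 bIdx, Vec3 bDim, Vec3 gDim) {
    assert(tIdx.x < bDim.x && tIdx.y < bDim.y);  // thread index lies inside the block
    assert(bIdx.x < gDim.x && bIdx.y < gDim.y);  // block index lies inside the grid
    unsigned threadsPerBlock  = bDim.x * bDim.y;
    unsigned threadNumInBlock = tIdx.x + bDim.x * tIdx.y;
    unsigned blockNumInGrid   = bIdx.x + gDim.x * bIdx.y;
    return blockNumInGrid * threadsPerBlock + threadNumInBlock;
}

// Visits every (block, thread) pair of a launch and checks that each ID is
// in range and produced exactly once.
inline bool coversAllIds2D(Vec3 gDim, Vec3 bDim) {
    unsigned total = gDim.x * gDim.y * bDim.x * bDim.y;
    std::vector<bool> seen(total, false);
    for (unsigned by = 0; by < gDim.y; ++by)
        for (unsigned bx = 0; bx < gDim.x; ++bx)
            for (unsigned ty = 0; ty < bDim.y; ++ty)
                for (unsigned tx = 0; tx < bDim.x; ++tx) {
                    unsigned id = globalThreadNum2D({tx, ty, 0}, {bx, by, 0}, bDim, gDim);
                    if (id >= total || seen[id]) return false;
                    seen[id] = true;
                }
    return true;
}
```

For the launch in the question, grid (2,2,1) and blocks (16,16,1), thread (15,15) of block (1,1) sits in block number 1 + 2*1 = 3 and is thread number 15 + 16*15 = 255 within it, so its global number is 3*256 + 255 = 1023, the last of the 1024 threads.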

The same pattern extends to 3D grids and 3D blocks by adding the corresponding z terms.
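For the 3D case, one possible way to write it is sketched below, again on the host so it can be checked without a GPU. Vec3 is a hypothetical stand-in for CUDA's dim3/uint3, and flatten3D and globalThreadNum3D are illustrative names, not CUDA APIs. The x-fastest ordering is just one of the valid choices.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3/uint3 (an assumption of this sketch).
struct Vec3 { unsigned x, y, z; };

// Flattens a 3D index into a linear one, with x varying fastest, then y, then z.
inline unsigned flatten3D(Vec3 idx, Vec3 dim) {
    assert(idx.x < dim.x && idx.y < dim.y && idx.z < dim.z);
    return idx.x + dim.x * (idx.y + dim.y * idx.z);
}

// Global thread number for a 3D grid of 3D blocks.
// tIdx, bIdx, bDim, gDim play the roles of threadIdx, blockIdx, blockDim, gridDim.
inline unsigned globalThreadNum3D(Vec3 tIdx, Vec3 bIdx, Vec3 bDim, Vec3 gDim) {
    unsigned threadsPerBlock  = bDim.x * bDim.y * bDim.z;
    unsigned threadNumInBlock = flatten3D(tIdx, bDim);
    unsigned blockNumInGrid   = flatten3D(bIdx, gDim);
    return blockNumInGrid * threadsPerBlock + threadNumInBlock;
}
```

With a z extent of 1 this reduces to the 2D formulas. For very large launches the product can exceed what a 32-bit unsigned holds, so a 64-bit type may be the safer choice there.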

Ah! Yes, makes much more sense. Thanks for the help!