CUDA thread allocation

If I am finding the global thread index like this:

int i = threadIdx.x;

Can someone explain how I would find the thread index for addition of 2 vectors with each of these thread structures?

1) One 2D block
2) One 3D block
3) Multiple 2D blocks in a 1D grid
4) Multiple 2D blocks in a 2D grid
5) Multiple 2D blocks in a 3D grid
6) Multiple 3D blocks in a 1D grid

Thank you, I'm just struggling to understand this.

You might find the first two sections of this helpful: Programming Guide :: CUDA Toolkit Documentation

Thanks for the link. I know how to do it for a single 2D or 3D block, but how would it differ if I had multiple 2D or 3D thread blocks?
Especially:

5) Multiple 2D blocks in a 3D grid

Thank you

I’m afraid this stuff tends to do my head in, and as I’m quite new to CUDA, I have yet to venture beyond 1D :-)

The general approach is:

  1. Determine the global (linearized) block ID of the current block within the grid
  2. Multiply it by the number of threads per block → that’s the global thread index of the first thread in the current block
  3. Add the linearized ID of the current thread within its block
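
Here is a minimal sketch of those three steps, written so that the same code covers all six layouts from the question. The helper names flatten, globalThreadId and runExample are made up for illustration and are not part of CUDA; the x-fastest ordering (x, then y, then z) is a common convention that I am assuming here, not the only valid choice. Any unused dimension has size 1 and index 0, so it simply drops out of the formula.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: linearize a 3D index inside a 3D extent, x varying fastest.
__host__ __device__ inline unsigned int flatten(uint3 idx, dim3 dim) {
    return idx.x + idx.y * dim.x + idx.z * dim.x * dim.y;
}

// Hypothetical helper implementing the three steps for any grid/block shape.
__host__ __device__ inline unsigned int globalThreadId(uint3 bIdx, dim3 gDim,
                                                       uint3 tIdx, dim3 bDim) {
    unsigned int blockId = flatten(bIdx, gDim);               // step 1
    unsigned int threadsPerBlock = bDim.x * bDim.y * bDim.z;
    unsigned int firstInBlock = blockId * threadsPerBlock;    // step 2
    return firstInBlock + flatten(tIdx, bDim);                // step 3
}

__global__ void vecAdd(const float* a, const float* b, float* c, unsigned int n) {
    unsigned int i = globalThreadId(blockIdx, gridDim, threadIdx, blockDim);
    if (i < n) {  // guard: the launch may create more threads than elements
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host-side driver for case 5: 2D blocks in a 3D grid.
// Returns true if every element of c equals a + b. Error checking of the
// CUDA calls is omitted to keep the sketch short.
bool runExample() {
    const unsigned int n = 250;  // deliberately less than 8 * 32 = 256 threads
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (unsigned int k = 0; k < n; ++k) {
        a[k] = static_cast<float>(k);
        b[k] = 2.0f * static_cast<float>(k);
    }

    dim3 grid(2, 2, 2);  // 3D grid: 8 blocks
    dim3 block(8, 4);    // 2D block: 32 threads (z defaults to 1)
    vecAdd<<<grid, block>>>(a, b, c, n);
    cudaDeviceSynchronize();

    bool ok = true;
    for (unsigned int k = 0; k < n; ++k) {
        if (c[k] != 3.0f * static_cast<float>(k)) ok = false;
    }
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return ok;
}
```

To check the arithmetic by hand for case 5: thread (3, 2) in block (1, 1, 1) of a 2x2x2 grid of 8x4 blocks has block ID 1 + 1*2 + 1*4 = 7, its block starts at 7 * 32 = 224, and its local ID is 3 + 2*8 = 19, so its global index is 243. For the other cases only the dim3 values passed at launch change, e.g. dim3 grid(4) with dim3 block(4, 4, 2) for case 6.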