Unique Block Index in 3D Grid

What is the equation for the unique block index of a three-dimensional grid?

The literature only ever uses two dimensions, but most of my kernels use a unique block index for accessing data, so that they scale to the maximum grid size the hardware allows (in theory, provided there is enough video RAM).

Maybe someone already has it?

int bidx = blockIdx.x + blockIdx.y * blockDim.x + blockIdx.z * blockDim.x * blockDim.y

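On compute capability 2.0 you have six indices to work with (three for the block and three for the thread). On earlier hardware you have five, because the grid is only two-dimensional. There are many ways to combine them.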

For example, define grid(lz,ly,1) and threads(lx,1,1); in the kernel you will then have

ix=threadIdx.x;

iy=blockIdx.y;

iz=blockIdx.x;

You can also launch with

threads(16,32,1) grid((lx+15)/16,(ly+31)/32,lz)

and in the kernel you will have

ix=blockIdx.x * blockDim.x + threadIdx.x;

iy=blockIdx.y * blockDim.y + threadIdx.y;

iz=blockIdx.z;
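
A launch like this usually rounds the grid size up and lets each thread check that its coordinates are inside the data. The sketch below is plain host-side C++, not a kernel; `Dim3`, `ceil_div`, `grid_for` and `in_bounds` are hypothetical helpers invented for illustration and are not part of the CUDA API.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3.
struct Dim3 { unsigned x, y, z; };

// Smallest number of blocks such that blocks * block_size >= n.
unsigned ceil_div(unsigned n, unsigned block_size) {
    return (n + block_size - 1) / block_size;
}

// Grid for an lx * ly * lz volume: x and y are tiled by the thread block,
// and each z-slice gets its own layer of blocks (iz = blockIdx.z).
Dim3 grid_for(unsigned lx, unsigned ly, unsigned lz, Dim3 threads) {
    return { ceil_div(lx, threads.x), ceil_div(ly, threads.y), lz };
}

// Because the grid is rounded up, threads past the edge must skip their work.
bool in_bounds(unsigned ix, unsigned iy, unsigned lx, unsigned ly) {
    return ix < lx && iy < ly;
}
```

For lx = 100, ly = 64, lz = 5 and threads(16,32,1) this gives a 7 x 2 x 5 grid. The last block in x covers ix = 96..111, so the threads with ix >= 100 fail the `in_bounds` test and do nothing.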

It depends a little on your card (because of the maximum number of threads per block) and on the size of the matrix. So please tell us whether your card has compute capability 2.0, and the maximum or typical size of the matrix you will use.

@MarkusM A unique thread index depends on the block dimensions, so a unique block index must depend on the grid dimensions (gridDim rather than blockDim).

@pasoleatis I was asking for the unique block index of a three-dimensional grid! I don't see how your answer relates to that.

Here is the equation for the unique linear index of one block in the three dimensional grid.

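const unsigned long long int blockIdx3D = blockIdx.x //1D
	+ (unsigned long long int)blockIdx.y * gridDim.x //2D
	+ (unsigned long long int)gridDim.x * gridDim.y * blockIdx.z; //3D, widened first so the product cannot wrap at 32 bits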
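
One way to convince yourself that this expression is unique is to evaluate it on the host for every block of a small grid. The sketch below is plain C++ rather than a kernel; `Dim3`, `linear_block_index` and `block_indices_are_unique` are hypothetical names that stand in for CUDA's built-in `blockIdx` and `gridDim`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for CUDA's dim3 / uint3 built-ins.
struct Dim3 { unsigned x, y, z; };

// Same arithmetic as the kernel expression: x runs fastest, then y, then z.
// The casts widen to 64 bits before multiplying.
unsigned long long linear_block_index(Dim3 block, Dim3 grid) {
    return block.x
         + static_cast<unsigned long long>(block.y) * grid.x
         + static_cast<unsigned long long>(block.z) * grid.x * grid.y;
}

// True if every block of the grid gets a distinct index in
// [0, grid.x * grid.y * grid.z).
bool block_indices_are_unique(Dim3 grid) {
    const unsigned long long total =
        static_cast<unsigned long long>(grid.x) * grid.y * grid.z;
    std::vector<bool> seen(total, false);
    for (unsigned z = 0; z < grid.z; ++z)
        for (unsigned y = 0; y < grid.y; ++y)
            for (unsigned x = 0; x < grid.x; ++x) {
                const unsigned long long i = linear_block_index(Dim3{x, y, z}, grid);
                if (i >= total || seen[i]) return false;
                seen[i] = true;
            }
    return true;
}
```

For a 4 x 3 x 2 grid, block (3,2,1) maps to 3 + 2*4 + 1*12 = 23, the last of the 24 indices.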

linearIndex = (blockIdx.z * gridDim.y * gridDim.x * blockDim.y * blockDim.x) +
(blockIdx.y * gridDim.x * blockDim.y * blockDim.x) +
(blockIdx.x * blockDim.y * blockDim.x) +
(threadIdx.z * blockDim.y * blockDim.x) +
(threadIdx.y * blockDim.x) +
threadIdx.x;

This is what I have in my academic material (CSE). Note that it is only unique when blockDim.z is 1; for three-dimensional blocks the first three terms also need a factor of blockDim.z.
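
The role of blockDim.z in this formula can be checked on the host. The sketch below is plain C++ with hypothetical names (`Dim3`, `index_as_quoted`, `index_full_block`). It compares the formula as quoted with a variant whose block offsets are scaled by the full block size blockDim.x * blockDim.y * blockDim.z. The two agree when blockDim.z is 1, and only the second stays unique for three-dimensional blocks.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 / uint3 built-ins.
struct Dim3 { unsigned x, y, z; };

// The formula as quoted: block offsets are scaled by blockDim.x * blockDim.y only.
unsigned long long index_as_quoted(Dim3 b, Dim3 t, Dim3 grid, Dim3 bd) {
    const unsigned long long xy = static_cast<unsigned long long>(bd.x) * bd.y;
    return xy * b.z * grid.y * grid.x
         + xy * b.y * grid.x
         + xy * b.x
         + xy * t.z
         + static_cast<unsigned long long>(t.y) * bd.x
         + t.x;
}

// Variant: block offsets are scaled by the full number of threads per block.
unsigned long long index_full_block(Dim3 b, Dim3 t, Dim3 grid, Dim3 bd) {
    const unsigned long long per_block =
        static_cast<unsigned long long>(bd.x) * bd.y * bd.z;
    const unsigned long long block_id = b.x
        + static_cast<unsigned long long>(b.y) * grid.x
        + static_cast<unsigned long long>(b.z) * grid.x * grid.y;
    const unsigned long long in_block = t.x
        + static_cast<unsigned long long>(t.y) * bd.x
        + static_cast<unsigned long long>(t.z) * bd.x * bd.y;
    return block_id * per_block + in_block;
}
```

With blockDim = (4,2,2), the quoted formula gives 8 both for block (1,0,0), thread (0,0,0) and for block (0,0,0), thread (0,0,1); the variant gives 16 and 8.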

my friend just showed me:

  1. find the block index in the grid: blockId = blockIdx.y * gridDim.x + blockIdx.x
  2. index of the first thread in that block = blockId * (product of the elements of blockDim, i.e. the number of threads per block)
  3. find the thread index within that block (just like step 1, but for threads) = threadIdx.y * blockDim.x + threadIdx.x
  4. add the results of steps 2 and 3 to get the absolute thread ID
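
The four steps above translate almost line by line into code. This is a minimal host-side C++ sketch for the two-dimensional case only (2D grid, 2D block); `Dim3` and `global_thread_id_2d` are hypothetical names, and inside a real kernel the arguments would be the built-in `blockIdx`, `threadIdx`, `gridDim` and `blockDim`.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's dim3 / uint3 built-ins (z is unused here).
struct Dim3 { unsigned x, y, z; };

unsigned global_thread_id_2d(Dim3 block, Dim3 thread, Dim3 grid, Dim3 bd) {
    const unsigned block_id     = block.y * grid.x + block.x;   // step 1
    const unsigned first_thread = block_id * (bd.x * bd.y);     // step 2
    const unsigned local_id     = thread.y * bd.x + thread.x;   // step 3
    return first_thread + local_id;                             // step 4
}
```

With a 3 x 2 grid of 4 x 2 blocks, block (2,1) has id 5, its first thread is 40, and thread (3,1) inside it has local id 7, so the absolute id is 47, the last of 48 threads.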

As not all threads can be active concurrently anyway, you can also put loops inside your kernels or around your kernel calls.
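
A loop inside the kernel is commonly written as a grid-stride loop: each thread starts at its own unique index and advances by the total number of threads in the grid. The sketch below emulates that on the host in plain C++. `visits_per_element` is a hypothetical helper, `tid` plays the role of blockIdx.x * blockDim.x + threadIdx.x, and `total_threads` plays the role of gridDim.x * blockDim.x.

```cpp
#include <cassert>
#include <vector>

// Emulates `total_threads` GPU threads that together process n elements.
// Returns how many times each element was touched.
std::vector<int> visits_per_element(unsigned n, unsigned total_threads) {
    std::vector<int> visits(n, 0);
    for (unsigned tid = 0; tid < total_threads; ++tid)      // one pass per simulated thread
        for (unsigned i = tid; i < n; i += total_threads)   // the loop a kernel would contain
            ++visits[i];
    return visits;
}
```

Every element is visited exactly once, whether there are fewer threads than elements (each thread loops) or more (the surplus threads never enter the loop), so the launch size no longer has to match the data size.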

Also, a unique thread index can be used for indexing memory, but there is more than one way to build a unique index out of multidimensional coordinates.

It makes sense to consider memory access patterns (coalescing, L1/L2 cache hit rate, freedom from shared memory bank conflicts) when creating one or several unique ids optimized for your kernel.

int unique1 = blockIdx.x * blockDim.x + threadIdx.x;
int unique2 = threadIdx.x * gridDim.x + blockIdx.x;

Both are unique and have the same range of ids.
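
The difference between the two lies in which threads end up next to each other. The host-side C++ sketch below (the function names and parameter names are hypothetical, chosen to mirror the CUDA built-ins) evaluates both expressions so the layouts can be compared.

```cpp
#include <cassert>

// Block-major: the threads of one block get consecutive ids.
unsigned unique1(unsigned blockIdx_x, unsigned threadIdx_x, unsigned blockDim_x) {
    return blockIdx_x * blockDim_x + threadIdx_x;
}

// Thread-major: neighbouring threads of one block are gridDim.x apart.
unsigned unique2(unsigned blockIdx_x, unsigned threadIdx_x, unsigned gridDim_x) {
    return threadIdx_x * gridDim_x + blockIdx_x;
}
```

With 3 blocks of 4 threads, threads 0 and 1 of block 1 get ids 4 and 5 under unique1 but ids 1 and 4 under unique2. If the id is used directly as an array index, the first layout makes neighbouring threads of a block read neighbouring elements, which is the pattern coalescing favours; the second strides through memory by gridDim.x.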