I’m sure this has been discussed before, but I’m finding the search function of this forum to be somewhat lacking (and very frustrating).

I’ve reached a point where my kernel needs more than 65535 blocks, so I need to go to a 2-dimensional grid for the first time. The example in the programming guide is redundant due to the 16x16 grid size, and I’m not really interested in the “coordinates” of the block anyway. I’m really only interested in numbering the blocks sequentially, so that each block has a unique number. So, to obtain this unique number, I could use a formula such as:

```
blockNumber = (blockIdx.y * gridDim.x) + blockIdx.x;
```

right?

I just wanted to double check that I am interpreting things (particularly gridDim) in the correct manner.

Thanks.

Looks okay to me!

Yes. It is ok.

In general you can have:

1. A one-dimensional array of blocks on the grid, where each block has a one-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

2. A one-dimensional array of blocks on the grid, where each block has a two-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

3. A one-dimensional array of blocks on the grid, where each block has a three-dimensional array of threads:

UniqueBlockIndex = blockIdx.x;

4. A two-dimensional array of blocks on the grid, where each block has a one-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

5. A two-dimensional array of blocks on the grid, where each block has a two-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

6. A two-dimensional array of blocks on the grid, where each block has a three-dimensional array of threads:

UniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;

Three-dimensional grids of blocks aren’t supported in CUDA (yet), so you have only those 6 combinations.
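To make the block numbering above concrete, here is a minimal host-side sketch in plain C++ (not device code). `Dim3` is a hypothetical stand-in for CUDA's built-in `blockIdx`/`gridDim` types, and `choose_grid`, `numbering_is_sequential`, and `kMaxGridDim` are illustrative names, not part of CUDA. The 65535 limit is the per-dimension figure mentioned in this thread. It checks that the formula numbers the blocks 0, 1, 2, ... without gaps, and shows one possible way to split a large block count over two grid dimensions.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's built-in index types, so this runs on the host.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Per-dimension grid limit assumed from the discussion in this thread.
const std::uint32_t kMaxGridDim = 65535;

// Sequential block number for a 1D or 2D grid (x varies fastest).
// For a 1D grid blockIdx.y is 0, so this reduces to blockIdx.x.
inline std::uint32_t unique_block_index(Dim3 blockIdx, Dim3 gridDim) {
    return blockIdx.y * gridDim.x + blockIdx.x;
}

// One possible way to pick a grid holding at least `blocks_needed` blocks.
// Assumes blocks_needed <= 65535 * 65535; larger requests are not handled.
inline Dim3 choose_grid(std::uint64_t blocks_needed) {
    if (blocks_needed <= kMaxGridDim) {
        return Dim3{static_cast<std::uint32_t>(blocks_needed), 1, 1};
    }
    // Round up so that gridDim.x * gridDim.y >= blocks_needed.
    std::uint64_t rows = (blocks_needed + kMaxGridDim - 1) / kMaxGridDim;
    return Dim3{kMaxGridDim, static_cast<std::uint32_t>(rows), 1};
}

// Walks every block of the grid in (y, x) order and checks that the
// formula yields 0, 1, 2, ... with no gaps or repeats.
inline bool numbering_is_sequential(Dim3 gridDim) {
    std::uint32_t expected = 0;
    for (std::uint32_t y = 0; y < gridDim.y; ++y) {
        for (std::uint32_t x = 0; x < gridDim.x; ++x) {
            if (unique_block_index(Dim3{x, y, 0}, gridDim) != expected) {
                return false;
            }
            ++expected;
        }
    }
    return expected == gridDim.x * gridDim.y;
}
```

Because `choose_grid` rounds up, the grid can contain more blocks than requested. In a real kernel you would probably want each block to compare its number against the count actually needed and return early if it is past the end.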

Of course, you can also calculate a LocalThreadIndex (per block) for a three-dimensional array of threads, i.e.

Thanks a lot for the help. I think Monday was adversely affecting my brain.

Hello,

Your 6th point seems a little strange to me.

Although 3-dimensional grids are not supported, you use blockDim.z.

I understand that

```
threadIdx.z * blockDim.y * blockDim.z + threadIdx.y * blockDim.x + threadIdx.x;
```

must have been a typo and should in fact be:

```
threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
```

and I think you mean:

```
UniqueBlockIndex  * gridDim.z * gridDim.y * gridDim.x
```

rather than:

```
UniqueBlockIndex  * blockDim.z * blockDim.y * blockDim.x
```

As a beginner I am getting confused by all of this. Can anyone check what I just said?

Thanks a lot.

I would say you are right that it should have been:

```
... threadIdx.z * blockDim.y * blockDim.x + threadIdx.y * blockDim.x + threadIdx.x;
```

But I disagree about the other part. I think that this portion is correct:

```
UniqueBlockIndex * blockDim.z * blockDim.y * blockDim.x ...
```

When blockDim.z is 1, it is equivalent to the simpler version that omits the z dimension. Conceptually, blockDim.z * blockDim.y * blockDim.x is the number of threads per block, so the total is UniqueBlockIndex * (number of threads per block) + (thread number within this block).
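For anyone who wants to check the corrected formula numerically, here is a small host-side sketch in plain C++ (again not device code). `Dim3` is a hypothetical stand-in for CUDA's built-in index types, and all function names are made up for illustration. It counts how many distinct local indices each version of the formula produces over one block, then composes the global index as described above. The global index is widened to 64 bits here as a precaution, since a large grid times the threads per block may not fit in 32 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for CUDA's built-in index types, so this runs on the host.
struct Dim3 {
    std::uint32_t x, y, z;
};

// Number of threads in one block.
inline std::uint32_t threads_per_block(Dim3 blockDim) {
    return blockDim.z * blockDim.y * blockDim.x;
}

// Corrected per-block thread number (x varies fastest, then y, then z).
inline std::uint32_t local_thread_index(Dim3 threadIdx, Dim3 blockDim) {
    return threadIdx.z * blockDim.y * blockDim.x
         + threadIdx.y * blockDim.x
         + threadIdx.x;
}

// The mistyped version quoted in this thread (blockDim.z where blockDim.x belongs).
inline std::uint32_t mistyped_local_thread_index(Dim3 threadIdx, Dim3 blockDim) {
    return threadIdx.z * blockDim.y * blockDim.z
         + threadIdx.y * blockDim.x
         + threadIdx.x;
}

// UniqueBlockIndex * (threads per block) + (thread number within the block).
inline std::uint64_t global_thread_index(Dim3 blockIdx, Dim3 gridDim,
                                         Dim3 threadIdx, Dim3 blockDim) {
    std::uint64_t uniqueBlockIndex = blockIdx.y * gridDim.x + blockIdx.x;
    return uniqueBlockIndex * threads_per_block(blockDim)
         + local_thread_index(threadIdx, blockDim);
}

// Counts the distinct values an indexing function produces over every
// thread of one block; a correct formula yields threads_per_block values.
template <typename IndexFn>
std::size_t count_distinct(Dim3 blockDim, IndexFn fn) {
    std::set<std::uint32_t> seen;
    for (std::uint32_t z = 0; z < blockDim.z; ++z)
        for (std::uint32_t y = 0; y < blockDim.y; ++y)
            for (std::uint32_t x = 0; x < blockDim.x; ++x)
                seen.insert(fn(Dim3{x, y, z}, blockDim));
    return seen.size();
}
```

With an 8 x 4 x 2 block (64 threads), the corrected formula gives 64 distinct indices, while the mistyped one gives only 40, so some threads would share an index.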

Thank you!
It’s much clearer now.

thanks a lot mandrak